Admin application
 
Aug-1, 1:45pm EDT

As of August 1st, 2024, we have removed the link to the Legacy Incident Editor from the admin portal so that all users share a unified experience and can take advantage of our latest improvements to the incident creation process.

Status pages Admin application
 
Jul-30, 9:30am EDT

StatusCast engineers were alerted earlier that some users were experiencing sporadic issues when attempting to connect to the status page and admin portal. Our hosting provider, Microsoft Azure, has reported on its status page that it is experiencing network issues globally. We will provide an update as soon as more information is available. 

 
Jul-30, 10:44am EDT

Access to status pages has remained stable, and Azure has updated its status to indicate that failover processes have been engaged to improve service availability. StatusCast's engineers will continue to monitor this closely and will post additional updates as necessary.  

 
Jul-30, 4:47pm EDT

StatusCast's application has remained stable. Our engineers will continue to watch the system closely, as Microsoft has not fully closed out the event on its side. For more specific details on Azure's issue, please refer to Azure's status page. We will provide additional updates as necessary. 

 
Jul-30, 6:00pm EDT

Microsoft has closed the issue on its side, and StatusCast's platform continues to operate as expected. Once Microsoft has published more details, we will share them here in the form of an RCA.

 
Aug-1, 1:07pm EDT
FROM MICROSOFT:
Mitigation Statement - Azure Front Door Issues accessing a subset of Microsoft services
Tracking ID: KTY1-HW8

What happened?

Between approximately 11:45 UTC and 19:43 UTC on 30 July 2024, a subset of customers may have experienced issues connecting to a subset of Microsoft services globally. Impacted services included Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, as well as the Azure portal itself and a subset of Microsoft 365 and Microsoft Purview services.

What do we know so far?

An unexpected usage spike resulted in Azure Front Door (AFD) and Azure Content Delivery Network (CDN) components performing below acceptable thresholds, leading to intermittent errors, timeouts, and latency spikes. While the initial trigger event was a Distributed Denial-of-Service (DDoS) attack, which activated our DDoS protection mechanisms, initial investigations suggest that an error in the implementation of our defenses amplified the impact of the attack rather than mitigating it.

How did we respond?

Customer impact began at 11:45 UTC and we started investigating. Once the nature of the usage spike was understood, we implemented networking configuration changes to support our DDoS protection efforts, and performed failovers to alternate networking paths to provide relief. Our initial network configuration changes successfully mitigated the majority of the impact by 14:10 UTC. Some customers reported less than 100% availability, which we began mitigating at around 18:00 UTC. We proceeded with an updated mitigation approach, first rolling this out across regions in Asia Pacific and Europe. After validating that this revised approach successfully eliminated the side effect impacts of the initial mitigation, we rolled it out to regions in the Americas. Failure rates returned to pre-incident levels by 19:43 UTC. After monitoring traffic and services to ensure that the issue was fully mitigated, we declared the incident mitigated at 20:48 UTC. Some downstream services took longer to recover, depending on how they were configured to use AFD and/or CDN.

What happens next?

Our team will be completing an internal retrospective to understand the incident in more detail. We will publish a Preliminary Post Incident Review (PIR) within approximately 72 hours, to share more details on what happened and how we responded. After our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review with any additional details and learnings. To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts. For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs. Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness.
June 07, 9:00am EDT
Admin application Support Services
 
Jun-7, 9:00am EDT

Following the acquisition by 4me, we have been preparing to integrate StatusCast’s support operations into 4me’s infrastructure. By combining our support operations with 4me’s, our customers will benefit from 24x7 support, and the number of available support representatives will nearly quadruple.


As the first step in officially merging these teams, we will be removing the live chat option from the admin portal starting Friday, June 7th, 2024. You can continue to communicate with support by email (support@statuscast.com) or through our current support portal.


In the coming months, you will notice several changes in how StatusCast's support is provided. These include a new and improved support portal for tracking any change requests or issues that you have logged with support. Additionally, more tools and auditing information will be made available in the application, along with more help and knowledge resources accessible directly from it.


We are very excited to be expanding our support offering with 4me and know that it will ultimately help our community of valued customers and users!

Status pages Admin application
 
Apr-3, 8:19pm EDT

At approximately 8:19PM EDT, StatusCast’s engineers were alerted that some status page and admin applications were inaccessible. The team identified that its hosting partner, Microsoft, was experiencing issues in its US East region related to app services and SQL database connections. As of 9:03PM EDT, services have been restored, and StatusCast’s team is currently working with Microsoft to fully investigate the incident. Once the team has completed its investigation, we will follow up with an RCA.

At this time, StatusCast should be operating fully as expected. If you continue to experience any issues, please contact us at support@statuscast.com.

 
Apr-3, 9:03pm EDT

As of 9:03PM EDT, services have been restored, and StatusCast’s team is currently working with Microsoft to fully investigate the incident. Once the team has completed its investigation, we will follow up with an RCA.

 
Apr-5, 5:00pm EDT

In working with Microsoft, StatusCast’s team confirmed that the disruption was caused by an outage affecting SQL Databases in Azure’s US East region, where StatusCast is primarily hosted: 




StatusCast itself was impacted by this outage from approximately 8:19 PM EDT and had fully recovered by 9:03 PM EDT. StatusCast’s team will continue to work closely with Microsoft to further optimize its offering and help minimize the impact of service provider outages. 

Status pages Admin application Notification services Third Party Integrations Support Services
 
Apr-2, 9:00am EDT

We are very excited to announce that StatusCast has been acquired by 4me! Since 2013, we have been working hard to close the gap between service outages and those who are impacted, and this acquisition is a major step forward in our journey of providing critical information to those who need it most. 

The inclusion of StatusCast's features will aid 4me in its mission to modernize service management for organizations.

January 13, 9:16am EST
Admin application Notification services
 
Jan-13, 9:16am EST

StatusCast engineers identified a backlog in background processing that delayed the completion of some actions. Once the backlog had been identified, engineers worked swiftly to correct the issue and released a patch to help prevent this type of backlog from recurring. At this time, StatusCast should be operating fully as expected. If you continue to experience any issues, please contact us at support@statuscast.com.

Status pages Admin application
 
Jan-5, 10:21am EST

At approximately 9:38AM EST, StatusCast engineers detected malicious activity targeting our services. The attack, which aimed to overwhelm our service and cause disruptions, was neutralized by 9:49AM EST, keeping the impact on our operations minimal. We will continue to monitor the platform's health and will perform an in-depth investigation into the malicious activity targeting StatusCast. 

 
Jan-5, 11:14am EST

Services continue to operate as expected. StatusCast's team will provide additional information about this event once our investigation is complete. 

Status pages Admin application
 
Dec-27, 11:00am EST

Earlier today, StatusCast's support team received reports of users sporadically getting a 403 "Unauthorized" error when authenticating to the status page and admin portal. Engineers investigated the reports, confirmed that one server in the rotation was at fault, and performed an update to resolve the issue. If you continue to receive 403 errors when authenticating, please reach out to us at support@statuscast.com.

Status pages Admin application Notification services
 
Dec-17, 8:00am EST

The StatusCast team will be performing maintenance on December 17 at 8:00am EST, with an estimated duration of 2 hours. We do not expect any impact to your service, but in some cases there may be a brief interruption.

 
Dec-17, 10:00am EST

This maintenance has been completed.

Status pages Admin application
 
Jul-21, 9:40am EDT

At approximately 9:40AM EDT, StatusCast engineers were alerted to application errors that were preventing users from accessing both their status pages and the administrative portal. StatusCast’s engineers have identified a potential issue with its service provider, Azure, and are currently working with Microsoft to diagnose and resolve the issue. 
 
Jul-21, 10:55am EDT

At this time, services have been restored and should be operating as normal. If you continue to have any issues, please contact support@statuscast.com to open a ticket. We will follow up on this event with an RCA detailing what occurred and how we will handle similar events moving forward. 

 
Jul-21, 12:02pm EDT

Full incident details:

On July 21st, 2023, at approximately 9:40AM EDT, StatusCast’s engineers received alerts that the application was displaying an HTTP Error 500.30 when users attempted to access any *.status.page status page or the admin portal. During this period, any notifications in progress or triggered by scheduled maintenance would have continued to work as expected. Additionally, users of StatusCast’s legacy (*.statuscast.com) version of the application were not impacted. 

Actions taken by StatusCast to mitigate the issue:

Engineers immediately began investigating the cause of the problem. StatusCast’s service provider, Azure, indicated that it was undergoing maintenance in the region where StatusCast is primarily hosted (US East). Engineers contacted Microsoft to confirm this and to gain additional insight, as the issue was also impacting the failover region (US West). During this process, StatusCast deployed an additional instance to another Azure region, which experienced the same errors as both US East and US West.

The root cause of the problem was ultimately related to Azure’s maintenance and the availability of one of StatusCast’s databases used for managing connections to the application. Leading up to the outage, StatusCast’s operations team was preparing for its monthly penetration test, which regularly involves a fresh test database for a reserved test application. The updated connection was not properly propagated to all of StatusCast’s application servers and its traffic manager, which caused the subsequent errors. 

Once the issue had been identified, StatusCast’s engineers were quickly able to restore service. StatusCast’s development team will be deploying an emergency patch today (July 21st, 2023) to ensure that an issue like this can be caught before it makes the application unavailable.