Azure

Cloud Providers Azure Azure Frontdoor
 
Oct-29, 12:04pm EDT

Between 15:45 UTC on 29 October and 00:05 UTC on 30 October 2025, customers and Microsoft services leveraging Azure Front Door (AFD) experienced increased latency, timeouts, and errors.

The AFD & CDN delivery services continue to be stable and are running as expected. 

Out of an abundance of caution, all service management operations (including creation, updates, deletions, and purges) to AFD via portal or APIs remain temporarily suspended. We are working on a comprehensive plan to gradually re-enable these in a phased approach, while ensuring the platform remains stable.

We will continue to share periodic updates as we make progress. The next update will be by 01:00 UTC on 31 Oct 2025 or sooner as events warrant.

 
Oct-30, 9:08pm EDT

The Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable.


We will notify customers once the restriction is lifted, and we will continue sending periodic communications on our progress. The next update will be by 19:00 UTC on 31 Oct 2025 or sooner as events warrant.

 
Oct-31, 4:03pm EDT

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable.  

 

We are working to provide an ETA on when we can lift the restriction and will continue sending periodic communication on the progress via Azure Service Health. The next update will be by 19:00 UTC on 01 November 2025 or sooner as events warrant.

 
Oct-31, 5:35pm EDT

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable.  

 

Our current expectation is to lift the restriction on 05 November 2025, and we will continue sending periodic communication on the progress via Azure Service Health. The next update will be by 19:00 UTC on 01 November 2025 or sooner as events warrant.

 
Nov-1, 3:41pm EDT

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable.  

 

Our current expectation remains to lift the restriction on 05 November 2025, and we will continue sending periodic communication on the progress via Azure Service Health. The next update will be by 19:00 UTC on 02 November 2025 or sooner as events warrant.

 
Nov-2, 3:31pm EST

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable. 

 

Our current expectation remains to lift the restriction on 05 November 2025, and we will continue sending periodic communication on the progress via Azure Service Health. The next update will be by 03:00 UTC on 03 November 2025 or sooner as events warrant.

 

 
Nov-2, 10:02pm EST

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable.

Our current expectation remains to lift the restriction on 05 November 2025, and we will continue sending periodic communication on the progress via Azure Service Health. The next update will be by 19:00 UTC on 03 November 2025 or sooner as events warrant.

 
Nov-3, 7:14pm EST

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As part of completing our full investigation and safe checks, service management operations (create, update, delete, purge, etc.) were made temporarily unavailable.

Restrictions on service management operations have been removed for this Azure subscription ID. Please allow approximately 30–40 minutes for changes to fully propagate to edge sites. We expect all restrictions across subscriptions, including other subscriptions under your tenant, to be lifted by 05 November 2025. Updates on progress will continue via Azure Service Health. The next update will be provided by 22:00 UTC on 04 November 2025, or sooner if circumstances change.

 
Nov-3, 7:25pm EST

We acknowledge that the previous communication may have contained information meant for a different set of customers. We apologize for the inconvenience and ask that you note the following.

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable. 

The workflow to manage and rotate Bring Your Own Certificate (BYOC) certificates has been re-enabled. We will monitor progress as the queue of necessary rotations is worked through, and we will continue to provide regular updates. Our current expectation remains to lift the restriction on 05 November 2025. The next update will be by 22:00 UTC on 04 November 2025 or sooner as events warrant.

 
Nov-3, 10:27pm EST

In preparation for lifting the service management restrictions, an incorrect communication was sent indicating an early removal. Our current date for lifting all control plane restrictions remains 05 November 2025. The next update will be provided by 22:00 UTC on 04 November 2025, or sooner.

 
Nov-4, 5:22pm EST

Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated at 00:05 UTC on 30 October 2025 and remain at normal operating levels. As we complete our full investigation and implement further safe checks, service management operations (create, update, delete, purge, etc.) are temporarily unavailable. 

We continue to meet our milestones toward lifting this restriction on 05 November 2025. The next update will be sent to confirm once this has happened, or sooner if events warrant.

 
Nov-5, 6:36pm EST

Latest Update: As of 23:00 UTC on 5 November 2025, restrictions on Azure Front Door (AFD) and Content Delivery Network (CDN) service management operations have been removed. You may now resume normal operations on the AFD service management plane. For enhanced safety and protection, we have extended configuration propagation times to up to 45 minutes. Additional platform enhancements that are underway are expected to reduce propagation times further.

Previous Context: Azure Front Door (AFD) and Content Delivery Network (CDN) delivery data planes were fully mitigated by 00:05 UTC on 30 October 2025 and remain at normal operating levels. We previously notified you that, as part of completing our full investigation and safe checks, performing service management operations (create, update, delete, purge, etc.) were temporarily restricted. This notification is to confirm that those temporary restrictions no longer apply.

Next Steps: As mentioned in our Preliminary PIR, we are completing an internal retrospective to understand the incident in more detail. Once this is completed, which is expected within approximately 7 days, we will publish our final Post Incident Review (PIR), including a link to register for an Azure Incident Retrospective livestream discussing the incident and our learnings.

Azure Front Door frequently asked questions (FAQ) | Microsoft Learn
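With configuration propagation now taking up to 45 minutes, automation that edits AFD configuration should poll for the change to become visible at the edge rather than assume immediate effect. A minimal sketch, assuming a caller-supplied `probe` check (hypothetical, not part of any Azure SDK):

```python
import time

def wait_for_propagation(probe, timeout_s=45 * 60, poll_s=60,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until a configuration change is visible at the edge.

    `probe()` is a hypothetical caller-supplied check (for example, request a
    test path through the endpoint and verify the new behavior is in effect).
    Returns True once the probe passes, or False if the propagation window
    (here, the stated 45 minutes) elapses first.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return True
        sleep(poll_s)
    return bool(probe())  # one final check at the deadline
```

The `sleep` and `clock` parameters are injectable purely so the wait can be simulated in tests.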

 

Status pages Admin application Notification services Third Party Integrations Support Services Cloud Providers Twilio Mailgun SendGrid Azure Network Infrastructure Azure DNS SQL Database Cloud Services App Service \ Web Apps CDN Azure Frontdoor Azure Firewall AutoScale App Service Network Infrastructure AutoScale Data Factory V2 Data Factory Key Vault CDN Azure Firewall Cloud Services Azure DNS
 
Aug-20, 6:14am EDT

An emergency maintenance window has been opened, effective immediately, to ensure the continued stability and security of our service. The service may be unavailable for the next two hours.

 
Aug-20, 8:14am EDT

Our maintenance is complete. Services are working as intended.

If you ever have an issue, you can reach out to our support team via our support portal.

Cloud Providers Azure Network Infrastructure
 
Jan-18, 2:48pm EST

Impact Statement: Starting at 14:12 UTC on 18 Jan 2024, a limited subset of customers in East US may experience short periods of application latency or intermittent HTTP 500-level response codes and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicates that these interruptions are brief and appear in spikes, lasting approximately 2-5 minutes at a time, with fewer than 5 spikes over a 3-hour period.


Current Status: Engineering teams have identified a root cause for this issue and are currently exploring mitigation options. The next update will be provided in 2 hours or as events warrant.
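Brief 500-level spikes of this kind are the textbook case for client-side retries with exponential backoff and jitter. A minimal, illustrative sketch (an assumption for context, not guidance from this notice; names and limits are invented):

```python
import random
import time

def retry_with_backoff(call, is_retryable, max_attempts=5,
                       base_delay=0.5, max_delay=8.0):
    """Call `call()` until it succeeds, retrying transient failures.

    `is_retryable(value)` decides whether a returned value (for example, an
    HTTP status >= 500) or a raised exception is worth retrying. Waits grow
    exponentially, capped at `max_delay`, with jitter so many clients do not
    all retry at the same instant.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = call()
            if not is_retryable(result) or attempt == max_attempts:
                return result
        except Exception as exc:  # e.g. a timeout during a spike
            if attempt == max_attempts or not is_retryable(exc):
                raise
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        time.sleep(delay * random.uniform(0.5, 1.0))  # jittered backoff
```

For spikes lasting 2-5 minutes, a bounded retry budget like this smooths over single failed requests but still surfaces the outage if it persists.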

 
Jan-18, 3:04pm EST

Impact Statement: Starting at 14:12 UTC on 18 Jan 2024, customers in East US may experience short periods of application latency or intermittent HTTP 500-level response codes and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicates that these interruptions are brief and appear in spikes, lasting approximately 2-5 minutes at a time, with fewer than 5 spikes over a 3-hour period.

 

Current Status: Engineering teams have identified a root cause for this issue and are currently exploring mitigation options. The next update will be provided in 2 hours or as events warrant.

 
Jan-18, 5:07pm EST

Impact Statement: Starting at 14:12 UTC on 18 Jan 2024, customers in East US may experience short periods of application latency or intermittent HTTP 500-level response codes and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicates that these interruptions are brief and appear in spikes, lasting approximately 2-5 minutes at a time, with fewer than 5 spikes over a 3-hour period.

 

Current Status: Engineering teams have identified a root cause for this issue and are currently exploring mitigation options. We have continued to monitor the service, and our telemetry confirms there have been no additional spikes in the past 2-3 hours. We will continue to monitor and provide an update in 2 hours or as events warrant.

 
Jan-18, 5:19pm EST

Summary of Impact: Between 14:12 UTC and 16:52 UTC on 18 Jan 2024, customers in East US may have experienced short periods of application latency or intermittent HTTP 500-level response codes and/or timeouts when connecting to resources hosted in this region. Internal telemetry indicated that these interruptions were brief and appeared in spikes, lasting approximately 2-5 minutes at a time, with fewer than 5 spikes over a 3-hour period.

 

Current Status: This incident is now mitigated. More details will be provided shortly.

 
Jan-18, 5:50pm EST

Summary of Impact: Between 14:12 UTC and 16:52 UTC on 18 Jan 2024, customers in East US may have experienced short periods of application latency or intermittent HTTP 500-level response codes and/or timeouts when connecting to resources hosted in this region. 

 

Preliminary Root Cause: Engineers observed a sudden increase in traffic to an underlying network endpoint in the East US region. This increase happened in quick spikes (fewer than 5) over the course of 2-3 hours. When these spikes occurred, customers with resources in the region whose network traffic was routed through this endpoint may have encountered periods of packet loss and service interruption.

 

Mitigation: Engineers identified and isolated the source of the sudden increases in network traffic.

 

Next Steps: Our team will be completing an internal retrospective to understand the incident in more detail. Once that is completed, generally within 14 days, we will publish a Post Incident Review to all impacted customers. To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts – these can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts. For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs. Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness.

Cloud Providers Azure
 
Nov-28, 1:32am EST

Summary of Impact: Between 23:19 UTC on 27 Nov 2023 and 02:20 UTC on 28 Nov 2023, you were identified as a customer using Azure Cosmos DB in East US who may have experienced service availability issues while performing management operations in the Azure Portal or Azure CLI.


Preliminary Root Cause: We determined that this issue was caused by an inadvertent human error made while performing a preventative mitigation.


Mitigation: The mitigation involved restoring the affected configurations to their prior state.


Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

 

Cloud Providers Azure Network Infrastructure
 
Nov-22, 3:22pm EST

Summary of Impact: At 18:29 UTC, 20:04 UTC, and 20:24 UTC on 22 Nov 2023, for periods of 1 minute, 15 minutes, and 5 minutes respectively, resources in a single availability zone in East US may have seen intermittent network connection failures, delays, and packet loss when reaching services across the region. However, retries would have been successful.


Preliminary Root Cause: While we were conducting an urgent break-fix repair on network capacity in a single availability zone of East US, live traffic was impacted. Traffic was re-routed to alternate spans with sufficient capacity. The region is now stable, and customers can continue to operate workloads in the impacted availability zone.


Mitigation: We have moved traffic to healthy optical fiber spans in the region. The issue is mitigated.


Next Steps: We will investigate why this impact occurred to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

 
Nov-22, 3:23pm EST
Resolved
Cloud Providers Azure
 
Oct-7, 2:47pm EDT

Impact statement: Beginning as early as 11 Aug 2023, you have been identified as a customer experiencing timeouts and high server load for smaller size caches (C0/C1/C2).


Current status: Investigation revealed the cause to be a change in the behavior of one of the Azure security monitoring service agents used by Azure Cache for Redis. The monitoring agent subscribes to the event log and has a scheduled backoff for resetting the subscription when no events are received. In some cases the scheduled backoff does not work as expected and can increase the frequency of subscription resets, which can significantly affect CPU usage for smaller size caches. We are currently rolling out a hotfix to the impacted regions; this rollout is 80% complete. We initially estimated completion by 13 Oct 2023, but progress shows we now expect to complete by 11 Oct 2023. To prevent impact until the fix is rolled out, we are applying a short-term mitigation to all caches that will reduce the log file size. The next update will be provided by 19:00 UTC on 8 Oct 2023 or as events warrant, to allow time for the short-term mitigation to progress.

 
Oct-7, 2:54pm EDT

Impact statement: Beginning as early as 11 Aug 2023, you have been identified as a customer experiencing timeouts and high server load for smaller size caches (C0/C1/C2).


Current status: Investigation revealed the cause to be a change in the behavior of one of the Azure security monitoring service agents used by Azure Cache for Redis. The monitoring agent subscribes to the event log and has a scheduled backoff for resetting the subscription when no events are received. In some cases the scheduled backoff does not work as expected and can increase the frequency of subscription resets, which can significantly affect CPU usage for smaller size caches. We are currently rolling out a hotfix to the impacted regions; this rollout is 80% complete. We initially estimated completion by 11 Oct 2023, but progress shows we now expect to complete by 09 Oct 2023. To prevent impact until the fix is rolled out, we are applying a short-term mitigation to all caches that will reduce the log file size. The next update will be provided by 19:00 UTC on 8 Oct 2023 or as events warrant, to allow time for the short-term mitigation to progress.

 
Oct-8, 2:11pm EDT

Summary of Impact: Between as early as 11 Aug 2023 and 18:00 UTC on 8 Oct 2023, you were identified as a customer who may have experienced timeouts and high server load for smaller size caches (C0/C1/C2).


Current Status: This issue is now mitigated. More information will be provided shortly.

 
Oct-8, 2:55pm EDT

What happened? 

Between as early as 11 Aug 2023 and 18:00 UTC on 8 Oct 2023, you were identified as a customer who may have experienced timeouts and high server load for smaller size caches (C0/C1/C2).

 

What do we know so far? 

We identified a change in the behavior of one of the Azure security monitoring service agents used by Azure Cache for Redis. The monitoring agent subscribes to the event log and has a scheduled backoff for resetting the subscription when no events are received. In some cases, the scheduled backoff does not work as expected and can increase the frequency of subscription resets, which can significantly affect CPU usage for smaller size caches.
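For context, a reset backoff of this kind is normally expected to lengthen the interval between subscription resets while the event log is quiet; the bug described above shortened it instead. A minimal sketch of the intended behavior, with illustrative interval values (not the agent's actual parameters):

```python
class SubscriptionResetBackoff:
    """Backoff controller for resetting an event-log subscription.

    Intended behavior (sketched here, not the agent's real code): each quiet
    interval doubles the wait before the next subscription reset, up to a cap,
    so an idle subscription resets less and less often and costs little CPU.
    Receiving events returns the controller to its fast base cadence.
    """

    def __init__(self, base_seconds=30.0, max_seconds=3600.0):
        self.base = base_seconds
        self.max = max_seconds
        self.current = base_seconds

    def on_events_received(self):
        # Activity observed: go back to the fast cadence.
        self.current = self.base

    def on_quiet_interval(self):
        # No events arrived: reset now, then back off further.
        interval = self.current
        self.current = min(self.max, self.current * 2)
        return interval  # seconds to wait before the next reset
```

The reported bug amounted to the opposite effect: resets becoming more frequent, which on C0/C1/C2 caches left little CPU headroom for serving traffic.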

 

How did we respond?

To address this issue, engineers performed manual actions on the underlying virtual machines of the impacted caches. After further monitoring, internal telemetry confirmed that the issue is mitigated and full service functionality was restored.

 

What happens next? 

We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

Cloud Providers Azure
 
Aug-18, 4:30pm EDT

Summary of Impact: Between 20:30 UTC on 18 Aug 2023 and 05:10 UTC on 19 Aug 2023, you were identified as a customer using workspace-based Application Insights resources who may have experienced 7-10% data gaps during the impact window, and potentially incorrect alert activations.

Preliminary Root Cause: We identified that the issue was caused by a code bug in the latest deployment, which resulted in some data being dropped.

Mitigation: We rolled back the deployment to the last known good build to mitigate the issue.

Additional Information: Following additional recovery efforts, we re-ingested the data that was not correctly ingested due to this event. After further investigation, it was discovered that the initially re-ingested data had incorrect TimeGenerated values instead of the original TimeGenerated value. This may cause incorrect query results, which may in turn cause incorrect alerts or report generation. We have investigated the issue that caused this behavior, so future events utilizing data recovery processes will re-ingest the data with the correct, original TimeGenerated value.

If you need any further assistance with this, please raise a support ticket.

 
Aug-18, 4:31pm EDT
Resolved
 
Sep-13, 2:44pm EDT

Summary of Impact: Between 20:30 UTC on 18 Aug 2023 and 05:10 UTC on 19 Aug 2023, you were identified as a customer using workspace-based Application Insights resources who may have experienced 7-10% data gaps during the impact window, and potentially incorrect alert activations.

 

Preliminary Root Cause: We identified that the issue was caused by a code bug in the latest deployment, which resulted in some data being dropped.

 

Mitigation: We rolled back the deployment to the last known good build to mitigate the issue.

 

Additional Information: Following additional recovery efforts, we re-ingested the data that was not correctly ingested due to this event. After further investigation, it was discovered that the initially re-ingested data had incorrect TimeGenerated values instead of the original TimeGenerated value. This may have caused incorrect query results, which may in turn have caused incorrect alerts or report generation. Our investigation extended past the previous mitigation, and we identified a secondary code bug that caused this behavior. We deployed a hotfix using our Safe Deployment Procedures that re-ingested the data with the correct, original TimeGenerated value. All regions are now recovered, and previously incorrect TimeGenerated values have been corrected.
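The fix described above boils down to carrying each record's original TimeGenerated through re-ingestion instead of restamping it with the recovery time. A minimal sketch with a hypothetical record shape (not the actual ingestion pipeline):

```python
from datetime import datetime, timezone

def reingest(records, now=None):
    """Re-ingest dropped records, preserving the original TimeGenerated.

    Hypothetical record shape: {"TimeGenerated": datetime, "body": ...}.
    The bug described in the incident amounted to stamping records with the
    re-ingestion time; the correct behavior is to carry the original event
    time through, and track the recovery time in a separate field if needed.
    """
    now = now or datetime.now(timezone.utc)
    out = []
    for rec in records:
        fixed = dict(rec)
        fixed["TimeGenerated"] = rec["TimeGenerated"]  # keep the event time
        fixed["IngestedAt"] = now  # recovery time kept separately
        out.append(fixed)
    return out
```

Keeping event time and ingestion time as distinct fields is what lets time-windowed queries and alerts behave identically before and after a recovery.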

 

If you need any further assistance with this, please raise a support ticket.

 

Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.

Cloud Providers Azure
 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data latency and incorrect alert activation.

Current Status: We are actively investigating this issue and will provide more information within 60 minutes.

 

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data gaps, data latency, and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. To completely restore data ingestion back to normal, we are actively rebooting instances of an ingestion component. We anticipate this mitigation workstream to take up to 4 hours to complete. An update on the status of this mitigation effort will be provided within 2 hours.


 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data latency and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. To completely restore data ingestion back to normal, we are actively rebooting instances of an ingestion component. We anticipate this mitigation workstream to take up to 4 hours to complete. An update on the status of this mitigation effort will be provided within 2 hours.

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data latency and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. To completely restore data ingestion back to normal, we are actively rebooting instances of an ingestion component. We completed more than half of this mitigation workstream, which is anticipated to restore affected services and mitigate customer impact once completed. Customers may begin seeing signs of recovery and resolution of this event is anticipated to occur within 2 hours. An update on the status of this mitigation effort will be provided within 60 minutes.

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data gaps, data latency, and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. To completely restore data ingestion back to normal, we are actively rebooting instances of an ingestion component. We completed more than half of this mitigation workstream, which is anticipated to restore affected services and mitigate customer impact once completed. Customers may begin seeing signs of recovery and resolution of this event is anticipated to occur within 2 hours. An update on the status of this mitigation effort will be provided within 60 minutes.

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data latency and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. To completely restore data ingestion back to normal, we are actively rebooting instances of an ingestion component. We are approximately 85% complete with this final mitigation workstream, which is anticipated to restore affected services and mitigate customer impact once completed. Customers may begin seeing signs of recovery, and resolution of this event is anticipated to occur within 60 minutes. The next update will be provided within 60 minutes.

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data gaps, data latency, and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. To completely restore data ingestion back to normal, we are actively rebooting instances of an ingestion component. We are approximately 85% complete with this final mitigation workstream, which is anticipated to restore affected services and mitigate customer impact once completed. Customers may begin seeing signs of recovery and resolution of this event is anticipated to occur within 60 minutes. The next update will be provided within 60 minutes.

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-based Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data latency and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. We have completed our latest workstream across all instances of the affected ingestion service. A very small subset of instances remains unhealthy, and additional action is ongoing to complete recovery of the ingestion service and mitigate the remaining impact. Customers may be seeing signs of recovery. An update on the status of the mitigation effort will be provided within 60 minutes.

 

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-enabled Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data latency and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. We are progressing with the recovery of the remaining unhealthy service instances, which is estimated to complete within 60 minutes. Customers may be seeing signs of recovery. An update on the status of the service instance recovery effort will be provided within 60 minutes.

 

 
Jul-23, 9:39am EDT

Starting at 07:15 UTC on 23 Jul 2023, a subset of customers using workspace-enabled Application Insights and Azure Monitor Storage Logs may be experiencing intermittent data gaps, data latency, and incorrect alert activation.

Current Status: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform. We have rolled back the unhealthy deployment to prevent further impact and restore parts of the ingestion layer. Our telemetry shows that ingestion errors have almost returned to normal, and most customers should be seeing signs of recovery at this time. We are continuing to address a small number of remaining errors occurring on some ingestion service instances. An update will be provided within 60 minutes, or as soon as mitigation has been confirmed.

 

 
Jul-23, 9:39am EDT

Summary of Impact: Between 07:15 UTC on 23 July 2023 and 00:05 UTC on 24 July 2023, a subset of customers using workspace-enabled Application Insights and Azure Monitor Storage Logs may have experienced intermittent data latency and incorrect alert activation.


This incident is now mitigated. More information will be provided shortly.

 

 
Jul-23, 9:39am EDT

Summary of Impact: Between 07:15 UTC on 23 July 2023 and 00:05 UTC on 24 July 2023, a subset of customers using workspace-enabled Application Insights and Azure Monitor Storage Logs may have experienced intermittent data gaps, data latency, and incorrect alert activation.


Preliminary Root Cause: We identified a recent deployment that included a code regression, which caused connectivity issues between some services that make up the data ingestion layer on the platform.

 

Mitigation: We rolled back the services to the previous version to restore the health of the system. Full mitigation took longer because several cache layers had to be cleared before data could be ingested as expected.

 

Next Steps: We are continuing to investigate the underlying cause of this event to identify additional repairs to help prevent future occurrences of this class of issue. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.
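As a concrete example of such an alert: Service Health events flow through the Azure Activity Log, so a subscription-scoped activity log alert with a category = ServiceHealth condition will notify on incidents like this one. The snippet below is a minimal ARM template resource sketch; the subscription ID, resource group, and action group names are placeholders to fill in.

```json
{
  "type": "Microsoft.Insights/activityLogAlerts",
  "apiVersion": "2020-10-01",
  "name": "service-health-alert",
  "location": "Global",
  "properties": {
    "scopes": [ "/subscriptions/<subscription-id>" ],
    "condition": {
      "allOf": [
        { "field": "category", "equals": "ServiceHealth" }
      ]
    },
    "actions": {
      "actionGroups": [
        { "actionGroupId": "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/microsoft.insights/actionGroups/<action-group>" }
      ]
    }
  }
}
```

The action group determines who gets notified (email, SMS, webhook, and so on); the alert itself only defines which Activity Log events trigger it.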

 

Cloud Providers Azure Azure Frontdoor
 
Jun-26, 9:47am EDT

Impact Statement: Starting at 06:45 UTC on 26 Jun 2023, you have been identified as a customer using Azure Front Door who may have encountered intermittent HTTP 502 error response codes when accessing Azure Front Door CDN services.


Current Status: Based on our initial investigation, we have determined that a subset of AFD POPs became unhealthy and unable to handle the load of incoming requests, which in turn impacted Azure Front Door availability.


After a successful simulation, we are applying the mitigation workstream by removing impacted instances from rotation while monitoring traffic allocation to the remaining healthy clusters. The first set of impacted instances was successfully removed at 15:30 UTC on 26 Jun 2023. Since the removal we have seen a reduction in errors, and we will continue to monitor impact. We are currently working on removing the rest of the impacted instances and allocating the resources to healthy alternatives. Some customers may already begin to see signs of recovery. The next update will be provided in 60 minutes, or as events warrant.
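The remediation described here, draining unhealthy instances from rotation so traffic shifts to healthy clusters, can be sketched generically. This is an illustration of the idea, not AFD's internal mechanism; the pool structure and the 5% error-rate threshold are assumptions.

```python
def healthy_rotation(instances, max_error_rate=0.05):
    """Return the subset of instances that should keep receiving traffic.

    Each instance is a dict with 'name', 'requests', and 'errors' counters.
    Instances whose observed error rate exceeds the threshold are removed
    from rotation; their share of traffic shifts to the survivors.
    """
    in_rotation = []
    for inst in instances:
        requests = inst["requests"]
        error_rate = inst["errors"] / requests if requests else 0.0
        if error_rate <= max_error_rate:
            in_rotation.append(inst)
    # Never drain the whole pool: if everything looks unhealthy, keep the
    # least-bad instance so traffic still has somewhere to go.
    if not in_rotation and instances:
        in_rotation = [min(instances,
                           key=lambda i: i["errors"] / max(i["requests"], 1))]
    return in_rotation
```

A real control plane would also rebalance capacity and re-admit instances once their health checks recover, but the core decision is this simple filter.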

 
Jun-26, 9:47am EDT

Summary of Impact: Between 06:45 UTC and 17:15 UTC on 26 Jun 2023, you were identified as a customer using Azure Front Door who may have encountered intermittent HTTP 502 error response codes when accessing Azure Front Door CDN services.


This issue is now mitigated. More information on the mitigation will be provided shortly.

 
Jun-26, 9:48am EDT

Summary of Impact: Between 06:45 UTC and 17:15 UTC on 26 Jun 2023, you were identified as a customer using Azure Front Door who may have encountered intermittent HTTP 502 error response codes when accessing Azure Front Door CDN services.


Preliminary Root Cause: We found that a subset of AFD POPs were returning errors and were unable to process requests.


Mitigation: We moved the resources from the affected AFD POPs to healthy alternatives which returned the service to a healthy state.


Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation. 



Cloud Providers Azure Network Infrastructure
 
Jun-10, 4:30pm EDT

Summary of Impact: Between 20:33 UTC and 21:00 UTC on 10 Jun 2023, customers in East US may have experienced impacted network communications due to the hardware failure of a router during planned maintenance. Retries would have been successful.

 

Preliminary Root Cause: A router suffered a hardware failure during planned maintenance; we determined that the router self-healed at 21:00 UTC.

 

Mitigation: We have isolated the device as a precaution and stopped further upgrades.

 

Next Steps: We will continue to investigate to establish the full root cause and prevent future occurrences. Stay informed about Azure service issues by creating custom service health alerts: https://aka.ms/ash-videos for video tutorials and https://aka.ms/ash-alerts for how-to documentation.
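As the summary above notes, retries would have succeeded during this window. A client-side retry with exponential backoff is a common way to ride out brief transient failures like this; the sketch below is illustrative (the helper, the retryable status codes, and the delays are assumptions, not an Azure API).

```python
import random
import time

RETRYABLE_STATUS = {502, 503, 504}  # transient gateway/server errors

def call_with_retry(do_request, max_attempts=5, base_delay=0.5):
    """Invoke do_request() until it returns a non-retryable status.

    do_request must return an object exposing a status_code attribute
    (for example, a requests.Response). Between attempts we sleep for an
    exponentially growing delay with a little jitter, so that many clients
    retrying at once do not hammer the service in lockstep.
    """
    response = None
    for attempt in range(max_attempts):
        response = do_request()
        if response.status_code not in RETRYABLE_STATUS:
            return response
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return response  # still failing: surface the last response to the caller
```

With the requests library, do_request could be as simple as `lambda: requests.get("https://contoso.azurefd.net/")` (a hypothetical endpoint).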

 
Jun-10, 4:31pm EDT
Resolved