What went wrong with the Microsoft outage in the last week of January

Microsoft reported that a router update was behind a multi-hour outage in late January that hit the Microsoft Wide Area Network (WAN) and made Azure, Microsoft 365 apps, and Power Platform inaccessible to customers across the globe.

The official report from Microsoft confirmed that users were unable to access multiple Microsoft 365 services, including Microsoft Teams, Exchange Online, Outlook, SharePoint Online, OneDrive for Business, Microsoft Graph, Power BI, and the Microsoft 365 admin center.

Prior to the outage, Microsoft had warned customers that a planned update might cause latency or timeouts. However, as South Africans started the day, the update caused more than latency issues and began affecting network devices across the Microsoft WAN. Those devices dropped connections between services in data centers as well as connections on ExpressRoute, Microsoft’s private connectivity service that links customers’ own networks to its data centers, and a wide range of services were disrupted as a result.

According to Microsoft’s preliminary post-incident review, most regions and services had recovered within two hours, but some took almost the entire day to fully recover.

Microsoft says it has now “blocked highly impactful commands from getting executed on the devices” to reduce the risk of a repeat. It is also now requiring all command execution on its network devices to follow safe change guidelines.