You know what happened: Signal stopped working, Slack and Zoom had issues, and most Amazon services were down too, along with thousands of websites and apps around the globe. The cause was a 14-hour-long AWS outage in the us-east-1 region.
To their credit, AWS posted continuous updates throughout the outage. Three days after the incident, they released a detailed postmortem.
The Multi-Cloud Debate: A Recurring Discussion
In the aftermath of the outage, the hot takes come thick and fast. One always comes up, whatever the root cause: go multi-cloud or move back on-prem.
Each organization has to figure that out for itself, but here’s the reality: you already struggle to understand AWS, hire for it, and secure it. On its own, AWS has ~430 services, 17,000+ API methods, and 19,000+ IAM permissions. It seems to me that adding another cloud provider to the mix will make that twice as hard, if not virtually impossible.
Let’s say you choose a “best of breed” strategy: Amazon S3 for highly scalable storage, Google Cloud Platform for advanced machine learning and data analytics, and Microsoft Azure for integrating with existing Microsoft enterprise applications. Congrats! You’re now multi-cloud!
Is this bulletproof? No, it is not.
Best of Breed Strategy Pitfalls
Even in a “best of breed” strategy, you’re likely to lean heavily on one provider’s services for critical workloads. If Amazon S3 is your primary storage and it goes down, having GCP for ML doesn’t help unless you’ve built real-time replication to another provider’s storage, which is complex and costly.
First, what is the likelihood of a total AWS outage?
The October 2025 AWS disruption was not a total global outage but a major regional failure centered in us-east-1 (Northern Virginia), one of AWS’s largest and most critical hubs. The impact was widespread, disrupting thousands of services such as Amazon, Snapchat, Zoom, Fortnite, and Robinhood, but it was not a complete shutdown of all AWS services worldwide. Many regions and isolated workloads remained operational.
Likelihood of Complete AWS Outages
Are large-scale, cascading outages affecting multiple regions or core services (like DNS, IAM, or DynamoDB) possible? Yes, but they are rare. Based on recent incident history, we can expect one roughly every 1–2 years.
Here’s the truth, folks: a complete, global AWS outage, where all regions and services fail simultaneously, is extremely unlikely, because AWS’s multi-region, fault-tolerant architecture is designed for redundancy and isolation.
In a sense, AWS and every other major provider already have a form of “multi-cloud” baked into their architecture: independent regions that are built to fail separately.
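To make “one every 1–2 years” a bit more concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that such large-scale events arrive independently at a constant average rate (a Poisson process); real outages are not that tidy, and `prob_at_least_one` is a hypothetical helper, not anything AWS publishes.

```python
import math


def prob_at_least_one(rate_per_year: float, years: float = 1.0) -> float:
    """P(at least one event within `years`), assuming a Poisson process
    with the given average annual rate (an illustrative assumption)."""
    return 1.0 - math.exp(-rate_per_year * years)


# "One every 1-2 years" corresponds to an average annual rate of 1.0 to 0.5.
for mean_interval_years in (1.0, 2.0):
    rate = 1.0 / mean_interval_years
    p = prob_at_least_one(rate)
    print(f"one event every {mean_interval_years:.0f} year(s): "
          f"P(at least one in next 12 months) = {p:.0%}")
```

Under those assumptions the loop reports roughly 63% and 39%, so a large-scale disruption in any given year is something to plan for rather than dismiss.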
Total global AWS shutdown? Near-zero likelihood under current cloud architecture design.
Comparing Multi-Region and Multi-Cloud Resilience
Here’s my take on this entire issue: For most organizations, building resilience within a single cloud provider using multi-region architecture is more practical and effective than adopting a full multi-cloud strategy.
Multi-region within AWS (e.g., running workloads in us-east-1, eu-west-1, and ap-southeast-2) provides, in my estimate, ~90% of the resilience of multi-cloud with far less complexity, as the 2025 outage demonstrated when unaffected regions remained operational.
True multi-cloud (using AWS + Azure + GCP) reduces vendor lock-in and increases redundancy but introduces significant challenges: higher management complexity, inconsistent security models, increased costs, and integration overhead.
Most outages are regional or service-specific, not global, so spreading workloads across regions in one provider often suffices.
Only organizations with specific compliance, performance, or risk-tolerance requirements typically benefit from multi-cloud.
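The multi-region argument rests on simple availability arithmetic. The sketch below assumes a hypothetical 99.9% availability per region and treats regions as failing independently, with an optional `shared` factor for any dependency common to every region (for example, a global control-plane service). Neither number comes from AWS; they only show the shape of the trade-off, and `combined_availability` is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_region: float, regions: int, shared: float = 1.0) -> float:
    """Availability of a service that stays up while at least one region is up.

    Assumes regions fail independently of each other. `shared` is the
    availability of any dependency common to all regions (1.0 = none);
    it multiplies the result, so it caps the benefit of adding regions.
    """
    return shared * (1.0 - (1.0 - per_region) ** regions)


PER_REGION = 0.999  # hypothetical per-region availability, for illustration only

for n in (1, 2, 3):
    a = combined_availability(PER_REGION, n)
    downtime_h = (1.0 - a) * HOURS_PER_YEAR
    print(f"{n} region(s): availability {a:.9f}, ~{downtime_h:.4f} h downtime/year")

# Same three regions, but all relying on one shared service at a hypothetical 99.95%.
a_shared = combined_availability(PER_REGION, 3, shared=0.9995)
print(f"3 regions + shared dependency: availability {a_shared:.7f}")
```

With independent regions, each extra region cuts expected downtime by orders of magnitude. The `shared` factor shows the catch: once every region depends on one common service, overall availability can never exceed that service’s own availability, which is why cascading failures in core services such as DNS, IAM, or DynamoDB are the scenario worth designing around.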
A Balanced Approach to Cloud Resilience
My recommendation: Prioritize resilient design in a primary cloud (multi-region, auto-failover, backup) before adding multi-cloud complexity. Use multi-cloud selectively for critical workloads, not as a default strategy.
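Auto-failover, at its simplest, means sending traffic to the highest-priority region that is currently healthy. The sketch below shows that logic in plain Python, with a consecutive-failure threshold so that a single failed probe does not trigger a failover. The `Region`, `HealthTracker`, and `pick_active_region` names and the example.com endpoints are hypothetical; in practice this job is usually delegated to managed tooling such as Amazon Route 53 failover routing with health checks rather than hand-rolled code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass(frozen=True)
class Region:
    name: str      # AWS region code, e.g. "us-east-1"
    endpoint: str  # hypothetical application endpoint in that region


class HealthTracker:
    """Marks a region unhealthy only after N consecutive failed probes."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.consecutive_failures: Dict[str, int] = {}

    def record(self, region_name: str, probe_ok: bool) -> None:
        # A successful probe resets the counter; a failed one increments it.
        if probe_ok:
            self.consecutive_failures[region_name] = 0
        else:
            self.consecutive_failures[region_name] = (
                self.consecutive_failures.get(region_name, 0) + 1
            )

    def is_healthy(self, region: Region) -> bool:
        return self.consecutive_failures.get(region.name, 0) < self.failure_threshold


def pick_active_region(
    regions: Sequence[Region], is_healthy: Callable[[Region], bool]
) -> Optional[Region]:
    """Return the first healthy region in priority order, or None if all are down."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


# Priority order: primary first, then failover targets.
PRIORITY = [
    Region("us-east-1", "https://use1.app.example.com"),
    Region("eu-west-1", "https://euw1.app.example.com"),
    Region("ap-southeast-2", "https://apse2.app.example.com"),
]

tracker = HealthTracker(failure_threshold=3)
for probe_ok in (False, False, False, True):  # simulated probes of us-east-1
    tracker.record("us-east-1", probe_ok)
    active = pick_active_region(PRIORITY, tracker.is_healthy)
    print(f"probe_ok={probe_ok}: routing to {active.name if active else 'nowhere'}")
```

Here, two failed probes leave traffic in us-east-1, the third moves it to eu-west-1, and a single successful probe moves it back. Whether to fail back automatically or require a manual decision is a design choice worth making explicitly.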
And that, folks, is a wrap.
Do you have a different take on architecting a resilient cloud design? Connect with me on LinkedIn; I would love to hear your thoughts on this topic.
Navigating the AWS Outage: Why Multi-Region Trumps Multi-Cloud for Resilience © 2025 by George Bakalov is licensed under CC BY-NC-ND 4.0
