The Agile Incubator Blog

Long live the cloud: Musings on Amazon’s outage

What to make of last week’s spectacular multi-day cloudburst, when Amazon’s eastern region and all its Availability Zones went down, hobbling hundreds (thousands?) of cloud-based web sites? It sure got everyone’s attention, didn’t it? From The Wall Street Journal to the New York Times to legions of bloggers, everybody had a comment on the first major stumble in “cloudom.”

We at LaunchPoint and our divisions Ajilitee and Discovery Health Partners live in the cloud. We run our business infrastructure on Google and other service providers and our client applications on Amazon Web Services (AWS), and we lived through this first major test of the cloud’s durability and reliability. We say, “long live the cloud!” Amazon’s cloudburst was a mere spring shower, expected and needed to help the cloud flourish.

Our perspective is this: no IT environment has 100% uptime, and the cloud is no exception. We just need to accept this reality and architect our disaster recovery plans accordingly. The Wall Street Journal (4/22/11) cited Forrester analyst Vanessa Alvarez, who said, “Amazon’s taking the hit today because they’re the poster child, but outages aren’t anything new and unfortunately they’re going to continue to happen.”

Architecting disaster recovery plans for the cloud means leveraging the capabilities of the cloud provider and considering the limitations and possibilities inherent in the platform. Once that’s understood, we can 1) tame our inflated expectations of the cloud as IT nirvana; 2) stop playing “Chicken Little” and recognize that our fears that the sky is falling are a bit overblown; and 3) realize that we are pioneering a new way to do business and must solve problems as they arise.

Sure, we lost some productivity in the initial hours after the outage began, and one of our applications, the Discovery Dashboard, part of Discovery Health Partners’ healthcare cost containment platform, encountered an IP address hitch that we were able to fix with an adjustment. Our disaster recovery plan worked as designed, however, and we didn’t lose any data or suffer a breach. In the end, the outage was a speed bump, a blip… and a good learning experience.

For all the noise in its aftermath, Amazon’s cloudburst was in fact more like a spring shower, forcing us all to pause and reexamine what we had learned. We learned how important it is to understand our host and the capabilities they’ve built into the platform to support high availability. We watch with interest as Amazon learns from this experience and figures out ways to help us, their customers, mitigate risk. Most importantly, we gained an appreciation for the cloud’s potential to reduce risk further than any other platform, reinforcing our decision to use the cloud to run our business. Where else can you extend your disaster recovery across geographies with a mix of strategies at a cost point that beats all alternatives? Our commitment reaffirmed, we say, “long live the cloud.”