On Monday I visited the AWS Pop-up Loft in San Francisco (again) and attended the bootcamp “Architecting Highly Available Applications on AWS”. Rather than try to re-summarize it, I’ll let the course description do the talking:
“This beginner-level bootcamp teaches you how to apply the principles of elasticity, mitigating single points of failure and designing loosely coupled applications to architect resilient applications on AWS. We will cover how to apply AWS services and features as well as common design patterns to improve fault tolerance in the networking, web, storage and database layers of your application.”
I’ve spent a fair amount of time working in Amazon Web Services and consider myself fairly savvy, so going to an intro class might seem a bit strange, but I wanted to gauge the difficulty level myself. At the end of the day, I’d say their “introductory” level classes are fantastic for those just getting started in the cloud or with AWS.
A fair amount of time at the beginning of class was dedicated to why high availability is good, why single points of failure (SPOF) are bad, and how elasticity and disaster recovery come into play. The instructor did a very good job of explaining all sides, including the point that “Amazon is not infallible,” which is very important.
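To put a rough number on why single points of failure are bad, here is a small Python sketch of the standard availability arithmetic (my own illustration, not material from the class). It assumes component failures are independent, which real outages often are not, and the availability figures in it are made up for illustration.

```python
from math import prod


def serial_availability(components: list[float]) -> float:
    """Availability of a chain where every component must be up (no redundancy)."""
    return prod(components)


def redundant_availability(a: float, n: int) -> float:
    """Availability when at least one of n identical, independent replicas must be up."""
    return 1 - (1 - a) ** n


# Hypothetical numbers: a single web server that is up 99% of the time.
single = 0.99
print(f"one server:   {single:.4%}")
print(f"two servers:  {redundant_availability(single, 2):.4%}")

# Hypothetical chain of load balancer, web tier, and database, each 99.9% available.
print(f"serial chain: {serial_availability([0.999, 0.999, 0.999]):.4%}")
```

Components in series multiply their availabilities, so every extra single point of failure drags the total down, while redundant copies of the same component push it up.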
About two years ago Josh and I rebuilt www.okta.com to be highly available, using (as it turns out) basically every technique this class was teaching. That infrastructure ran for 18 months with better than 99.95% uptime (most of the “down” time was monitoring errors) and almost no operator involvement. A key piece was that it was built to be fully automated and highly scalable: if an instance died, AWS Auto Scaling kicked in and replaced it with a new instance that bootstrapped itself all the way to fully functional. One of my favorite memories from shortly after it went live was finding out that the CEO was going to be on CNN in just a few minutes. We had no forewarning, but knew immediately that “CEO on TV” = traffic spike to the website. Our action required? Nothing. We sat there watching the TV (and the hit counter) and never so much as lifted a finger.
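The “instance dies, a replacement bootstraps itself” behavior boils down to a reconcile loop: remove unhealthy instances, then launch new ones until the group is back at its desired capacity. The toy Python model below sketches that idea under stated assumptions. It is not the AWS Auto Scaling API, and the `SelfHealingGroup` class, its methods, and the instance IDs are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


@dataclass
class SelfHealingGroup:
    """Hypothetical toy model of health-check-and-replace; not the AWS API."""

    desired_capacity: int
    instances: list[Instance] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def launch(self) -> Instance:
        # A real instance would bootstrap itself (install and configure the app)
        # before serving traffic; in this sketch it simply starts out healthy.
        inst = Instance(instance_id=f"i-{next(self._ids):04d}")
        self.instances.append(inst)
        return inst

    def reconcile(self) -> list[str]:
        """Drop unhealthy instances, then launch replacements up to desired capacity."""
        self.instances = [i for i in self.instances if i.healthy]
        launched = []
        while len(self.instances) < self.desired_capacity:
            launched.append(self.launch().instance_id)
        return launched


group = SelfHealingGroup(desired_capacity=3)
print(group.reconcile())  # ['i-0000', 'i-0001', 'i-0002']

group.instances[1].healthy = False  # simulate an instance dying
print(group.reconcile())  # ['i-0003']

group.desired_capacity = 5  # simulate scaling out for a traffic spike
print(group.reconcile())  # ['i-0004', 'i-0005']
```

Nobody has to touch the group when an instance fails or when load rises; the loop simply drives the actual state back toward the desired state. That is why the design depends on instances being able to bootstrap themselves without an operator.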
I roughed out a diagram of that design from memory, as shown here:
This is a wildly simplified version, but all the general concepts the class talked about are there. For comparison, I took the following picture from one of the slides in class:
The comparison is one of the main reasons I think these AWS classes are great. Even at an introductory level, they teach real-world lessons: concepts that have been proven time and again to actually work. Every well-designed AWS-based service probably has a diagram like the one I posted, reflecting a similar story. The power of the cloud is real, and Amazon is willing to teach you if you give them the time.