Architects: manage risk like a Vegas bookie

In the world of cloud computing, “risk” is a big buzz word. Lots of analysts are debating how much risk is involved in using SaaS offerings like Salesforce, or hosting corporate applications with a public IaaS provider like Amazon’s EC2. They’re worried about outages (Amazon’s had several ugly ones, most recently for 49 minutes in January), about security, about regulatory compliance, and so forth.

Werner Vogels, Amazon CTO, NextWeb 2008: "Everything fails, all the time."

Werner Vogels, Amazon CTO, NextWeb 2008: “Everything fails, all the time.”

These worries are well founded. However, I pointed out today on Adaptive Computing’s blog that the question “Can I take the risk to use the cloud?” is a bit naive. Sometimes you can just avoid risk altogether. In many cases, however, risk is endemic, and the smart course is to manage it.

How does risk figure in your architectural vision? You should think about it all the time. You should count it, weigh and balance alternative outcomes in ways that would impress even the gaming industry.

Here are 6 key questions to kick-start your pondering:

  • Is my architecture properly accounting for risk of environmental problems such as DDOS, routing failures, brownouts, and temporary loss of an internal component? (See my article about circuit breakers.)
  • When one of my components crashes, will its state be cleanly recoverable (e.g., on transaction boundaries) rather than corrupt? What data loss contract am I targeting?
  • Will it be easy for users or admins to notice when theoretical risks I’ve planned for become true emergencies? How will they be notified?
  • Is it possible to put the system in a “scabbed” state that’s degraded and safe, but functional, while more extensive repairs take place?
  • Am I assuming success too often? (Werner Vogels, Amazon’s CTO, is fond of saying “everything fails, all the time.” That’s on my top 5 list of major insights to remember.)
  • Am I diversifying intelligently, and enabling my customers to do so as well?

Action Item

Make a list of a handful of important risks from your customer’s perspective. How many of them can you help with?