Add some more extra redundancy again

It’s the season for coughs and sniffles, and last week I took my turn. I went to bed one night with a stuffy nose, and it got me thinking about software.

What’s the connection between sniffles and software, you ask?

Let’s talk redundancy. It’s a familiar technique in software design, but I believe we compartmentalize it too much under the special topic of “high availability”–as if only when that’s an explicit requirement do we need to pay any attention.

Redundancy can be a big deal. Image credit: ydant (Flickr)

Redundancy in nature

Mother Nature’s use of redundancy is so pervasive that we may not even realize it’s at work. We could learn a thing or two from how she weaves it–seamlessly, consistently, tenaciously–into the tapestry of life.

Redundancy had everything to do with the fact that I didn’t asphyxiate as I slept with a cold. People have more than one sinus, so sleeping with a few of them plugged up isn’t life-threatening. If nose breathing isn’t an option, we can always open our mouths. We have two lungs, not one–and each consists of huge numbers of alveoli that does part of the work of exchanging oxygen and carbon dioxide. Continue reading

Encapsulation isn’t just for code

When computer science folks talk about encapsulation, they are usually thinking of how the principle applies to objects and functions inside a codebase. Best practice calls for a separation of concerns–each object responsible for one type of work, hiding all details from its neighbors.

That’s great. But it’s not the only way encapsulation ought to show up in software.

In actual deployment, software packages often manifest anti-patterns in the way that they are configured. A web server has to know all about three different database servers that contribute data for its pages; HA failover scripts must know the identity and responsibility of every actor in the system, as well as many particulars about how these entities use resources to accomplish their tasks.

No wonder our deployments are fragile and high-maintenance…

The cloud computing wave is raising the bar for encapsulation in the way applications–not just objects–discover and interact with one another. In this week’s installment of my series of posts about how to “cloudify”, I discuss how role-based interactions insulate components from details they don’t need to know. It’s encapsulation all over again. And this encapsulation pattern manifests itself in unlikely places–like the order queue at McDonald’s…

What can McDonalds teach a developer of cloud-friendly software? photo credit: phogel (Flickr)

Stay tuned for further installments of this series each Friday. As I said in Part 1, I believe that a competence with cloud–cloud-oriented programming, if you will–will be a checkbox on future tech resumes.

Architects: manage risk like a Vegas bookie

In the world of cloud computing, “risk” is a big buzz word. Lots of analysts are debating how much risk is involved in using SaaS offerings like Salesforce, or hosting corporate applications with a public IaaS provider like Amazon’s EC2. They’re worried about outages (Amazon’s had several ugly ones, most recently for 49 minutes in January), about security, about regulatory compliance, and so forth.

Werner Vogels, Amazon CTO, NextWeb 2008: "Everything fails, all the time."

Werner Vogels, Amazon CTO, NextWeb 2008: “Everything fails, all the time.”

These worries are well founded. However, I pointed out today on Adaptive Computing’s blog that the question “Can I take the risk to use the cloud?” is a bit naive. Sometimes you can just avoid risk altogether. In many cases, however, risk is endemic, and the smart course is to manage it.

How does risk figure in your architectural vision? You should think about it all the time. You should count it, weigh and balance alternative outcomes in ways that would impress even the gaming industry.

Here are 6 key questions to kick-start your pondering:

  • Is my architecture properly accounting for risk of environmental problems such as DDOS, routing failures, brownouts, and temporary loss of an internal component? (See my article about circuit breakers.)
  • When one of my components crashes, will its state be cleanly recoverable (e.g., on transaction boundaries) rather than corrupt? What data loss contract am I targeting?
  • Will it be easy for users or admins to notice when theoretical risks I’ve planned for become true emergencies? How will they be notified?
  • Is it possible to put the system in a “scabbed” state that’s degraded and safe, but functional, while more extensive repairs take place?
  • Am I assuming success too often? (Werner Vogels, Amazon’s CTO, is fond of saying “everything fails, all the time.” That’s on my top 5 list of major insights to remember.)
  • Am I diversifying intelligently, and enabling my customers to do so as well?

Action Item

Make a list of a handful of important risks from your customer’s perspective. How many of them can you help with?