Wednesday, June 2, 2010

A Disturbance in the Cloud

Did you feel it? There was a disturbance in the cloud on June 1st. Hosting.com went down for about 45 minutes. Hosting.com hosts over 65,000 web sites and is a growing cloud computing provider.



"Apparent Networks' Cloud Performance Center (www.apparentnetworks.com/cpc) recently confirmed Hosting.com experienced connectivity loss which caused an outage in their Newark, New Jersey, data center," Apparent reports. "The outage occurred on June 1, 2010, beginning at approximately 15:45PM and ending 8:29PM EDT. There were intermittent periods of connectivity with high data packet loss between those times, and the number of connectivity loss events and duration varied slightly by location. According to Hosting.com's Twitter feed (http://twitter.com/HDCOps), 'One dedicated switch failed. It failed over to a second switch which crashed as well.'"

Hosting.com (H.C) is still trying to figure out what happened and is trying to reproduce it. H.C has a lot of the things that you would look for in a cloud provider regarding business continuity. As you can see from the Apparent Networks report, H.C has redundant network equipment, and the web site shows that they have constant monitoring, redundant power, and multiple network providers, among other things. So why did they go down? They may never be able to answer that, but it looks like it may have been a software bug in the Cisco switch.

The lesson here is that nothing is 100% up. There are always chinks in the armor, and cloud providers are still learning. What’s important is how Hosting.com reacted and how fast. Forty-five minutes is a long time, and given the data that I have, I don’t know if that was a quick response or not. H.C had a double failure of a redundant system (like losing both engines on a twin-engine aircraft), so they had their hands full. As a customer you should demand open and honest communication about the event and any subsequent remediation (like installing switches from two different vendors to mitigate issues like this).
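To put forty-five minutes in perspective, here is a minimal sketch in Python of what that much downtime does to an availability figure. The outage length is from this post; the 30-day and 365-day measurement windows, and the assumption that this was the only outage in each window, are mine and not anything H.C has published.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes: float, period_days: float) -> float:
    """Return the percentage of the period during which the service was up."""
    period_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


# 45 minutes is the outage length reported in this post.
# The measurement windows below are assumptions chosen for illustration.
for label, days in (("30-day month", 30), ("365-day year", 365)):
    print(f"{label}: {availability_percent(45, days):.3f}% available")
```

Under those assumptions, a single 45-minute outage works out to roughly 99.9% availability for the month and roughly 99.99% for the year, which is why the window a provider measures over matters as much as the number itself.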

Given the size and scale that H.C has, I’d bet that their uptime is still better than that of most small to medium-sized businesses, but there are no guarantees of this when you go to the cloud.
- Chris Claborne
