According to an SEC filing by the publicly traded company RackSpace, the power outage that took its servers offline for an extended period last month will cost the company up to $3.5 million in service credits. The filing states: “We have experienced power interruptions which have affected a portion of our Grapevine, Texas data center. We have posted updates on our recent power interruption on our website blog and our customer portal for the benefit of our customers. We are continuing to assess the financial impact of service credits due to these events. Currently, our preliminary range for the resulting one time service credits is estimated to be between $2.5 million and $3.5 million. Our website blog is located at http://www.rackspace.com/blog/ .” [1]
The story of RackSpace’s downtime spread widely across Twitter and the blogosphere. This post from TechCrunch is typical: “Last week, Michael Jackson’s death caused sites to fail left and right. Today, it’s a very different problem. The hosting service Rackspace has been completely down for the past 30 minutes or so… Apparently, it’s an entire network outage [Update below, while it was a massive outage, it wasn't a full outage, apparently.] and so the usually very responsive Rackspace team cannot even respond to emails or tweet (though I’m sure we’ll be seeing some updates from smartphones shortly).” [2]
I have personally been considering a migration to Mosso, looking at its cloud sites and cloud servers options, as we have also had trouble for some time with our own host. The bottom line is that any host is going to have issues from time to time, and when everything is working normally you rarely stop to appreciate how good the service actually is. But most people have zero tolerance for downtime from a web host, since uptime is the fundamental service a host provides as a business. If the downtime issues continue, mass migration quickly ensues. Did the issue hit a crisis point for RackSpace?
From the RackSpace blog, June 30th, 2009:
“Yesterday afternoon at 3:15CDT our data center in Dallas experienced an interruption in power to portions of the facility. The interruption caused customer servers to lose power and go down. We sincerely apologize for this disruption and know that it impacted our customers’ businesses as well as the experience of many who use the web. Although we have had some issues with this data center before, please know that we will do what it takes to improve its reliability and performance. We owe you an action plan to prevent this type of thing in the future, and we’ll get that to you as soon as it is ready.
Specific to this situation, here’s what we are doing right now:
- The data center is currently running on utility power.
- We are continuing to research the root cause analysis for yesterday’s generator failures. We have flown in our senior-level engineers from our global operations, and they are working with our external suppliers to determine the cause and how we can prevent this from happening again. We have the best outside experts from companies like Cummins, GE and Eaton.
- We have re-serviced and re-checked our UPS units.
- Tonight at 9:00CDT we will continue our testing of the generator bank in question as we narrow down the variables to determine and remediate root cause.
- Our Support teams will continue to work with all affected customers to ensure they’re up and running.
A copy of the incident report that we sent to affected customers can be found at the following link. Though we typically treat our incident reports as proprietary information between us and our customers, we are publicly posting the report for this incident due to high level of public interest that this incident has received.
I want to ensure you that we are doing everything we can to bring this to resolution as quickly as possible. We appreciate your support and understanding. Our promise is Fanatical Support, we believe in it, and we will work with each of our customers to honor that promise.
Lanham Napier, CEO, Rackspace Hosting” [3]
Have they solved the issue?
RackSpace Blog – July 7, 2009
Dallas data center update as of 1:30 pm CDT
“Today at approximately 11:00 AM, an electrical connection failed, causing a brief power interruption to customers on UPS cluster A. This failure also may have caused intermittent network performance issues for customers supported by UPS clusters B and E for a short time. For cluster A customers, we bypassed the UPS and restored power to the servers via generator within a few minutes. Currently systems supported by UPS cluster A are still running on generator power. Repairs are underway and we plan to return to utility power with UPS support as soon as possible. We will follow up with additional updates as new information becomes available.” [3]
RackSpace will undoubtedly lose some business because of this, but the company has been growing strongly even during the recession, and it seems to be planning to placate quite a few customers with a refund or account credit. Since the failures appear to stem from mundane problems with power generation and backup power supplies, and not from the cloud server architecture, I would expect the problem to be readily resolved and, hopefully, not to recur.
- [1] http://www.sec.gov/Archives/edgar/data/1107694/000118143109032728/rrd247155.htm
- [2] http://www.techcrunch.com/2009/06/29/yes-rackspace-is-down-and-so-are-many-of-your-favorite-sites/
- [3] http://www.rackspace.com/blog/