WordPress guru and Automattic founder Matt Mullenweg gave a speech at GigaOm’s recent Structure 09 conference in San Francisco, where he spoke on cloud computing and WordPress.com’s server infrastructure. The speech was reported on by Rich Miller at datacenterknowledge.com. According to the report, Mullenweg said, “The biggest mistake we made with the WordPress.com infrastructure was actually buying servers… (buying servers was) not a utility. Now we lease them all on a month-to-month basis.” According to the presentation, WordPress.com runs “about 5 million sites serving more than 1 billion page views a month. Automattic uses two data center providers, the dedicated hosting specialists ServerBeach (PEER 1) and Layered Technologies.”
Mullenweg went on to say he viewed WordPress.com’s use of Amazon’s S3 storage system as “a failure” because it reflected the lack of an open source alternative. “When I have to go to the cloud, I consider that a failure. The thing that’s been most exciting to me is how the open source tools have evolved.” The summary, for those who would like to learn from the experience of running one of the most heavily trafficked multi-user blog sites on the internet: use leased servers rather than investing in your own data center. That way you can stay on the latest, most up-to-date boxes without having to cover the cost of perpetual upgrades, data center management staff, and infrastructure and backup facilities.
Month-to-month leasing also keeps negotiation options open for a large hosting account like WordPress when dealing with service providers, rather than locking it into a long-term contract or creating HR staffing issues. This makes a lot of sense. For example, I recently looked into opening a data center in India based around Virident Eco Ram servers, four at around $5,000 each. Add office space in an Indian IT park ($400-$500 per month), backup power supply and power synchronizers ($5,000), a cooling system ($2,500), 24-hour sysadmin staff ($36,000-$40,000 per year), an internet backbone connection ($1,250-$2,000 per year?), utilities ($100-$150 per month), and so on, and the total quickly adds up. Granted, you likely do not need a 24-hour sysadmin for four servers, but how do you maintain a data center without on-site security and maintenance? Compare this total cost to what you would receive over a similar time period with leased servers. Easy math.
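To make the "easy math" concrete, here is a rough Python sketch that adds up the figures quoted above. It assumes the staffing figure means $36,000 to $40,000 per year, takes the uncertain backbone estimate at face value, and ignores hardware refreshes, taxes, and financing. The function names `owned_cost` and `break_even_lease` are made up for this illustration, and no actual lease price is assumed, since none is given here.

```python
# Figures quoted in the post, in USD, as (low, high) ends of each range.
SERVER_COUNT = 4
SERVER_PRICE = 5000

ONE_TIME = {
    "servers": (SERVER_COUNT * SERVER_PRICE, SERVER_COUNT * SERVER_PRICE),
    "backup_power": (5000, 5000),
    "cooling": (2500, 2500),
}
MONTHLY = {
    "office_space": (400, 500),
    "utilities": (100, 150),
}
YEARLY = {
    "sysadmin_staff": (36000, 40000),  # assumption: "$36-$40,000" read as $36,000-$40,000
    "backbone": (1250, 2000),          # the post itself marks this estimate as uncertain
}


def owned_cost(years):
    """Return the (low, high) total cost of owning the setup for `years` years."""
    totals = []
    for i in (0, 1):
        one_time = sum(v[i] for v in ONE_TIME.values())
        per_year = 12 * sum(v[i] for v in MONTHLY.values())
        per_year += sum(v[i] for v in YEARLY.values())
        totals.append(one_time + per_year * years)
    return tuple(totals)


def break_even_lease(years):
    """Return the (low, high) monthly lease price per server that matches owning."""
    low, high = owned_cost(years)
    server_months = SERVER_COUNT * 12 * years
    return low / server_months, high / server_months


if __name__ == "__main__":
    for years in (1, 3):
        print(years, owned_cost(years), break_even_lease(years))
```

On these numbers the first year of ownership comes to between $70,750 and $77,300, so a leased server would have to cost roughly $1,474 to $1,610 per month before owning four machines broke even in year one.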
This is why Mullenweg ended up advising start-ups, small development companies, and media groups not to try to compete with what the major tech companies are doing with web infrastructure. “My challenge to everyone competing with Amazon, Google and Microsoft is to remember that you’re competing with Amazon, Google and Microsoft,” he said. “These are strong technology companies, and if you’re going to compete with them, open source is the only way to do that. Otherwise, you have no leverage.”
What, then, is the “open source equivalent” of a cloud server, when one of the main aspects of the cloud is “utility banks” of grid servers that can expand and contract around the traffic needs of an individual site? With shared, leased, or dedicated servers you are renting a fixed amount of disk space or a set number of machines. What the cloud delivers through massive corporate data banks is the ability to scale to meet the highest peak demand, the Digg effect, and so on. Unless someone develops a “distributed computing” model in which many “open source” servers are joined together to share CPU resources across a network and scale freely in times of peak traffic, I don’t see an exact “open source equivalent” to the grid. Maybe it is BitTorrent or LimeWire-style P2P networks, but the performance there is a lot different from Akamai or Rackspace.
Mullenweg recommended the nginx web server for load balancing:
“nginx has been running for more than four years on many heavily loaded Russian sites including Rambler (RamblerMedia.com).
In March 2007 about 20% of all Russian virtual hosts were served or proxied by nginx.
According to Google Online Security Blog year ago nginx served or proxied about 4% of all Internet virtual hosts.
2 of Alexa US Top100 sites use nginx.
According to Netcraft in December 2008 nginx served or proxied 3.5 millions virtual hosts. And now it is on 3rd place (not counting in-house Google server) and ahead of lighttpd.
According to Netcraft in March 2009 nginx served or proxied 3.06% busiest sites.
According to Netcraft in May 2009 nginx served or proxied 3.25% busiest sites.
Here are some of success stories: FastMail.FM, WordPress.com.”
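For readers unfamiliar with what a load balancer does, the basic idea can be sketched in a few lines of Python. This is only an illustration of round-robin distribution, one common strategy, and not nginx's implementation or configuration. The class name `RoundRobinBalancer` and the backend addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend servers in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self):
        """Return the backend that should handle the next request."""
        return next(self._rotation)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])

# Six incoming requests are spread evenly: each backend receives two.
assignments = [balancer.pick() for _ in range(6)]
```

A production balancer also has to deal with health checks, weighting, and failed backends, which this sketch leaves out.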
Optimizing a dedicated server for a high-traffic site and “cloud hosting” are very different undertakings. An open source alternative to this aspect of “the cloud” would involve users sharing their CPU cycles with other users, around the world or locally on the same network, during peak traffic, or spinning off virtual clones of the site to another machine to handle the overflow. A number of security issues would seem to arise, and ultimately there has to be charity somewhere: people giving up their processing power and bandwidth when it is not being used, sharing empty allocated disk space, and hosting entire websites at the end of a torrent, mirrored on different servers. Right now the closest model of the open cloud is basically the SETI@home screensaver.
If you have your own data center or dedicated cluster, Apache Hadoop is an open source distributed option: http://hadoop.apache.org/core/
You may have seen the announcement on Twitter (if not, then shame on you; you should be following us). WebDevNews has moved to new hosting. As Xavisys Web Development (the company that owns and operates this site) has been growing, we’ve been trying to find just the right company to partner with in order to offer high-quality hosting to our clients. Rackspace Cloud Hosting is that company. We tried quite a few, ranging from Bluehost to HostGator to Rackspace. We were not looking for cheap hosting; we were looking for the best hosting we could find for our specific needs. While some were better than others, Rackspace Cloud Hosting was the best.
In order to test out the hosting, we moved a couple sites to the Rackspace Cloud. We didn’t want to move any really big complex sites, but didn’t exactly want to test with my wife’s blog either. Instead, we moved both WebDevNews and Attackr. Both are WordPress (database driven) sites, and combined they average about 650 pageviews a day. While it’s not a lot of traffic, we thought it would give us a good idea of how Rackspace Cloud Hosting would perform.