Amazon Cloud Runs Low on Disk Space

Another unthinkable (maybe in my mind only) has happened – errors uploading files to S3 led me to the AWS status page which reports the US East Coast facilities running low on drive space.

Am I the only one to have assumed someone or something was checking constantly, or at least hourly, to ensure a sufficient percentage of drive space was available for use?

Apparently they consumed a whole lot more disk space than expected this past week and they are now feverishly adding more capacity. Surely if capacity can be added within hours, they should have been gradually adding more during the week?
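For what it is worth, the kind of check I had in mind is not complicated. Below is a minimal sketch in Python for a single machine. It is in no way how Amazon monitors S3; the 20% threshold, the function names and the idea of estimating days until full from a daily growth figure are all my own assumptions for illustration.

```python
import shutil


def free_fraction(path="."):
    """Return the fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_more_capacity(free_bytes, total_bytes, threshold=0.20):
    """True when free space has dropped below the (assumed) 20% threshold."""
    return free_bytes / total_bytes < threshold


def days_until_full(free_bytes, daily_growth_bytes):
    """Estimate the days left at the current growth rate.

    Returns None when usage is flat or shrinking, since the disk
    would never fill at that rate.
    """
    if daily_growth_bytes <= 0:
        return None
    return free_bytes / daily_growth_bytes


if __name__ == "__main__":
    usage = shutil.disk_usage(".")
    if needs_more_capacity(usage.free, usage.total):
        print("Low on disk space: time to add capacity")
    else:
        print(f"{free_fraction('.'):.0%} of the volume is still free")
```

Run from cron every hour, something like this at least tells you when the trend is heading the wrong way, rather than when the disk is already full.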

This is actually pretty serious. People’s database backup jobs might be failing due to these issues, although admittedly those jobs need to be more resilient than that. But then so does Amazon.
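By "more resilient" I mean something along these lines: a backup job that retries a failed upload a few times, waiting longer between attempts, before giving up loudly. This is only a sketch. The `retry_with_backoff` helper and its parameters are hypothetical, and `operation` stands in for whatever call your backup job actually makes to upload a file.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    The wait doubles after each failure (1s, 2s, 4s, ...). `sleep` is
    injectable so the helper can be tested without really waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as error:  # a real job would catch narrower errors
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Fail loudly so that somebody notices the backup did not happen.
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Retrying will not help if the storage service is down for hours, so the final exception matters as much as the retries: a backup that fails silently is worse than one that pages you.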

Amazon Cloud Computing Alternatives

So there have been plenty of web sites and services affected by today’s big Amazon S3 outage. Smugmug, Twitter, WordPress.com and JungleDisk are amongst the casualties, to various degrees. Developers have been venting their frustration at seeing their applications fail because of a service they relied on.

So what are the alternatives?

Any CTO will tell you that moving parts are your IT department’s weakest link in reliability terms. If you build a company on a single server, will you have more or fewer moving parts than if you build it on a large computing farm such as Amazon provides? Such an absolute measurement is a waste of time, of course, as that one server could die at any moment, making you wish you’d relied on the cloud. Yet the cloud may also experience downtime.

Amazon does, however, have the advantage that it hides its redundancy from you. If you were to try to match it, you’d likely end up with RAID and hot standby servers. Trust me, you don’t want to rely on that scenario without spending time and money testing your backup solutions.

So cloud computing might have occasional outages, but at least there are engineers on hand 24×7 to fix them on your behalf. All part of the service, Sir. With your own equipment, you are on call 24×7, sharing the rota with your colleagues. Assuming you have some.

Ultimately money can only buy you the best commercially available solutions. Amazon are not the only cloud computing service provider, but as they happen to have financial muscle and experience on their side, I would go so far as to say they will likely be the best overall. Your mileage may vary, naturally.

Remember, Amazon use commodity hardware under the assumption that bits of their network will fail at random. They have constructed software to operate on top of this in a distributed manner, to detect failures and try (as best as their programmers can code) to mitigate issues as they arise. I am sure that once today’s failure has been analysed, the software will be updated to minimise the disruption caused by it and by similar ones.

But seriously, even Amazon can only go so far. The human brain can only think up so many scenarios and code so many mitigation rules. Oh, and testing all these situations can also be a real challenge.

It is still a damned sight better than relying on your own company to build a similar system in-house.