Infrastructure frameworks

Something’s been bugging me for the past several weeks if not months. Ever since I heard about cloud computing and Amazon’s EC2 / S3 offerings I’ve become more and more convinced that this is going to take off in a big way. Of course, having the odd outage does not help but hopefully the vendor (Amazon) is learning from its mistakes and will not repeat them.

So what’s been bugging me? Well, I’ve seen things from both a PHP programmer’s perspective and a systems administrator’s. Admittedly, neither perspective includes experience of operating sites with huge traffic levels, but high availability is extremely important to such sites regardless.

High availability means both performance and robustness. Performance can be measured by latency – how quickly does the user or visitor see the correct response? Robustness is the ability to restore service following a fault. Faults happen – commonly due to a disk failing or other hardware fault, but sometimes due to software not behaving correctly.

Either way, these are two important factors whose significance cannot be overstated. On the flip side, I’m a great believer in the KISS principle – Keep It Simple, Stupid. The simpler something is, the less likely it is (on its own) to go wrong.

That said, something that’s simple may not be able to react to changes in its external environment. Take a web page script that connects to a database. One might keep this simple in PHP by using a PDO object to connect to a server IP address and perform a query. No elaborate logic to work out whether access controls permit the logged-in user to execute the query, or anything of the sort.

But if the database connection fails, the KISS principle reaches its limitations.
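A minimal sketch of that simple page, with the failure mode made explicit. The host, credentials and table name here are illustrative placeholders, not anything from a real system:

```php
<?php
// The KISS version: one hard-coded server, one query, no fallback.
// Host, credentials and table name are illustrative placeholders.
try {
    $pdo = new PDO('mysql:host=192.0.2.10;dbname=app', 'webuser', 'secret', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_TIMEOUT => 2, // fail fast rather than hang the page
    ]);
    foreach ($pdo->query('SELECT id, name FROM customers LIMIT 10') as $row) {
        echo $row['name'], "\n";
    }
} catch (PDOException $e) {
    // This is where KISS runs out of road: there is no second
    // server to try, so all we can do is report the fault.
    echo 'Database unavailable: ', $e->getMessage(), "\n";
}
```

Simple, readable, and entirely dependent on one IP address being alive.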

Therefore, we adopt some practices for robustness. For the databases, we have a master that replicates write operations to one or more slave servers. If the master dies or needs to be taken out of service, we promote one of the slaves and use it as the master instead.

OK, so our script now needs to have an IP address adjusted. But surely we should be able to automate this so that the procedure can be performed multiple times. After all, the newly promoted master will likely die eventually, too.

And it’s not just database servers that die; anything running on a computer may suddenly be declared unfit for use.

Now we have three problems to deal with:

  1. Hardware assets (computers) that fail. We need to launch and maintain new ones to provide sufficient levels of redundancy.
  2. Software packages run on these computers. They need to be configured, and in some cases have roles. A slave must read from a master, for instance.
  3. Management of the above.

I want to take databases as a case in point for a specific area I am thinking about. We systems administrators have set up a master and slave(s) without too much fuss. Traditionally, we PHP coders have listed the master (and perhaps a slave) in configuration files that our web scripts load and parse on each page load before connecting.

I argue that the PHP side of things is in reverse.

Imagine a somewhat more complex yet more robust scenario. You have three machines in circular replication (“master–master”) called A, B and C. For safety’s sake, your PHP script only ever performs write operations on A; the changes are fed to and processed by B, then C, before returning to A, which ignores them because it originated them.

We make our PHP accept a list of possible masters: A, B and C each qualify. If our script cannot successfully connect to A and execute the query, it should repeat the process on B, and if that also fails, on C. If all three fail, you can either try again from A or fail with an error sent to your output.
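That retry loop might be sketched like so. The host list, database name and credentials are placeholders, and the exact DSN details are my assumption:

```php
<?php
// Walk the list of candidate masters; return a connection from the
// first that both connects and answers a trivial query. Hosts,
// database name and credentials are illustrative placeholders.
function connectToMaster(array $hosts, string $user, string $pass): PDO
{
    $lastError = null;
    foreach ($hosts as $host) {
        try {
            $pdo = new PDO("mysql:host={$host};dbname=app", $user, $pass, [
                PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
                PDO::ATTR_TIMEOUT => 2, // fail fast, move on to the next host
            ]);
            $pdo->query('SELECT 1'); // prove the server is actually answering
            return $pdo;
        } catch (PDOException $e) {
            $lastError = $e; // remember why, then try the next candidate
        }
    }
    throw new RuntimeException('all candidate masters failed', 0, $lastError);
}

// A, B and C from the scenario above:
// $pdo = connectToMaster(['10.0.0.1', '10.0.0.2', '10.0.0.3'], 'webuser', 'secret');
```

The script no longer cares which box currently holds the master role; it just works down the list.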

That solves one problem – which server is the master is no longer a matter of concern for us. But it is not a very scalable way of dealing with growth.

Imagine you’re a popular site experiencing explosive growth and you need to put your Apache web server log files into a database table. Say you’re seeing one million hits per hour – a lot of hits. You’ve set up a number of web servers and listed each server’s public IP address as an A record in your domain’s DNS. Roughly load balanced, each server generates a fair amount of log file activity, and a background process reads these lines from the files and feeds them into a database table.
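The parsing step of that background process could look something like this – a deliberately simplified pattern for Apache’s combined log format, not a complete parser:

```php
<?php
// Parse one Apache combined-format log line into named fields.
// A simplified pattern for illustration, not a full parser: it
// ignores the referer and user-agent at the end of the line.
function parseLogLine(string $line): ?array
{
    $re = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)/';
    if (!preg_match($re, $line, $m)) {
        return null; // malformed line; the caller can count or skip it
    }
    return [
        'ip'     => $m[1],
        'time'   => $m[2],
        'method' => $m[3],
        'path'   => $m[4],
        'status' => (int)$m[5],
        'bytes'  => $m[6],
    ];
}
```

Each parsed array then becomes one row in the log table, ideally batched into multi-row inserts rather than one INSERT per hit.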

And now your database table is getting big. By day’s end you’ve hit 50m records, and your backup is going to take all day even while the chosen slave is offline.

We have, for the purposes of this example, five Apache web servers. Each web server writes into a database table of its own for logs. Now we need five copies of the servers A, B and C. Each set is called a cluster, and each cluster is logically assigned to a web server. Web server #1 then gets configured with one set of three IP addresses, server #2 another set of three IPs, and so on.
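The per-server mapping could be as simple as an array in each server’s configuration. The IP addresses here are placeholders:

```php
<?php
// One replication cluster (three candidate masters) per web server.
// IPs are illustrative placeholders; in practice this mapping would
// live in a configuration file deployed to each web server.
$clusters = [
    1 => ['10.0.1.1', '10.0.1.2', '10.0.1.3'],
    2 => ['10.0.2.1', '10.0.2.2', '10.0.2.3'],
    3 => ['10.0.3.1', '10.0.3.2', '10.0.3.3'],
    4 => ['10.0.4.1', '10.0.4.2', '10.0.4.3'],
    5 => ['10.0.5.1', '10.0.5.2', '10.0.5.3'],
];

// Look up the cluster assigned to a given web server.
function clusterForServer(array $clusters, int $serverId): array
{
    if (!isset($clusters[$serverId])) {
        throw new InvalidArgumentException("no cluster for server {$serverId}");
    }
    return $clusters[$serverId];
}
```

Each web server hands its own three-address list to the failover logic and never touches another server’s cluster.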

Now we don’t just store log file data in our databases. We have customer data too, lots of it. But the PHP scripts are the same across our five web servers and handle the same customers. Customer A might use server #2 initially but later on server #5. Both servers need access to that customer’s data. But the data is huge, larger than the log files.

So we need to split out customer data too. For this, we need to decide what to split on – something like the initial character of the customer’s name, perhaps. The detail is irrelevant; what matters is that the data is split to provide better speed. But how does our PHP script know which cluster holds which customer’s data?
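One naive split rule, purely for illustration – the letter-range scheme and the cluster count are my assumptions, not a recommendation:

```php
<?php
// Pick a cluster for a customer by the first letter of their name,
// mapping A-Z onto contiguous ranges of cluster numbers. Purely an
// illustrative rule; real sharding schemes need more thought.
function clusterForCustomer(string $name, int $clusterCount): int
{
    $first = strtoupper($name[0] ?? 'A');
    if ($first < 'A' || $first > 'Z') {
        $first = 'A'; // digits, punctuation etc. fall into the first cluster
    }
    return intdiv((ord($first) - ord('A')) * $clusterCount, 26);
}
```

The obvious weakness is that renaming the split rule (or adding clusters) means moving data around, which is exactly why the script should ask some central authority rather than hard-code the rule.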

At this point I will continue in a further article, but suffice it to say I’m thinking more of a persistent resource providing access to specific data sets within a database cluster.


Amazon Cloud Runs Low on Disk Space

Another unthinkable thing (maybe only in my mind) has happened – errors uploading files to S3 led me to the AWS status page, which reports the US East Coast facilities running low on drive space.

Am I the only one to have assumed someone or some thing was checking constantly, at least hourly, to ensure a sufficient percentage of drive space is available for use?

Apparently they consumed a whole lot more disk space than expected this past week and are now feverishly adding more capacity. Surely, if capacity can be added within hours, they should have been gradually adding more during the week?

This is actually pretty serious. People’s database backup jobs might be failing due to these issues although admittedly they need to be more resilient than that. But then so does Amazon.

Asterisk and Amazon EC2

Given the clear advantages of cloud computing and the industry momentum (slowly) toward VoIP and complementary technologies (think XMPP) I thought it might prove an interesting exercise to install Asterisk on an Amazon EC2 instance.

My preferred operating system is Debian GNU/Linux. Instances are available with Debian (various versions) pre-installed. Theoretically it should be only a few steps to get Asterisk running.

Here’s where reality kicks in. Hard. Asterisk has certain features, like conferencing, that are attractive and in some cases necessary to have. These features require accurate timing, normally provided by hardware – except in this case, where we actually have a virtual machine with no telephony equipment connected. To provide a timing substitute, Zaptel provides the ztdummy kernel driver.

Which means compiling Zaptel against your currently installed Linux kernel. This cannot be done under Debian: the version of the compiler (gcc) is different from the one that compiled the kernel. To compile with the correct, older gcc, you’ll need to boot the OS Amazon used to compile the kernel.

Over to Fedora Core 4 we head. Now, I managed to compile, install and actually run ztdummy on the Amazon developer image, however by this time I’d really had enough. Suffice it to say I was in no mood to start transferring kernel module files across to my Debian instance to pursue the matter.

A couple of people have written up instructions on getting Asterisk to work on EC2. Neither, I believe, installs the ztdummy kernel module, so the resulting systems are essentially crippled one way or another.

Amazon: If you are listening, let us sysadmins do what we do best. Let us build our O/S including our own Linux kernel! So much time has been wasted due to this restriction!

Amazon Cloud Computing Alternatives

So there have been plenty of web sites and services affected by today’s big Amazon S3 outage – SmugMug, Twitter and JungleDisk among the casualties to various degrees. Developers have been venting their frustration at seeing their applications fail because of something they relied on.

So what are the alternatives?

Any CTO will tell you that moving parts are your IT department’s weakest link in reliability terms. If you build a company on a single server, will you have more, or fewer, moving parts than building it on a large computing farm such as Amazon provides? Such an absolute measurement is of course a waste of time: that one server could die at any moment, making you wish you’d relied on the cloud. Yet the cloud may also experience downtime.

Amazon does, however, have the advantage that it hides its redundancy from you. If you were to try to match it, you’d likely end up with RAID and hot standby servers. Trust me, you don’t want to rely on that scenario without spending time and money testing your backup solutions.

So cloud computing might have occasional outages, but at least there are engineers on hand 24×7 to fix them on your behalf. All part of the service, Sir. With your own equipment, you are on call 24×7, shared with your colleagues. Assuming you have some.

Ultimately, money can only buy you the best commercially available solutions. Amazon are not the only cloud computing service provider, but as they happen to have financial muscle and experience on their side, I would go so far as to say they will likely be the best overall. Your mileage may vary, naturally.

Remember, Amazon use commodity hardware under the assumption that bits of their network will fail at random. They have constructed software to operate on top of this in a distributed manner, to detect failures and try (as best as their programmers can code) to mitigate issues as they arise. I am sure that once today’s failure is analysed, the software will be updated to minimise the disruption caused by it and by similar failures.

But seriously, even Amazon can only go so far. The human brain can only think up so many scenarios and code so many mitigation rules. Oh, and testing all these situations can also be a real challenge.

It is still a damned sight better than relying on your own company to build a similar system in-house.

Amazon Amateurs?

According to iehiapk: “I was under the apparently false impression that S3 was a high-availability service.  We may have to evaluate other services now.  This makes us look like a bunch of amateurs.”

I would like to ask precisely what he defines as a “high-availability service”. Five nines? Sorry, the Amazon S3 SLA promises three nines only. If they are in breach of that (which I suspect they might be now, although I’ve yet to calculate or read the fine print), your recourse is a partial refund.

Either way, when you sign the service agreement you accept that there will be some risk to the service, and that where conditions are met the supplier will compensate you – all documented and accepted when you signed on.

Amazon S3 Outage (Now Back)

Well I returned to check my giant photos upload that JungleDisk was sending to my Amazon S3 account and it had stopped.

The log showed a whole pile of HTTP error codes, which any self-respecting technophile will realise means a serious fault is occurring. The S3 forums document the first errors from 0858PDT, although JungleDisk for me reported errors from 1642BST.

A few big customers are impacted, like the photo sharing web site SmugMug, which is displaying an outage page right now and also blogging about the incident. The Amazon status page does at least confirm what we already know – they’re down and painfully aware of it. SmugMug’s blog says it’s “only” their third outage in over two years, which is to be expected. Other casualties include several Facebook apps loading slowly or displaying errors.

Still, this will hit the mainstream press and give cloud computing negative publicity. Hopefully Amazon will learn from these early experiences and continue on the road to virtually bullet-proof hosting. Not many organisations are large enough to put in the resources necessary to build such a robust service and put their brand name behind it.

Incidentally, if you have an S3 account, please check their SLA for the procedure to obtain a partial refund…

Updated 2225BST: has broken images due to this, as does Twitter. Amazon report progress toward full restoration of service with internal network communications slowly coming to life.

Updated 2249BST: Amazon are bringing up their S3 web interfaces. Sites and services (like my Jungle Disk backup) should be back up soon. I look forward to their statement on what happened and how they will prevent recurrence.

Updated 2226BST: Amazon S3 EU is back… S3 USA taking a little longer due to larger size.

Updated 0017BST: It’s now Monday and Amazon S3 USA is online once more. Big, big outage.

Jungle Disk Monitor

I decided to check out Amazon S3 and its practical uses first. This is Jungle Disk.

Jungle Disk is like traditional software in that it is downloaded and run on your desktop computer. It gives a point-and-click interface to select which files and folders to back up, and on what schedule (if any). The above screenshot was taken during an (easy-peasy) initial backup of my Documents folder in version 2.02.

There are a number of tweaks too such as bandwidth limiting.

Interestingly, you can also “mount” the S3 service as a disk drive. In the above picture I can double-click the JungleDisk icon on the desktop and open my S3 storage account within Mac OS X Finder.

Jungle Disk is available for Linux, too. Which means it will handle a mounted connection to S3 for your servers. Think of the possibilities…