Should Search Engines Break or Fix the Web?

Here’s an interesting thought. Since so many web pages are only ever accessed as the result of a search engine query, what would happen if search engines stopped including pages that contained broken code?

On the one hand, an enormous scream from content providers would be heard. On the other, there would be a massive push to fix pages with broken code and bring much of the web into line with established public standards. In fact, some information might then be indexed that would otherwise have been unintelligible to the spiders.

And think of the knock-on effects. Web browser programmers would no longer place such high emphasis on maintaining compatibility with broken sources. Browser development might actually accelerate, with releases carrying fewer bugs and smaller download sizes. Software might even become faster.

Google, to take the largest market-share holder, even has a site where webmasters can receive feedback on their site’s availability for indexing. It really should point out which pages could not be indexed because they contained invalid code.

But I cannot see it happening. Google are doing the reverse, actually going backwards to support older browsers. Oh well, nice dream.


Amazon S3 Outage (Now Back)

Well, I returned to check on the giant photo upload that JungleDisk was sending to my Amazon S3 account, and it had stopped.

The log showed a whole pile of HTTP error codes, which any self-respecting technophile will realise means a serious fault is occurring. The S3 forums record the first errors from 0858 PDT, although JungleDisk reported errors to me from 1642 BST.

A few big customers are impacted, like the photo-sharing web site SmugMug, which is displaying an outage page right now and also blogging about the incident. The Amazon status page at least confirms what we already know: they’re down and painfully aware of it. SmugMug’s blog says it’s “only” their third outage in over two years, which at that scale is to be expected. Other major brands are affected too: several Facebook apps are loading slowly or displaying errors.

Still, this will hit the mainstream press and give cloud computing some negative publicity. Hopefully Amazon will learn from these early experiences and continue on the road to virtually bullet-proof hosting. Not many organisations are large enough to put in the resources necessary to build such a robust service and put their brand name behind it.

Incidentally, if you have an S3 account, please check their SLA for the procedure to obtain a partial refund…

Updated 2225BST: WordPress.com has broken images due to this, as does Twitter. Amazon report progress toward full restoration of service, with internal network communications slowly coming back to life.

Updated 2249BST: Amazon are bringing up their S3 web interfaces. Sites and services (like my Jungle Disk backup) should be back up soon. I look forward to their statement on what happened and how they will prevent recurrence.

Updated 2326BST: Amazon S3 EU is back… S3 USA is taking a little longer due to its larger size.

Updated 0017BST: It’s now Monday and Amazon S3 USA is online once more. Big, big outage.

New arrival

I have recently been looking into the architecture of web applications, following some interest in Amazon’s EC2 and S3 products, where you rent data center resources.

And as if Amazon had read my mind, my pre-order of Baron Schwartz et al.’s High Performance MySQL (2nd Ed.) arrived yesterday (Saturday). It’s certainly a thick book, covering all sorts of topics. Hopefully I’ll get a chance to actually read it over the coming days…

PHP Design Patterns

So recently I’ve been refactoring piles of code while providing multiple interfaces to it. It has taught me to separate out the various layers of logic, and I want to note them down for future reference.

You see, people often read code in books and articles without considering what type of problem the code solves. But by understanding this, one can put the code in the correct file and provide the most flexible (and proper) interfaces to it.

Take a list. You have a database and it contains a table with some data. Nothing particularly special. You write your first PHP script for it, to view the contents of that table, calling it list.php in your web htdocs folder.
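A minimal sketch of that first script. The “items” table is a stand-in name I’ve chosen, and an in-memory SQLite database replaces the MySQL table here just so the example is self-contained:

```php
<?php
// list.php — sketch of the simplest "view the table" script.
// "items" is an illustrative table name; SQLite in-memory
// stands in for the real database so this runs on its own.
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO items (name) VALUES ('First'), ('Second')");

// Fetch every record and print it as a plain HTML list.
$html = "<ul>\n";
foreach ($db->query('SELECT id, name FROM items') as $row) {
    $html .= '<li>' . htmlspecialchars($row['name']) . "</li>\n";
}
$html .= "</ul>\n";
echo $html;
```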

You visit it and it works. You react with surprise that something of yours works first time. Or not.

Now you modify your script’s output to work via Smarty. Separation of PHP code and presentation is good, after all. For each record you include a link to edit.php with ?id=<record_id>.
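The Smarty side might look like this. It’s a fragment rather than a runnable script: it assumes Smarty is installed, $rows already holds the query results, and list.tpl is a name of my choosing:

```php
<?php
// list.php, now handing its data to a Smarty template.
// Assumes Smarty is installed and $rows holds the query results.
require_once 'Smarty.class.php';

$smarty = new Smarty();
$smarty->assign('rows', $rows);
$smarty->display('list.tpl');
```

```
{* list.tpl: one link per record, pointing at edit.php *}
<ul>
{foreach from=$rows item=row}
  <li><a href="edit.php?id={$row.id}">{$row.name|escape}</a></li>
{/foreach}
</ul>
```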

Then you write edit.php with two things in mind. First, display the record as it stands, but rendered as a form with editing facilities. Second, on submission of a POST, save the submitted data in its place, or create a new record if no <record_id> is present.
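A sketch of that flow, pulled into a single function so it can be exercised directly. The handle_edit name and the “items” table are hypothetical; a real edit.php would read $_GET/$_POST and echo the result:

```php
<?php
// Sketch of edit.php's two jobs: show the record as a form (GET),
// or save/create on a POST. Names here are illustrative.
function handle_edit(PDO $db, string $method, array $params): string
{
    if ($method === 'POST') {
        if (!empty($params['id'])) {
            // Existing record: save the submitted data in its place.
            $stmt = $db->prepare('UPDATE items SET name = ? WHERE id = ?');
            $stmt->execute([$params['name'], $params['id']]);
            return 'updated';
        }
        // No <record_id> present: create a new record instead.
        $stmt = $db->prepare('INSERT INTO items (name) VALUES (?)');
        $stmt->execute([$params['name']]);
        return 'created';
    }

    // GET: display the record as it stands, as an editable form.
    $stmt = $db->prepare('SELECT id, name FROM items WHERE id = ?');
    $stmt->execute([$params['id']]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return 'not found';
    }
    return sprintf(
        '<form method="post"><input type="hidden" name="id" value="%d">' .
        '<input name="name" value="%s"><input type="submit"></form>',
        $row['id'], htmlspecialchars($row['name'])
    );
}
```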

Wonderful. To top it off, you create a delete.php. Exercise your imagination here please.

Now you have a complete system to list, edit, add and delete your table’s contents.

That table contains some pretty useful data. Later on in your application’s development you need access to that data within another script. Or another web application such as an RSS feed writer needs access to it.

So do you create a second copy of list.php with modified output? Or reuse list.php, detecting the output expected and displaying the correct Smarty template?

Well, at this point you should consider what business use the list has. When I say business use I do so in the loosest of senses. It could be a blog with no commercial value at all for all I care. Regardless, your application has a use for the list. And now it has two uses, or another application wants to use it.

The business-use logic needs to be written between the code that performs the database query and the code that answers the customer (or blog reader) request and spits out an expected result.

This business logic should be written as a class, and I say this for one simple reason: a class can be thought of as a container in its simplest form. By putting the business logic in a container (calling the database query, performing any business-level actions, then returning the list of data to the calling script), you are ring-fencing your code, allowing it to be used by many other applications.
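A minimal sketch of such a container, reusing the same hypothetical “items” table; the ItemList name is mine:

```php
<?php
// Ring-fencing the list behind a business-logic class.
// "ItemList" and the "items" table are illustrative names.
class ItemList
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // Calls the database query, performs any business-level
    // actions, and returns plain data: no HTML, no knowledge
    // of which script (or application) is asking.
    public function fetchAll(): array
    {
        $rows = $this->db
            ->query('SELECT id, name FROM items ORDER BY name')
            ->fetchAll(PDO::FETCH_ASSOC);
        // Business-level rules (filtering, limits, permissions)
        // would be applied here before the data leaves the class.
        return $rows;
    }
}
```

list.php, edit.php, an RSS writer or any other application can now call the same class and format the result however suits its consumer.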

Now, at this point it can all go wrong. Things pivot on your experience as a programmer and as a user of web applications and sites. You might make each method of your business-logic class static, allowing it to be called as a function with the class name acting as a namespace. Or you might decide to load an instance of your class with environment information, writing a constructor and additional methods so the application rebuilds the world on each page request.
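To make the contrast concrete, here are the two styles side by side; class and method names are mine, not from any real project:

```php
<?php
// Two calling styles for the same business logic.

// Style 1: static methods, with the class name acting
// as a namespace for what is effectively a function.
class ItemQueries
{
    public static function all(PDO $db): array
    {
        return $db->query('SELECT id, name FROM items')
                  ->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Style 2: an instance loaded once with environment information
// (here just the database handle) and reused for the request.
class ItemRepository
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function all(): array
    {
        return $this->db->query('SELECT id, name FROM items')
                        ->fetchAll(PDO::FETCH_ASSOC);
    }
}
```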

Neither is right nor wrong. But here’s my point: the two concerns I’ve not yet mentioned are:

  1. Application specific behaviours
  2. Fault handling

Application-specific behaviours. By this I mean taking the request from the customer (“consumer”) and working out what needs to happen for the expected output to be shipped out. This will involve calling your class, perhaps building an instance of it or obtaining the results of a static method. Either way, you now have some data. And because you’ve written that in your list.php, you know to send the data to Smarty and call your list.tpl file. But if you were in your edit.php file, you would check for <record_id>, then check whether the consumer was GETting (reading) the data for possible changes or POSTing (submitting) a new or changed record, calling your business-logic class as appropriate, then picking an entirely different Smarty template to ship the data in.

Fault handling. In your list.php, if your class told you there were no records, you could still display a list.tpl file and the consumer would see nothing, or list.tpl could detect this and say explicitly that there were no records to view. Consider, though, a different type of consumer: another web application asking for your list, one that for some reason has to use the web to access your data. You wouldn’t spit back an HTML page, as that would be inefficient and not particularly friendly. Instead, some XML or just plain CSV/TXT records could be sent. And here’s an idea: for such consumers, you could send an HTTP error code instead of all those extra bytes.
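One way to sketch that consumer-aware fault handling; the function name, format flag and status-code choices are all mine:

```php
<?php
// Consumer-aware output: HTML readers get a friendly page,
// machine consumers get CSV or just a bare HTTP status code.
function render_list(array $rows, string $format): array
{
    if (empty($rows)) {
        if ($format === 'csv') {
            // Machine consumer: a status code says it all.
            return ['status' => 404, 'body' => ''];
        }
        // Human consumer: render the page, but say so explicitly.
        return ['status' => 200, 'body' => '<p>No records to view.</p>'];
    }

    if ($format === 'csv') {
        $lines = array_map(
            fn ($r) => $r['id'] . ',' . $r['name'],
            $rows
        );
        return ['status' => 200, 'body' => implode("\n", $lines)];
    }

    $items = array_map(
        fn ($r) => '<li>' . htmlspecialchars($r['name']) . '</li>',
        $rows
    );
    return ['status' => 200, 'body' => '<ul>' . implode('', $items) . '</ul>'];
}
```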

Now you see how having levels within your code, separated out, can be useful: a script to handle the types of requests, and classes to handle the business logic. The business logic cares not one jot what will happen with your data. Your individual scripts do, and can make decisions line by line themselves.

To most experienced programmers all this is like teaching the Bible to Jesus. But for the rest of you, I hope this helps separate your code into better-organised files.

TinyMCE and Scriptaculous – Grr

Thought I’d post this as a reminder, having spent an hour staring at web page source code wondering why TinyMCE wasn’t turning my textarea into a full editor for non-technical staff to use.

It seems that if you use Scriptaculous on the same page, TinyMCE must come first in your ordering of script tags, or else you get a lovely weird error:

Error: tinyMCE.baseURL has no properties...

Useful. I very nearly chucked something at my TFT screen after that.
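For the record, the ordering that worked for me looked like this (the paths are illustrative, not from my actual page):

```html
<!-- TinyMCE must be loaded before Prototype/Scriptaculous -->
<script type="text/javascript" src="/js/tiny_mce/tiny_mce.js"></script>
<script type="text/javascript" src="/js/prototype.js"></script>
<script type="text/javascript" src="/js/scriptaculous.js"></script>
```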

Fun with text_field_with_auto_complete

I’ve spent the past few hours trying to get my head around Rails’ text_field_with_auto_complete. Coming from a PHP background, Rails is a totally different paradigm, and text_field_with_auto_complete illustrates this more than usual.

So I have a web site home page at /home, which has a controller named ‘HomeController’, amazingly enough. I also have a complete MVC ‘WebSite’ for listing web sites. I wanted a search field on the home page where I could type in a URL and have it autocomplete.

Now it turns out the solution was incredibly simple, and I used this autocomplete blog entry by Cloves Carneiro Jr as my starting point. But I just had to complicate things unnecessarily. I assumed the autocomplete box should query the ‘WebSitesController’, as that’s what handles the MVC work for that class. Natural, except I was wrong.

The key was that when the autocomplete makes an AJAX request, it HTTP GETs /home/auto_complete_for_model_field, or in my specific case /home/auto_complete_for_web_site_url. At the top of HomeController, all I needed was the following line:

auto_complete_for :web_site, :url

I needed no method definition, no view, and no helper. Nothing, zip, nada. I reloaded the page, hit the ‘h’ key, and all listed websites appeared. Of course, the next job is to optimise!
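For completeness, the view side is just the standard helper; this is a fragment, with the model and field names following my setup above:

```erb
<%# In the home page's view template: pairs with the
    auto_complete_for :web_site, :url line in HomeController. %>
<%= text_field_with_auto_complete :web_site, :url %>
```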

Note: debugging this was made massively easier thanks to FireBug (the 1.0 beta rocks).

FireBug Beta Screencast

I used FireBug earlier this year to debug some Ajax that would have been a real pain otherwise. Now a beta of the 1.0 release is out, and it’s like ten times better. It’s been installed on three machines in the office, and that count could well increase.

Kudos to the programmers, and to the Mozilla chaps whose platform made it possible.

This FireBug screencast barely scratches the surface. I suppose it’s enough, though, to cause reactions of “wow”, which will lead on…