As some of you might already know, I now work for a company that started during the .com era. It’s a new internet media organization specializing in online contextual advertising. Before that, though, I worked for a company offering enterprise solutions for document management. Most of their clients were large-scale companies with infrastructure that spanned thousands of “Rolls-Royce” Sun, HP and IBM servers. Because of this, at the time (and I doubt anything has changed in the meantime), their products only targeted operating systems like Windows (NT, 2000 and 2003), Solaris, HP-UX and IBM AIX. Being a fan of Linux on the server side, I had a few chats with some of my colleagues and managers, trying to sell them the idea of a version of our server software that ran on Linux. That didn’t go anywhere, as at the time they still perceived Linux as an unstable system.
Of course, my arguing against that perception was not going to change their minds on its own, so I pointed out that Google actually runs on a set of Linux clusters. Their reply was: “Yeah, but nobody will moan if Google goes down for a few days, while we cannot have that with our software.”
Fast forward to the present and my work at the .com company.
We have had hardware problems a few times (old disks “burning” out, network cards failing, the usual). In such situations we typically take the server out of the cluster and have the hardware supplier’s engineers bring replacements within a couple of hours, so we can get the server up and running again as soon as possible. Still, while the server is down, our main concern is that we are possibly losing revenue. The arithmetic involved is simple: fewer servers means we can process fewer requests; that in turn means we lose chances to put adverts on pages; and that in turn means we are potentially losing clicks, which is what brings in the revenue at an online advertising company!
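To put rough numbers on that chain, here is a minimal Python sketch. Every name and figure in it is a made-up illustration rather than data from our systems, and it assumes the remaining servers have no spare capacity to absorb the traffic of the one that is down.

```python
def lost_revenue(servers_down, requests_per_server_per_hour, hours_down,
                 ad_fill_rate, click_through_rate, revenue_per_click):
    """Estimate revenue lost while servers are out of the cluster.

    All parameters are hypothetical inputs for illustration:
    - ad_fill_rate: fraction of requests that would have shown an advert
    - click_through_rate: fraction of shown adverts that would have been clicked
    - revenue_per_click: what one click earns, in any currency unit
    """
    # Fewer servers means fewer requests processed...
    lost_requests = servers_down * requests_per_server_per_hour * hours_down
    # ...which means fewer adverts placed on pages...
    lost_adverts = lost_requests * ad_fill_rate
    # ...which means fewer clicks...
    lost_clicks = lost_adverts * click_through_rate
    # ...and clicks are what bring the revenue in.
    return lost_clicks * revenue_per_click


if __name__ == "__main__":
    # One server out for two hours, with invented traffic figures.
    estimate = lost_revenue(
        servers_down=1,
        requests_per_server_per_hour=1_000_000,
        hours_down=2,
        ad_fill_rate=0.5,
        click_through_rate=0.01,
        revenue_per_click=0.10,
    )
    print(f"Estimated lost revenue: {estimate:.2f}")
```

The point is only that each step multiplies into the next, so even a couple of hours with one server missing shows up directly in revenue.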
So, all of a sudden, I came to realise that if Google does go down for a while, there will actually be loads of people moaning within hours, if not minutes:
- you will have the advertisers complaining to Google that their advertising campaigns are not delivering, and quite likely asking for their money back;
- the partner sites running Google AdSense will be complaining about the drop in revenue, and more than likely threatening to move to Yahoo, Overture, DoubleClick etc.;
- (by losing either partners or advertisers, Google’s revenue will drop, needless to say);
- not having the servers up to serve ads means Google is losing cash, so the investors will be on the phone straight away.
All in all, there are going to be quite a few people complaining!
So, really, in such a .com environment, your systems have to be up 100% of the time!
Now compare this scenario with that of an enterprise company: they write a piece of software and ship it to a client. After the implementation team goes in and does the installation, the training etc., the customer starts using it. A few weeks down the line, say, a bug is found in the system that prevents the customer from using the software. The typical scenario is as follows:
- Customer reports the bug
- The company analyzes the problem and identifies the piece of code causing it
- They then have a team of programmers fix it
- Once fixed, a new release of the product goes to testing
- Assuming there are no other bugs found during testing, the product is signed off to go to the customer
- The implementation team finally turns up on site and installs and configures the new version
- Finally the customer can start using the software again!
In my experience, such a cycle never takes less than 3-4 days (call it a week). So quite often the customer cannot use the full package, or parts of it, for about a week. They accept that, though, as part of the life cycle of IT projects, and they never moan.
That will never work for a .com setup, though. Having your software not deliver for 3-4 days is unacceptable. If you do let that happen, by the end of those 4 days you might find out that you have lost half the advertisers and half the partner sites (at least!), if not worse: you are out of business!
So, going back to the comment that was made to me at the time: no, actually, there will be more “fireworks” about Google being down for an hour than you will get from the likes of BT not having their document management system up for an hour! After all, how many times have you called a company to sort out an issue with the service you are paying for, only to be told by the operator: “Sorry, our systems are down at the moment, can you call back in about 3 hours?” It’s normal practice for these guys! And it’s all down to the “enterprise” approach.
Imagine how much these companies’ services would improve if these “enterprise” companies took a .com approach to dealing with issues, and problems got fixed in under an hour every time one cropped up.