Monitoring Your Servers


One of the common tasks when setting up a production environment in your datacentre is setting up monitoring of your servers. This is quite often overlooked (“our application doesn’t have bugs and doesn’t crash!”) until all hell breaks loose. By that point the damage is done and 9 times out of 10 it is irreparable. (If you get a call from a client telling you your app has been messing up their website, and as such their traffic, for the last 24 hours, it’s unlikely you’ll see them again!)

The thing is, I am yet to find a monitoring system that really works “out of the box” and is easy to configure – and I think that has a lot to answer for when it comes to the occasional lack of monitoring nowadays. I totally sympathise with sysadmins who get lumped with the task of setting up a monitoring system – because what makes sense to choose? The “best of breed”? Open source? Cobble together some scripts in a scripting language of your choice? Ask developers to change or implement some code in order to provide JMX hooks or SNMP traps? What about historic data? And the list of questions goes on.

I know there are companies who prefer commercial products because of the support offered. The irony is that you install a monitoring system to inform you when your system breaks – but you take out a support contract in case the monitoring system itself breaks! Let me spell it out for you: you take your car for an MOT to find any problems with it, and then pay the guys to fix them and make it roadworthy – but in case it still fails the day after the MOT, you pay another fee to the same garage so they come and fix it again. How many of you do this as a norm? I bet if you heard such an offer from your local garage you’d say “no thanks mate”, get in your car and drive to the next garage. So I don’t see the point in a support contract for your monitoring system. I do see the point in paying for a license and, most importantly, training for your staff so they know how to configure and use the damn thing – and from my experience the training is a killer in terms of finance, as it normally comes to around 5k per week per person. So if you have 3-4 sysadmins you’ve coughed up 15-20k straight away. (Ahem, surely you’re not gonna tell me that your production environment is managed by one sysadmin or, God forbid!, developers?) My point is that, from the little that I know, licenses for these monitoring systems cost about the same as the training above or even less – and the training is the vital thing that you need! And training can be provided for any kind of system, not just commercial ones – so I don’t really see the point in investing in a commercial system, as quite likely you can get the same from a non-commercial one!

Which brings me to open source systems: I know loads of open source purists who will go for nothing but open source, on the basis that it is maintained by a community which also provides support for it and is receptive to users’ suggestions and feature requests. Really? 😀 When was the last time you emailed SpringSource or Apache and said “it would help me greatly if you implemented this feature” and they turned around and said “fair enough, next month’s release will have it”? Even if they do listen and go ahead and implement your requested feature, you probably have to wait somewhere around 6 months to 1 year for the next major release, right? By that time you’ve found a workaround – and don’t tell me that when the release comes out you’re finally gonna ditch all that “workaround code” and spend time switching over to the requested feature! Quite likely by then you’re dealing with things that are more important to the business, so this gets thrown on the technical backlog pile, and if it gets looked at at all it’s going to be in another few months! So I’m yet to be convinced that embracing open source technology will actually get all your monitoring requirements sorted. (Or, for that matter, any other software requirements!)

There is of course the argument that if it’s open source you’ve got access to the sources – and if you have the sources of the product, well, you can do anything, right? No shit, Sherlock! How many people are going to look at a product complex enough that they don’t understand it and say “fair enough, we’ll have a look at the sources to see why it goes wrong”? Next to nil. Nada. Zilch. Because if you want to figure out some open source code, you need to hire a few people who will make a job out of studying that open source product and making sense of it. If open source code is easy to read and understand, then quite likely it doesn’t do that much – if it does lots then it’s complex, and complex means difficult. And if the product is difficult to configure, you can bet your bottom dollar the code is not that simple either – and as such not that simple to read! I remember talking to Alfresco a little while back and they told me they were putting a lot of work into reaching out to the community, so to speak: wikis, FAQs, forums, they had the whole works! (And I’m sure they still do; they have a great product.) However, when I started talking to them about community input into their product codebase, the answer that transpired was along the lines of “sweet fuck all, mate”! There are loads of users who might check out the sources, have a look and form an opinion on them, but very few of them would actually contribute! In fact, at the time I was talking to them they were seeing absolutely zero commits or patches or anything like that from the community.

There are of course companies who will take an open source system, have developers study the code, and adapt it to suit the company’s needs. Two main points here though:

  • first of all, they have a dedicated team who does that – if you’re not planning on hiring people specifically for this, then don’t even think that you can download the source code of an open source system and adapt it to your needs!
  • secondly, all these companies – who probably want to implement some of the features you need (after all, if you have a problem, chances are someone else has experienced it and wants the same feature or patch) – do not commit back to the open source repository! They either keep the resulting product as an in-house bespoke solution, and don’t want to release it because they fear it might give their competition a slight advantage, or they decide to make a commercial product out of it, and as such they’ll charge you for the feature! And you went down the open source route to avoid such costs, so it’s unlikely you’re gonna cough up and buy it!

I’ve touched on the open source concept before in this blog, and looking back at that post I’m afraid to say things haven’t changed much in this area – so, like I said, don’t expect to be able to pick up an open source monitoring tool, read through the sources and adapt it to your needs! Instead, look at how many of the features you need it already offers and which of the missing ones can be easily implemented – for the rest you’ll most likely have to consider in-house tools rather than trying to modify the source!

And since we are on this topic, it is worth mentioning monitoring tools built in-house, as from my experience with startups they account for a lot of the monitoring of production environments. I know your average enterprise will tut at me for saying this, but in a startup everyone is doing about a hundred jobs at once – and introducing a new (monitoring) system means that number will grow to about 150. Are you really sure you want to add more to your team’s plate when your main objective is to develop a product? Instead, why not leverage their existing knowledge and build some scripts and tools in a language they already know, so there’s no learning curve – and you can easily change these in the future too? I’m not even suggesting any particular technology – there’s Bourne shell, awk, sed, Perl, Groovy, Python and a whole lot of others. If your team is comfortable with any of them, why not use that knowledge not just for development but for system monitoring as well? On top of that, from what I have seen most commercial and open source systems support plugging in a script – so these can then easily be moved into any other system.
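To make the in-house script idea concrete, here is a minimal sketch of such a check in Python, one of the languages listed above. It assumes the widely used Nagios plugin convention. Under that convention the script prints a single status line and exits with one of four codes: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. Nagios and a number of compatible systems can run scripts that follow it. The disk-usage check, the function names (`classify`, `check_disk`, `main`) and the 80%/90% thresholds are hypothetical choices for illustration. They are not taken from any particular product.

```python
"""Minimal disk-usage check in the style of a Nagios plugin.

Assumptions (illustrative only): the monitoring system runs this check,
reads one line of output and maps the exit code to a state:
0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN. The default path and the
thresholds are hypothetical examples, not recommendations.
"""
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a usage percentage to an (exit_code, label) pair."""
    if used_pct >= crit_pct:
        return CRITICAL, "CRITICAL"
    if used_pct >= warn_pct:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Missing path, permission problem, etc.: state is unknown.
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; avoid dividing by it.
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_pct = 100.0 * usage.used / usage.total
    code, label = classify(used_pct, warn_pct, crit_pct)
    return code, f"DISK {label} - {path} is {used_pct:.1f}% full"


def main(argv=None):
    """Print one status line and return the exit code for the caller."""
    args = sys.argv[1:] if argv is None else argv
    target = args[0] if args else "/"
    code, message = check_disk(target)
    print(message)
    return code
```

To run it as a standalone plugin, add the usual `if __name__ == "__main__": sys.exit(main())` guard at the bottom. The monitoring system then only has to execute the script and read its exit code. The whole contract is "print a line, return a status code", so the same check could just as well be written in shell or Perl. It also stays portable if you later move to a different monitoring system.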

Start small and allow for growth – the basic principle of a startup, right? After all, just because it’s open source and free doesn’t mean it’s the right one for you – and just because it costs you money doesn’t mean someone is going to take the burden of configuring and maintaining it off your shoulders!
