Programming for bandwidth


Like many others, I keep an eye on the technical articles and blogs out there on the net. I’d like to think that most of the ones I read are quite authoritative on some areas of programming and offer useful insights into the world of IT. Even so, I have yet to find an article that seriously considers one of the main constraints of the internet: bandwidth! Somehow this escapes a lot of people, even though it’s rather crucial; the worry seems to be left to the web designers, who should just do their best to compress their images, be they GIF, JPEG or whatever. Sure, images and static content are an important part of a website, but since they are static they will be cached by browsers in most cases, which puts the emphasis back on the main webapp.

With the standard internet-based SME having a 100 Mbps connection yet wanting to support “thousands of users”, this is an important aspect that needs to be factored into your development, so I thought I’d point out a few issues I’ve come across in my time.

I could talk about gzip encoding, but I’m hoping most people know about it and have already factored it into their configuration. If it’s not turned on, you’d better think seriously about doing so, as it will save you lots of bandwidth when it comes to serving html and other text-based data.

One common mistake that I see a lot is pretty-printing the output. Sure, when debugging a web page it helps greatly to have the source nicely laid out, but this comes at a cost: every extra space you put in the source means an extra byte in the output (for single-byte charsets). Doesn’t sound like much, does it? But if your system is hammered by 1000 users per second, you’re adding an extra KB to your output each second. That is an extra 3.6 MB per hour, or around 86 MB per day, which works out to roughly 2.6 GB per month. And if your app uses a double-byte character encoding you can double that to about 5 GB a month. All of this for one extra space in your html! (I accept the argument that with gzip you can probably halve those numbers, but they still become significant when you’re talking about 50+ extra spaces on your page!) If you happen to be paying for outbound traffic from your datacentre then you probably cough up an extra 100-200 bucks a month because of this luxury! Consider instead running your output through a regex filter that “compresses” whitespace: even something as simple as replacing every match of \s+ with a single space would be a good start.
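To put numbers on this, here is a rough sketch of the arithmetic together with the naive whitespace filter. The helper names extraTraffic and collapseWhitespace are made up for illustration, a month is taken as 30 days and 1 KB as 1000 bytes. The filter also assumes the page has no <pre> or <textarea> content that depends on its exact whitespace.

```javascript
// Rough cost of extra bytes per response (hypothetical helper, decimal units).
function extraTraffic(extraBytesPerResponse, responsesPerSecond) {
  const perSecond = extraBytesPerResponse * responsesPerSecond;
  return {
    perSecond: perSecond,
    perHour: perSecond * 3600,
    perDay: perSecond * 3600 * 24,
    perMonth: perSecond * 3600 * 24 * 30, // assuming a 30-day month
  };
}

// Naive whitespace "compression": every run of whitespace becomes one space.
// Assumption: nothing on the page (<pre>, <textarea>, inline scripts)
// depends on its exact whitespace.
function collapseWhitespace(html) {
  return html.replace(/\s+/g, " ");
}

const cost = extraTraffic(1, 1000);
console.log(cost.perDay / 1e6 + " MB per day");     // 86.4 MB per day
console.log(cost.perMonth / 1e9 + " GB per month"); // 2.592 GB per month

const pretty = "<ul>\n    <li>one</li>\n    <li>two</li>\n</ul>\n";
const compact = collapseWhitespace(pretty);
console.log(pretty.length - compact.length + " bytes saved");
```

Even on this tiny fragment the filter drops a handful of bytes; on a real page with deep indentation the saving per response would be correspondingly larger.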

Another thing that is occasionally forgotten is the length of the cookies sent back to the user. I’ve seen people ignore this completely when measuring page load speed: most of the tools report only the size of the html page (and its components) and discard the size of the http headers exchanged between browser and web server. Needless to say, this can be misleading! If your app sends the user a cookie with, say, 100 characters in it (some user preferences etc), remember that this cookie will be presented back to your webserver on each request, including requests for images, css and JavaScript! Assuming an average page has 1 image, your site’s global css (which ensures a uniform look across your website) and the global JS library you use (for menus etc), you will see 4 requests per page. So in the above example of 1000 users a second you are potentially seeing 400,000 extra bytes coming into your datacentre per second! That is 400 KB per second coming in, and on top of the data transfer implications your webserver has to process and parse this data and then present it to your web app! (Granted, some of these requests will never be made if you have configured caching properly on your server, but we are still talking about half of the numbers above!) Don’t forget also that http headers are not subject to gzip encoding, as they are sent in clear! And all of this because you stored in your cookie things like "theme=dark;popups=yes"! Sure, it’s clear to read, but suppose you adopt the convention that the first cookie field is an index into a layout theme table and the second field is a 1/0 flag for whether your user accepts popups: your cookie becomes "21;1", which is just 4 characters! Surely I don’t have to explain the benefits of the difference!
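A minimal sketch of that convention could look like the following. The THEMES table and the encodePrefs / decodePrefs helpers are hypothetical names used only for illustration; a real app would keep its own theme table and decide for itself how to treat a malformed cookie.

```javascript
// Hypothetical lookup table: the position of a theme is what goes in the cookie.
const THEMES = ["light", "dark", "contrast"];

// Pack preferences into "<themeIndex>;<popupsFlag>", e.g. "1;1".
function encodePrefs(prefs) {
  const themeIndex = THEMES.indexOf(prefs.theme);
  if (themeIndex === -1) {
    throw new Error("unknown theme: " + prefs.theme);
  }
  return themeIndex + ";" + (prefs.popups ? 1 : 0);
}

// Unpack the compact form; fall back to the first theme on malformed input.
function decodePrefs(value) {
  const parts = String(value).split(";");
  const theme = THEMES[parseInt(parts[0], 10)];
  return {
    theme: theme === undefined ? THEMES[0] : theme,
    popups: parts[1] === "1",
  };
}

const verbose = "theme=dark;popups=yes";
const compactCookie = encodePrefs({ theme: "dark", popups: true });
console.log(compactCookie + " saves " + (verbose.length - compactCookie.length) + " bytes per request");
```

The saving is paid back on every single request that carries the cookie, which is why it matters more than the raw byte count suggests.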

On a different note, I have seen in the past extra attributes used on html elements that didn’t need to be there – simply wasting bandwidth again.

Take this example: <img src="..." alt="..." width="..." height="..." />

Sure, it looks like a valid tag, and in fact it is a perfectly valid html tag, but are all those attributes needed? The width and height tell the browser the size of the picture. This helps greatly when laying out a page as it loads: the browser knows the dimensions before the image arrives, so it can reserve an empty space of that size in the page (and arrange the text and other elements around it), and when the image is finally loaded it is simply placed in the space already available. One would therefore think both are needed to speed up page load and layout time, but that’s not entirely true! If you are only concerned about reserving space you could specify just one of the two sizes (width or height); the browser would still reserve some space horizontally or vertically and, once the image is loaded, compute the other dimension from the picture’s aspect ratio. You just saved yourself a few bytes by removing the width="..." attribute! Same for the alt attribute: it is required by xhtml standards, but do you really need it to be that explicit? Would alt="img" not suffice?
Another thing I’ve seen done a lot is the use of the boolean values true and false in javascript. Sure, a statement like

var something=true

gives a very clear indication that you are declaring a variable that will be used as a flag and is initially set. However, don’t forget that javascript isn’t strongly typed, and as such anything can be used as a boolean and/or in an if statement. So wherever the flag is only tested for truthiness, the following declaration would have the same effect:

var something=1

You “only” saved 3 characters but we’ve learned already what that can mean!
Not to mention that in the light of this discussion all of a sudden variable names in javascript become really important! Is

var myCounter=1

better than

var n=1

?

Sure, you can have name conflicts on the page, but why not consider using your own namespace or maybe classes to avoid them? (It’s a trade-off for sure, so you need to evaluate how much you can save by switching to these.) Or alternatively use “rare” variable names: for instance, how many times have you named a variable something other than a, i, j, n, t, x or y? There are plenty of letters out there that hardly ever get used 😉

favicon.ico – heard of it? 🙂 This mostly applies to server load optimization, but as it turns out it also helps with your bandwidth consumption. You have probably all seen those little icons that appear in your browser address bar when you visit a website; they are supposed to be a mini-identity for your website and promote your brand. That’s fine for a website, but what you probably don’t realise is that browsers will request /favicon.ico from every website that is referenced on a web page. So if you have a script running on someone else’s website, every time that page is viewed in a browser, the browser will request favicon.ico from your website. The file is typically a couple of hundred bytes and it is worth having one in your web application: if the file is not present then pretty much each time it is requested your web server will perform a disk access to check whether the file is there, find out it’s not and send an http 404 back to the browser, which means that the next time that browser encounters a page that references your website it will place the same request again, and again. Providing a favicon.ico file means that it will be downloaded once by the browser and cached (by the browser and by all proxies the request went through). So first of all your webserver won’t have to perform a disk access each time (a file of that size will most likely be cached in memory by your webserver), and quite likely, thanks to browser caching, you will see fewer requests coming in, saving you a small amount of bandwidth and also some processing time. And here’s another trick: have an empty file (0 bytes!) for favicon.ico and you’re saving yourself some more bytes per request 😉 (there’s no specific requirement that says the file cannot be empty, and the file will still be cached by browsers!)
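If your app happens to be served from Node rather than from static files, the empty-favicon trick could be sketched roughly as below. The faviconResponse and handler names are hypothetical, and the one-year max-age is simply an assumption picked for the example; most setups would get the same result by dropping an empty file in the web root and letting the web server’s normal caching rules apply.

```javascript
// Build the response for /favicon.ico: an empty body plus a long cache lifetime.
// The one-year max-age is an assumption made for this sketch.
function faviconResponse() {
  return {
    status: 200,
    headers: {
      "Content-Type": "image/x-icon",
      "Content-Length": "0",
      "Cache-Control": "public, max-age=31536000",
    },
    body: "",
  };
}

// Request handler with the (req, res) shape that Node's http.createServer expects.
function handler(req, res) {
  if (req.url === "/favicon.ico") {
    const r = faviconResponse();
    res.writeHead(r.status, r.headers);
    res.end(r.body);
    return;
  }
  // Everything else is out of scope for this sketch.
  res.writeHead(404);
  res.end();
}

// To try it locally with Node:
//   require("http").createServer(handler).listen(8080);
```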

robots.txt – another one that gets missed a lot in the online space. Again, if you’re a publisher (website) you probably welcome every single web spider visit, as a visit from a spider means your website gets indexed and as a result you are likely to get a bigger audience. If you’re not a publisher, though, and your servers are not storing content, then each one of these hits is wasting CPU time and bandwidth. (I’m not gonna go into the whole discussion about what damage it might cause to your SEO, but it’s true that this is another side effect.) An average spider hits a website about twice a month, and with at least 10 major spiders out there (plus tons more of the little ones) you’re wasting a significant chunk of your bandwidth by letting these bad boys crawl your site. All you have to do is place a robots.txt in the root folder of your webserver which disallows access to all robots, and you’ve saved yourself not just some cpu but some precious bytes per second too.
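For reference, a robots.txt that asks every well-behaved robot to stay away from the whole site is just two lines (badly behaved spiders can of course ignore it):

```
User-agent: *
Disallow: /
```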

When it comes to returning javascript from your web application it’s also worth remembering that your JavaScript doesn’t have to be “pretty” once it gets deployed onto the production servers, so one thing worth checking out is the likes of JSMin, and maybe producing a version of your javascript that’s been filtered and “minimized” through it. However, there are some issues that JSMin won’t help you with. I have seen, for instance, the following construct very often:

if ( a == 1 ) ...

Now if you’re testing specifically for variable a to have the value 1 the above is spot on; if however a is an on/off switch following the standard 1/0 convention then it’s equivalent to:

if ( a ) ...

And the opposite is not

if ( a == 0 ) ...

or god forbid

if ( a == false ) ...

but simply

if ( !a ) ...

Have a look again at your if statements – how many times did you make that mistake? 🙂
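The shortcut only holds while the variable really is a 1/0 switch, so it is worth seeing where the two forms part ways. The sketch below uses a made-up compareTests helper to evaluate both the long and the short tests for a few values.

```javascript
// Evaluate the long and short forms of a flag test for a given value.
function compareTests(a) {
  return {
    eqOne: a == 1,      // if ( a == 1 )
    truthy: Boolean(a), // if ( a )
    eqZero: a == 0,     // if ( a == 0 )
    falsy: !a,          // if ( !a )
  };
}

// Under the strict 1/0 convention the short forms agree with the long ones.
console.log(compareTests(1)); // eqOne and truthy are both true
console.log(compareTests(0)); // eqZero and falsy are both true

// Outside that convention they diverge, so the rewrite is only safe for real flags.
console.log(compareTests(2));   // truthy is true but eqOne is false
console.log(compareTests("0")); // eqZero is true but falsy is false
```

The last case is the one to watch for if the flag ever arrives as a string, for example straight out of a cookie or a form field.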

Another common issue that seems to appear a lot amongst AJAX partisans is the “extreme” usage of XML, and nothing but XML, for frontend / backend communication. Sure, XML does sometimes have its advantages as a communication format. However, if you’re only accessing a URL that triggers a server-side function which returns nothing more than a success / failure marker, is there any point in returning a whole XML document that wraps the single value true, when really you could just print a "0" or "1" from your server side, run that through a call to eval() in your javascript (or parseInt) and save yourself all the extra bandwidth generated by insisting on XML? (Not to mention you’d speed up the client side as well, since there’s no more XML parsing and you are downloading just one character!)
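On the client side the one-character reply might be handled along these lines. The parseFlag name is made up for this sketch, and it assumes the server prints exactly "1" for success and "0" for failure. It uses parseInt rather than eval(), since eval() would execute whatever text the server happened to send back.

```javascript
// Turn a one-character response body into a boolean.
// Assumption: the server prints exactly "1" for success and "0" for failure.
function parseFlag(responseText) {
  const n = parseInt(responseText, 10);
  if (n !== 0 && n !== 1) {
    throw new Error("unexpected response: " + JSON.stringify(responseText));
  }
  return n === 1;
}

console.log(parseFlag("1")); // true
console.log(parseFlag("0")); // false
```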

And since we are talking XML: is there really a need for explicit (read “long”) tag names when a short one would do? Consider this:

<product name="..." description="..." id="..."/>

Anyone looking at this XML, exchanged let’s say between your frontend and backend, can tell you are returning the details of a product, probably from your inventory database, and that the product has a name, a description etc. But you’re not just anyone! You already know before the call what properties your product supports, and as such you don’t need descriptive names for them. Simply put, the following would achieve the same for you:

<p n="..." d="..." i="..."/>

All you need to do is change your javascript XML parsing to look for these shorter names. (As a side effect this also shortens your javascript a bit, since the strings used to store the tag names are shorter now too!) And if you couple this with things like representing booleans as 1 and 0, you’ll find you’re saving yourself quite a few bytes per request!
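One way to keep the rest of your javascript readable while the wire format stays terse is a small lookup table that expands the short names after parsing. The PRODUCT_FIELDS table, the expandProduct helper and the sample values below are all hypothetical; the attribute object is assumed to have been read from the <p .../> element by whatever XML parsing you already have in place.

```javascript
// Hypothetical mapping from the short wire names to the names used in code.
const PRODUCT_FIELDS = { n: "name", d: "description", i: "id" };

// Expand an object of short attributes, e.g. read from a <p .../> element.
function expandProduct(shortAttrs) {
  const product = {};
  for (const key of Object.keys(PRODUCT_FIELDS)) {
    if (key in shortAttrs) {
      product[PRODUCT_FIELDS[key]] = shortAttrs[key];
    }
  }
  return product;
}

// Sample values, made up to compare the size of the two forms.
const longForm = '<product name="Mouse" description="USB" id="7"/>';
const shortForm = '<p n="Mouse" d="USB" i="7"/>';
console.log(longForm.length - shortForm.length + " bytes saved per element");
console.log(expandProduct({ n: "Mouse", d: "USB", i: "7" }));
```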

I know I’m gonna hear a few comments here about how changing tag names breaks some jaxb bean bindings or some other XML serialisation mechanism. All I can say is that if your technology is that inflexible then get rid of it and/or write your own! We are not proper developers if all we do is glue together pieces of prepackaged code without writing the code required to integrate them smoothly. Or if that’s not an option you favour, then take the financial hit of the extra bandwidth (and if your pocket is that large what the fuck are you doing wasting your time reading this? 😮 )

One other aspect that is easily missed out too is filenames for images and css etc. It’s all good and proper using paths and filenames like:

<img src="/images/picture_of_pc.png" alt="" />

which are quite descriptive; to a browser, however, they are the same as:

<img src="/i/pc.png" alt="" />

Not that easy to read for someone from the outside, but that shouldn’t matter to you, since you know the structure and naming convention of your web app! Even better, you can probably keep your /images/ and /css/ directories on your server and link them to their /i/ and /s/ aliases, or configure some internal redirects in your web server (e.g. via apache’s mod_rewrite). This is transparent to the user and has exactly the same effect as the long paths/names, but once more saves you a few bytes on each request. It all adds up once you start seeing hundreds of users per second!

The last bit I wanted to mention here, though it should rightly have been mentioned at the start of this article, is the presence of comments in javascript and html that is output by server side components (as opposed to being embedded in static JS or html files). Comments help us understand the code; that’s a well known fact and the reason they were created. But don’t forget that they are supposed to help the developers who deal with that code, not everyone who looks at your pages! And your developers in most cases have access to both server side and client side code, which means that placing your comments in the server side code (so they are not output on the actual page) has the same effect on the clarity of the code for your developers. Take this jsp code that produces some html:

<% String s=getUser(); %>
<!-- this div stores the user greeting message -->
<div id="greeting">hello <%= s %></div>

This dumps out a div with some text and also an html comment which explains the purpose of the div. Now have a look at a slightly modified version of this:

<% String s=getUser();
/* this div stores the user greeting message */ %>
<div id="greeting">hello <%= s %></div>

Someone looking at just the output of this will see only the div, without the html comment. The developer looking at the jsp code, though, still sees the comment, which now lives on the server side and as such no longer consumes bandwidth for no real benefit!

There are more things to consider when programming with bandwidth consumption in mind, and quite likely I will follow this post with a few more in time, but hopefully this should get a lot of us to start thinking about this important aspect. Because after all, hardware and storage are cheap and plentiful, but CPU time and bandwidth are not!
