Like many others, I keep an eye on the technical articles and blogs out there on the net. I’d like to think that most of the ones I read are quite authoritative in some areas of programming and offer useful insights into the world of IT. Even so, I have yet to find an article that seriously considers one key aspect of the internet: bandwidth! Somehow this seems to escape a lot of people, even though it’s rather crucial. The worry appears to be left to the web designers, who should just do their best to compress their images, be they GIF, JPEG or whatever. Sure, images and static content are an important part of a website, but because they are static, browsers will cache them in most cases. That puts the emphasis back on the web application itself.
The typical internet-based SME runs on a 100 Mbps connection yet wants to support “thousands of users”. Bandwidth is therefore something you need to factor into your development, so I thought I’d point out a few issues I’ve come across over time.
I could talk about gzip encoding, but I’m hoping most people know about it and already have it in their configuration. If it’s not, you’d better think seriously about turning it on, as it will save you lots of bandwidth when serving HTML and other text-based data.
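To get a feel for what gzip does to typical markup, here is a minimal Python sketch using the standard library’s gzip module. The repetitive sample page is made up purely for illustration. In real life compression is a web server setting (Apache’s mod_deflate, for instance) rather than something you code by hand, and the function name gzip_body is my own:

```python
import gzip

def gzip_body(html: str, encoding: str = "utf-8") -> bytes:
    """Compress a response body as a server would for Content-Encoding: gzip."""
    return gzip.compress(html.encode(encoding))

# Made-up, deliberately repetitive page; real pages usually compress less dramatically.
page = "<html><body>" + '<div class="row">hello</div>\n' * 200 + "</body></html>"
raw = page.encode("utf-8")
packed = gzip_body(page)
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped")
assert gzip.decompress(packed) == raw  # compression is lossless
```

Markup is full of repeated tag and attribute names, which is why text-based responses shrink so well.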
One common mistake I see a lot is pretty-printing the output. Sure, when debugging a web page it helps greatly to have the source nicely laid out, but this comes at a cost: every extra space in the source means an extra byte in the output (for single-byte charsets). Doesn’t sound like much, does it? But if your system is hammered by 1000 users per second, one extra space adds about 1 KB to your output every second. That is an extra 3.6 MB per hour, around 86 MB per day and, by simple maths, roughly 2.6 GB per month. If your app uses a double-byte character encoding, double that to over 5 GB a month. All of this for one extra space in your HTML! (I accept the argument that gzip can probably halve those numbers, but they still become significant once you’re talking about 50+ extra spaces on your page!) If you happen to be paying for outbound traffic from your datacentre, you are probably coughing up an extra one or two hundred bucks a month for this luxury! Consider instead running your output through a regex filter that “compresses” whitespace. Even something as simple as replacing every match of \s+ with a single space would be a good start.
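Here’s a minimal sketch of such a filter in Python. It collapses every whitespace run into one space but leaves <pre>, <textarea> and <script> blocks alone, because whitespace matters there. Collapsing a script’s newlines would, for example, swallow everything after a // comment. This is a regex, not a real HTML parser, so treat it as a starting point that assumes reasonably well-formed markup. The name compress_whitespace is my own:

```python
import re

# Either a whitespace-sensitive block (kept verbatim) or a run of whitespace.
_TOKEN = re.compile(
    r"(<(pre|textarea|script)\b.*?</\2\s*>)|\s+",
    re.IGNORECASE | re.DOTALL,
)

def compress_whitespace(html: str) -> str:
    """Collapse whitespace runs to one space outside pre/textarea/script blocks."""
    def keep_or_collapse(m):
        # Group 1 is set only when a protected block matched: return it untouched.
        return m.group(1) if m.group(1) else " "
    return _TOKEN.sub(keep_or_collapse, html)

page = "<div>\n    <p>hello    world</p>\n</div>\n<pre>a\n  b</pre>"
small = compress_whitespace(page)
print(len(page) - len(small), "bytes saved")
```

In a Java web app the same idea would typically live in a servlet filter wrapping the response, but the regex logic is identical.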
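The same goes for cookies, which the browser sends back to your server with every single request to your domain. It’s tempting to store user preferences as descriptive pairs such as "theme=dark;popups=yes" etc! Sure, it’s clear to read. But suppose you adopt a convention: the first cookie field is an index into a layout theme table, and the second field is a 1/0 flag for whether your user accepts popups. Your cookie value then becomes "21|1", which is just 4 characters! (Use a separator such as | rather than ;, because a semicolon ends a cookie value.) Surely I don’t have to explain the benefits of the difference!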
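A minimal Python sketch of that packing follows. THEMES, encode_prefs and decode_prefs are hypothetical names for illustration. The | separator is used because RFC 6265 does not allow a semicolon inside a cookie value:

```python
from typing import Tuple

# Hypothetical theme table: the cookie stores an index into it, never the name.
THEMES = ["default", "dark", "light", "high-contrast"]

def encode_prefs(theme: str, popups: bool) -> str:
    """Pack preferences as '<theme index>|<1 or 0>'."""
    return f"{THEMES.index(theme)}|{int(popups)}"

def decode_prefs(value: str) -> Tuple[str, bool]:
    """Unpack a value produced by encode_prefs."""
    index, popups = value.split("|")
    return THEMES[int(index)], popups == "1"

verbose = "theme=dark;popups=yes"
compact = encode_prefs("dark", True)
# Sent under a (hypothetical) one-letter cookie name, e.g. "Set-Cookie: p=1|1; Path=/"
print(len(verbose), "chars vs", len(compact))
```

The table only has to stay in sync between the code that writes the cookie and the code that reads it. Nothing about it needs to travel over the wire.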
On a different note, I have seen extra attributes on HTML elements that didn’t need to be there, simply wasting bandwidth again.
Take this example:
<img src="..." alt="..." width="..." height="..." />
Sure, it looks like a valid tag, and in fact it is a perfectly valid HTML tag. But are all those attributes needed? The width and height tell the browser the size of the picture before it is loaded. This helps greatly when laying out a page as it loads: the browser can reserve an empty space of those dimensions (and arrange the text and other elements around it), so when the image is finally loaded it is simply placed in the space already available. Therefore one would think both are needed to speed up page load and layout time, but that’s not entirely true! If you are only concerned about reserving space, you could specify just one of the two sizes (width or height). The browser still reserves space in that direction and, once the image is loaded, computes the other dimension from the picture’s aspect ratio. You just saved yourself a few bytes by removing the width="..." attribute! Same for the alt attribute: it is required by the XHTML standard, but do you really need it to be that explicit? Would alt="img" not suffice?
You “only” saved 3 characters, but we’ve already learned what that can mean!
The same applies to the JavaScript you send down the wire: every long, descriptive variable or function name is shipped to every single visitor. Sure, short names can cause name conflicts on the page, but why not consider using your own namespace, or maybe classes, to avoid them? (It’s a trade-off for sure, so you need to evaluate how much you can save by switching.) Alternatively, use “rare” variable names. How many times have you named a single-letter variable anything other than a, i, j, n, t, x or y? There are plenty of letters out there that hardly ever get used 😉
favicon.ico: heard of it? 🙂 This mostly applies to server load optimisation, but as it turns out it also helps with your bandwidth consumption. You have probably all seen those little icons that appear in your browser’s address bar when you visit a website. They are meant to be a mini-identity for your website and promote your brand. That’s fine for a website, but what you probably don’t realise is that browsers will automatically request /favicon.ico from pretty much every site whose pages they display, whether or not you ever published an icon. So every page view your application serves can trigger an extra request. The file is typically a couple of hundred bytes, and it is worth having one in your web application. If the file is not present, then pretty much every time it is requested your web server will perform a disk access to check whether the file is there, find out it’s not and send an HTTP 404 back to the browser. The next time that browser shows one of your pages, it will place the same request again, and again.

Providing a favicon.ico file means it will be downloaded once by the browser and cached (by the browser and by any proxies the request went through). First of all, your web server won’t have to perform a disk access each time, because a file of that size will most likely be cached in memory by your web server. And quite likely, thanks to browser caching, you will see fewer requests coming in, saving you a small amount of bandwidth and some processing time. And here’s another trick: use an empty file (0 bytes!) for favicon.ico and you save yourself some more bytes per request 😉 There’s no specific requirement that says the file cannot be empty, and it will still be cached by browsers!
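If the favicon ends up being served by your application rather than as a static file, the same idea fits in a few lines. Here’s a minimal WSGI sketch in Python. It answers /favicon.ico with an empty 200 response plus a Cache-Control header so browsers and proxies can keep it. The one-month lifetime is an arbitrary assumption, and every other path gets a plain 404:

```python
# Assumption: pick whatever cache lifetime suits you.
ONE_MONTH = 30 * 24 * 3600

def app(environ, start_response):
    """Minimal WSGI app: empty, cacheable favicon; 404 for anything else."""
    if environ.get("PATH_INFO") == "/favicon.ico":
        start_response("200 OK", [
            ("Content-Type", "image/x-icon"),
            ("Content-Length", "0"),  # the 0-byte trick
            ("Cache-Control", f"public, max-age={ONE_MONTH}"),
        ])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain"), ("Content-Length", "9")])
    return [b"Not Found"]

# To try it locally:
# from wsgiref.simple_server import make_server
# make_server("", 8000, app).serve_forever()
```

If your web server serves the file directly, the equivalent is simply dropping an empty favicon.ico in the document root and letting the server’s caching headers do the rest.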
robots.txt: another one often missed in the online space. Again, if you’re a publisher (a website) you probably welcome every single web spider visit. A visit from a spider means your website gets indexed, and as a result you are likely to get a bigger audience. If you’re not a publisher, though, and your servers are not storing content, then each of these hits is wasting CPU time and bandwidth. (I’m not gonna go into the whole discussion about what damage this might do to your SEO, but it’s true that this is another side effect.) An average spider hits a website about twice a month. With at least 10 major spiders out there (plus tons of little ones), you’re wasting a significant chunk of your bandwidth by letting these bad boys crawl your site. All you have to do is put a robots.txt in the root folder of your web server that disallows all robots, and you’ve saved yourself not just some CPU but some precious bytes per second too. That holds at least for well-behaved crawlers: robots.txt is a convention, and nothing forces a spider to honour it.
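The file itself is just two lines. Here it is, together with a quick check using Python’s standard urllib.robotparser, which interprets robots.txt the way a well-behaved crawler would. The helper name is_allowed, the user-agent name and the URL are only examples:

```python
from urllib.robotparser import RobotFileParser

# The entire robots.txt: every user agent is disallowed from every path.
ROBOTS_TXT = """User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/any/page"))
```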
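The same byte-counting applies to the conditions in the scripts you ship to the browser. Consider:
if ( a == 1 ) ...
Now if you’re testing specifically for variable a to have the value 1, the above is spot on. If, however, a is an on/off switch following the standard 1/0 convention, then it’s equivalent to: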
if ( a ) ...
And the opposite is not
if ( a == 0 ) ...
or god forbid
if ( a == false ) ...
but simply
if ( !a ) ...
Have a look again at your if statements – how many times did you make that mistake? 🙂
Another common issue that seems to appear a lot amongst AJAX partisans is the “extreme” use of XML, and nothing but XML, for frontend/backend communication. Sure, XML does sometimes have its advantages as a communication format. However, if you’re only calling a URL that triggers a server-side function returning a simple success/failure marker, is there any point in returning a whole XML document whose only payload is true? Really you could just print a "1" from your server side, run that through a call to parseInt() and save yourself all the extra bandwidth generated by insisting on XML. (Not to mention you’d speed up the client side as well, since there’s no more XML parsing and you are downloading just one character!)
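Here’s the difference in a minimal Python sketch. The <result> element name in the XML reply is hypothetical, and int() plays the part of parseInt():

```python
import xml.etree.ElementTree as ET

# Hypothetical XML envelope for a plain success/failure answer.
XML_REPLY = '<?xml version="1.0" encoding="UTF-8"?><result>true</result>'
PLAIN_REPLY = "1"

def parse_xml_reply(body: str) -> bool:
    """A full XML parse just to read one word."""
    return ET.fromstring(body.encode("utf-8")).text == "true"

def parse_plain_reply(body: str) -> bool:
    """The int() equivalent of calling parseInt() on the response text."""
    return int(body) == 1

print(len(XML_REPLY), "bytes vs", len(PLAIN_REPLY), "byte")
```

Both functions give the same answer; only one of them needed an XML parser and sixty-odd bytes to get there.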
And since we are talking XML: is there really a need for explicit (read “long”) tag and attribute names when short ones would do? Consider this:
<product name="..." description="..." id="..."/>
Anyone looking at this XML exchanged (let’s say) between your frontend and backend can tell you are returning the details of a product, probably from your inventory database, and that the product has a name, a description etc. But you’re not just anyone! You already know, before making the call, which properties your product supports, so you don’t need descriptive names for them. Simply put, the following would achieve the same for you:
<p n="..." d="..." i="..."/>
I know I’m gonna hear a few comments here about how changing tag names breaks some JAXB bean bindings or some other XML serialisation mechanism. All I can say is that if your technology is that inflexible, get rid of it and/or write your own! We are not proper developers if all we do is glue together pieces of prepackaged code without writing the code required to integrate them smoothly. Or, if that’s not an option you favour, take the financial hit of the extra bandwidth (and if your pocket is that deep, what the fuck are you doing wasting your time reading this? 😮 )
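Switching between the two forms doesn’t have to be painful. Here’s a minimal Python sketch with xml.etree.ElementTree, assuming a hypothetical SHORT table that both frontend and backend share. The descriptive names then live in your code rather than on the wire. product_xml and read_product are illustrative names:

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical long-to-short name table shared by frontend and backend.
SHORT = {"product": "p", "name": "n", "description": "d", "id": "i"}
LONG = {short: long for long, short in SHORT.items()}

def product_xml(fields: Dict[str, str], short: bool = True) -> str:
    """Serialise a product as one empty element, optionally with short names."""
    rename = (lambda k: SHORT[k]) if short else (lambda k: k)
    element = ET.Element(rename("product"), {rename(k): v for k, v in fields.items()})
    return ET.tostring(element, encoding="unicode")

def read_product(xml_text: str) -> Dict[str, str]:
    """Parse either form back into descriptive field names."""
    element = ET.fromstring(xml_text)
    return {LONG.get(k, k): v for k, v in element.attrib.items()}

fields = {"name": "PC", "description": "Desktop computer", "id": "42"}
long_form = product_xml(fields, short=False)
short_form = product_xml(fields)
print(len(long_form), "vs", len(short_form), "characters")
```

Keeping the mapping in one table means your code keeps its readable names, and only the payload gets shorter.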
One other aspect that is easily missed is the filenames for images, CSS etc. It’s all good and proper using paths and filenames like:
<img src="/images/picture_of_pc.png" alt="" />
These are quite descriptive; however, to a browser they are the same as:
<img src="/i/pc.png" alt="" />
They’re not as easy to read for someone on the outside, but that shouldn’t matter to you, since you know the structure and naming convention of your web app! Even better, you can keep your descriptive directories (such as /images/ and /css/) on your server and link them to short aliases (such as /i/ and /s/), or configure some internal rewrites in your web server (e.g. via Apache’s mod_rewrite). This is transparent to the user and has exactly the same effect as the long paths and names, but once more saves you a few bytes on each request. It all adds up once you start seeing hundreds of users per second!
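The rewriting itself is trivial. Here’s a minimal Python sketch of what such a rewrite rule does, assuming a hypothetical alias table that maps short public prefixes onto the descriptive directories on disk. Renaming picture_of_pc.png to pc.png is a separate, one-off change:

```python
# Hypothetical alias table: short public prefix -> descriptive directory on the server.
ALIASES = {"/i/": "/images/", "/s/": "/css/"}

def resolve(path: str) -> str:
    """Map a short public URL path onto its real location; other paths pass through."""
    for short, real in ALIASES.items():
        if path.startswith(short):
            return real + path[len(short):]
    return path

print(resolve("/i/pc.png"))
```

In production you would express the same mapping in the web server’s own configuration rather than in application code, so the request never reaches your app at all.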
Comments deserve the same attention. Take this JSP fragment:
<% String s=getUser(); %> <!-- this div stores the user greeting message --> <div id="greeting">hello <%= s %></div>
This dumps out a div with some text, plus an HTML comment explaining the purpose of the div. Now have a look at a slightly modified version:
<% String s=getUser(); /* this div stores the user greeting message */ %> <div id="greeting">hello <%= s %></div>
Someone looking at just the output will see only the div, without the HTML comment. The developer looking at the JSP code, though, still sees the comment. It now lives server side and no longer consumes bandwidth for no real benefit!
There are more things to consider when programming with bandwidth consumption in mind, and quite likely I will follow this post with a few more in time. Hopefully this gets a lot of us thinking about this important aspect. After all, hardware and storage are cheap and plentiful, but CPU and, just as much, bandwidth are not!