
04.03.12

RMISockFactory — New Open Source Project Started on SourceForge

Posted in Blogroll, News, Tech at 9:22 pm by Liv About Liviu Tudor

I thought I’d post this on my blog as well since this seems to get a lot of attention nowadays (it certainly seems to get a lot of visitors!).

I just started a new open-source project on SourceForge.net — I know you github fans will jump at my throat for using sf.net and not github, but I’m being pragmatic here: I’m still learning my git commands and way of operating, whereas I’m very comfortable with subversion so rather than spending countless hours each day trying to figure out how to merge in changes in git, I thought I’d go with what I know and spend my time on actual coding rather than admin! Makes sense?

Anyway, the reason for this post is that I need some help with the project (as is always the case with open source), so I thought I’d raise awareness of it among my readers. Who knows? One of you guys might lend a helping hand.


Disclaimer


29.10.11

Optimize the Download Speed of a Web Page: CSS

Posted in Blogroll, Tech at 12:59 pm by Liv About Liviu Tudor

Since I’m going through the old website and resurrecting pages which still seem to be sought after by visitors (I can tell this from the search engine terms they use and what they then search for on my website), I’ve come across this old one, which talks about optimizing the loading speed of a web page. I think it’s worth dusting this one off, as I still see these sorts of “mistakes” on tons of sites out there. It also touches a bit on the bandwidth optimization I wrote about ages ago, so even though it’s an old post I thought it worthwhile re-posting.

If you have ever used any of the online optimization tools (www.websiteoptimization.com springs to mind) then you must have noticed that the recommended size for a CSS file is 4080 bytes! That’s not much, you would argue, and indeed on some big sites you might need more than that. The reason for the limit is that the file then fits into fewer “higher speed” packets (basically 3 x 1k TCP/IP packets), which means the browser will finish loading the CSS before it finishes loading the page itself. Thus by the time the whole contents of the page (or even parts of it) become available, the browser will already know how to render it. If the CSS takes longer to load, you will notice that the page might be rendered first with the “default” stylesheet (read: the one built into the browser) and then, once the CSS is loaded, the styles are applied to the page, triggering an annoying flicker and possibly a few screen refreshes – not the most pleasant experience for your visitors!
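To put a rough number on the packet argument, here is a minimal sketch in Python. It assumes 1460 bytes of payload per TCP segment (a 1500-byte Ethernet MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header); that figure and the function name are my assumptions, not something the optimization tools publish, and the sketch ignores the HTTP response headers that travel with the file.

```python
import math

# Assumption: 1460 bytes of payload per TCP segment (1500-byte Ethernet MTU
# minus a 20-byte IPv4 header and a 20-byte TCP header, no options).
ASSUMED_PAYLOAD_PER_PACKET = 1460


def packets_needed(size_bytes, payload_per_packet=ASSUMED_PAYLOAD_PER_PACKET):
    """Return how many TCP segments are needed to carry size_bytes of data."""
    return math.ceil(size_bytes / payload_per_packet)


# Compare the recommended CSS size with a couple of larger stylesheets.
for size in (4080, 8000, 20000):
    print(size, "bytes ->", packets_needed(size), "packets")
```

Under that assumption a 4080-byte stylesheet fits in 3 packets, while anything bigger needs more of them before the browser can start applying styles.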




11.03.11

Bandwidth – reloaded

Posted in Random Thoughts, Tech at 1:59 pm by Liv About Liviu Tudor

I’ve written about bandwidth before and I knew from the beginning that it’s one of those issues that you can never exhaust. As it happens, I recently came across another interesting thing which is probably worth sharing: cross-browser delivery. In brief, it means delivering to each browser the content it can “understand” and, more to the point, display.

Sure, it is easy to deliver the same content to each and every browser – it means you don’t have to do any browser “targeting”, and as such your code is easier to write and maintain. It does mean, however, that you are quite often pumping out a lot of content that browsers will have to ignore as they cannot display it. For instance, some of you might not know this, but there are a few text-only browsers in the Unix world (mainly lynx and elinks) that have been designed to work in a text console. They have no capability to display images, Flash movies, QuickTime etc., as the container they get launched in typically doesn’t offer anything but text (think old-style green terminals!). You’d think that there is no point in anyone using these browsers, and in most cases you’re correct, but imagine this: you’ve ssh’d into a server and figured out you need to download a patch to some component. You’ve got two options:

  1. browse the net on your desktop using a “proper” browser, download the patch locally and then upload it onto the server to apply it
  2. use the likes of lynx on the server directly, google and find the patch, download it right onto your server and apply it

The latter surely is not the most user-friendly experience but it has the advantage of typically being fast: if your server is in a data centre it will quite likely have a fast connection and since it is a server by definition it is more powerful than your desktop machine so more than likely the download will complete in an instant. On the other hand, in the case of the former, the initial download onto your desktop is guaranteed to be slower as it will go through your office internet connection (I don’t know of that many companies that have a gigabit connection from their office!) and on top of that you have to then take the hit again to upload this onto the server through the same (slow) office connection. And if you have to jump through another machine to reach your server you have doubled the hassle for the sole purpose of avoiding using a text-based browser!

Don’t get me wrong, I don’t think anyone in their right mind would actually use one of these browsers all the time – but they do exist and are occasionally used, and as such you’ll probably find you get occasional hits from them. And chances are each time you reply with the same JavaScript block (that gets downloaded but not executed), Flash content, image etc., only so the browser can ignore it. Based on previous observations I would say these browsers account for about 0.5 to 1% of your hits on average, though there are of course variations. That’s not a big percentage but it can add up to big numbers depending on the size of the site your solution is running on: on a site averaging 10,000 page views a day (and this isn’t by any means a big site – 300k page views a month isn’t a big number at all!) up to about 100 requests a day will come from these sorts of browsers, and with an average response of say 5 KB (though I’m guessing the reality is higher than that if you have graphics etc., as it will probably sum up to about 30 KB or so) you are wasting 0.5 MB daily on this site alone. Doesn’t sound like much, but if you’re dealing with say 20 sites (though again, in reality, if you’re an average advertising solution you’re more likely to deal with at least 100 of them) that means 10 MB a day wasted on these browsers alone.
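If you want to plug in your own traffic figures, the back-of-the-envelope sum above can be sketched as a small Python function. The function and parameter names are made up for illustration, and it uses 1 MB = 1,000 KB, the same rounding as the figures in this post.

```python
def wasted_mb_per_day(page_views_per_day, share_of_hits, response_kb, sites=1):
    """Daily bandwidth in MB (1 MB = 1,000 KB) sent to browsers that ignore it."""
    requests = page_views_per_day * share_of_hits
    return requests * response_kb * sites / 1000


# 10,000 page views a day, 1% from text-only browsers, 5 KB per response.
print(round(wasted_mb_per_day(10_000, 0.01, 5), 2), "MB a day for one site")
print(round(wasted_mb_per_day(10_000, 0.01, 5, sites=20), 2), "MB a day for 20 sites")
```

Swap in 30 for the response size if your responses carry graphics and the daily figure grows six-fold.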

I might have mentioned the next issue before but it’s worth reiterating: web spiders! (This only applies if you’re a company providing online services for sites, not if you’re a website yourself, in which case web spider traffic is useful for SEO reasons.) Ask yourself: do you really need to return any content to web spiders? Does that bring you any benefits? Would simply returning HTTP 200 (OK) with zero-length content not suffice? If the response you provide to the spiders does not benefit the site (in terms of search ranking etc.) then returning that content won’t benefit you in any way either! And bearing in mind that an average spider visits about once a week but crawls the whole website, you might be saving yourself some bandwidth. If we consider the same average of 5 KB per response and a site with about 500 pages (again I’m being very conservative here!), each spider visit would see you waste around 2.5 MB, call it 3 MB, per web spider a week. If we consider only the 4 major spiders (Google, Bing/Microsoft, Yahoo and Ask) we’re looking at about 12 MB every week per site. Times that by 20 (sites) and you’ve got approx 1/4 GB a week wasted, or about 1 GB a month! Add the other 300 MB a month that goes to the text-only browsers and roughly 1.3 GB a month is just wasted. And if you’re paying for your outgoing traffic from your datacentre you will probably find you’re wasting some money as well – not to mention the hidden cost of your servers actually processing the requests, the entries probably stored in the database and so on.
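Here is a minimal sketch of the “empty 200 for spiders” idea as a plain Python WSGI application. The marker substrings and the placeholder payload are assumptions for illustration only; real spider user agents vary, so check them against each search engine’s own documentation before relying on a list like this.

```python
# Hypothetical marker substrings, for illustration only.
ASSUMED_SPIDER_MARKERS = ("googlebot", "bingbot", "slurp", "teoma")


def is_spider(user_agent):
    """Return True if the User-Agent header looks like one of the assumed spiders."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in ASSUMED_SPIDER_MARKERS)


def app(environ, start_response):
    """WSGI app: HTTP 200 with a zero-length body for spiders, normal payload otherwise."""
    if is_spider(environ.get("HTTP_USER_AGENT")):
        start_response("200 OK", [("Content-Length", "0")])
        return [b""]
    body = b"...the usual ad or tracking payload..."  # placeholder content
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

The same check could just as well live in a front-end proxy or load balancer, so the request never reaches your application servers or database at all.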

Another thing to take into consideration is Flash support. With the increasing demands for user interactivity in the advertising space nowadays it’s pretty rare to find non-Flash ads on websites. If your solution is involved in this chain of advertising delivery then you will have to start looking at whether the content you are pushing is compatible with the user’s browser – in this case, for instance, does it support Flash, and does it support the right version of Flash? Obviously in most cases you would employ some JavaScript and, with the help of the swfobject library, find out and then make a second hit requesting the right creative for what the browser supports – it is a standard mechanism for delivering Flash. That is because normally you can’t determine Flash support based on browser headers – but how about mobile phone traffic? You can have a pretty good idea of what type of mobile it is based on the user agent, right? And most handsets don’t support Flash, so even though they do support JavaScript it doesn’t make sense to use the same mechanism when you know upfront that Flash is not supported. Why not instead just deliver the non-Flash creative (i.e. static graphic and link) upfront? With your average Flash creative for a skyscraper around 30+ KB and an image of the same dimensions around 15 KB (half the size) you’d be halving your bandwidth consumption for mobile traffic! With some sites getting up to 20% of their traffic from mobiles, if you consider the above example again you’re looking at 2,000 requests a day being mobile traffic. And based on the above figures it means for these requests you’ll be wasting 2,000 x 15 KB = 30,000 KB (30 MB) a day – roughly 900 MB a month, so more than another half a gig.
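As a rough sketch of the “deliver the static creative upfront” idea, the Python below picks a creative from the user agent and works out the daily saving. The marker substrings, the function names and the return labels are hypothetical; a real deployment would use a proper device database rather than a handful of substrings.

```python
# Hypothetical substrings used to spot handsets assumed not to support Flash.
ASSUMED_MOBILE_MARKERS = ("iphone", "ipod", "blackberry", "opera mini")

FLASH_CREATIVE_KB = 30  # skyscraper Flash creative, figure from this post
IMAGE_CREATIVE_KB = 15  # static image of the same dimensions, figure from this post


def pick_creative(user_agent):
    """Return 'image' for assumed non-Flash handsets, else fall back to detection."""
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in ASSUMED_MOBILE_MARKERS):
        return "image"
    return "javascript-detect"  # the usual swfobject round trip


def daily_saving_mb(page_views_per_day, mobile_share):
    """MB saved per day (1 MB = 1,000 KB) by sending the image instead of Flash."""
    mobile_requests = page_views_per_day * mobile_share
    return mobile_requests * (FLASH_CREATIVE_KB - IMAGE_CREATIVE_KB) / 1000


print(pick_creative("Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_0 like Mac OS X)"))
print(round(daily_saving_mb(10_000, 0.20), 1), "MB a day")
```

Everything that doesn’t match still goes through the normal JavaScript detection, so a wrong guess on an unknown handset costs you nothing more than it does today.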

Come to think of it, you might also want to check whether your solution needs to deliver to mobile handsets at all. If it doesn’t, just blocking those requests and not returning anything saves you the remaining 15 KB per request, which is another 900 MB or so a month! Oh and by the way, that would be per site – multiply this by the 20 sites you’re working with and the figures will start being more substantial!

I’m not going to end here as more than likely there’s more to the bandwidth saga than this – I will however take a break for now. Till next time I write about it, watch your bytes! :D



12.10.10

“I hate advertising!”

Posted in Blogroll, Random Thoughts, Tech at 7:47 pm by Liv About Liviu Tudor

Maybe it’s due to the fact that I have been involved in online advertising throughout the last few years of my life, or maybe it’s because of the nature of the people I come in contact with frequently, but the above phrase has become a common occurrence in daily conversations. I bet some of you reading this think the same – which doesn’t make you odd at all, it just makes you part of the audience online advertising is not targeted at. There are lots of users out there who totally blank out advertising found on web pages – same as there are people who don’t pay any attention to a supermarket’s “power aisle” and their latest offers: they just want to walk in, buy exactly the bits they want and walk out as quickly as possible. (As it turns out, a lot of times this proves to be more expensive than having a look at the daily offers – but that’s a different argument centred around whether advertising is really useful, and that’s not the point of this post, so maybe I’ll come back to that in a different post.) The point here is that there is a (large) segment of the audience which is the usual target for online advertising and another segment which is not the targeted audience. Out of the latter there is another segment who hate online advertising with a vengeance and would do anything they can to stop it. These guys are the ones I’m going to address in this post.

Most of these guys run all sorts of ad blockers which block any requests going to the likes of Google, Kontera and so on. This means the browser intercepts all calls that are going to be made from a page to these advertising companies and prevents them at source. So for instance Yahoo, or any other company they are blocking, would never know you actually viewed a page as they’ll never get that ad call. This quite often leaves blank holes in the page where the ad unit was supposed to be; however, the guys running the ad blockers don’t seem to mind that, safe in the “knowledge” that they got their revenge on the advertisers. The truth though is exactly the opposite, as the advertising companies could not thank these guys enough for doing so! Sounds odd? Well think about it:

Every piece of advertising you see on a page costs an advertising company money. That’s because somewhere in their network a server spent some CPU processing your request – and CPU cycles are expensive. So is the electricity needed to run that server. And the hosting. And the administration, setup, maintenance etc. of that server. Plus database costs, caching and so on. Then, having processed your request, that server needs to serve you a piece of advertising – and that translates to using some bandwidth, which is expensive too. And probably some CDN – they don’t come free either! Another expensive part comes from the engagement and click-through reports, whose figures you are lowering by ignoring that ad on the page: companies are paying top dollar to have their adverts served to the intended/targeted audience, the guys who will generate an increase in company profit as a direct result of advertising campaigns. Having paid a fortune for an advertising campaign only to draw a report at the end which tells you that your ad was viewed by millions but never clicked once, or clicked but no one actually went on to buy anything, typically means that the advertising company which managed your campaign will never close a deal with you again, or even worse, they don’t get paid. And by allowing that ad to appear on the page and not interacting with it you are pretty much digging a hole for the company that served that ad – be it Yahoo!, Vibrant Media, Echo Topic or any other.

Now let’s get back to the ad blocker guys: they actually prevent the browser from making the request to the advertiser, thus saving the company some CPU cycles, some power, some bandwidth and some storage. And on top of that, because the ad was never served, it will not skew their CTR and other statistics – in fact it arguably improves them: 3 clicks for 4 served ads (75% CTR) is better than 3 clicks for 5 ads (60% CTR), which would be the figure if the 5th ad was served to someone who doesn’t click on it (whereas the 75% figure remains constant if the 5th user uses an ad blocker, so the 5th request is not “seen” at all by the system).
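The CTR comparison is simple enough to check in a couple of lines of Python; the function name is just for illustration.

```python
def ctr(clicks, ads_served):
    """Click-through rate: clicks divided by the number of ads actually served."""
    return clicks / ads_served


# The fifth visitor never clicks. With an ad blocker the fifth ad is never served.
print(f"with ad blocker:    {ctr(3, 4):.0%}")
print(f"without ad blocker: {ctr(3, 5):.0%}")
```

The blocked request never reaches the denominator, which is exactly why the ad company’s report looks better.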

Every time I discuss this with someone I find it useful to bring up the plane seat analogy: there is one empty seat on a plane leaving tomorrow and it’s heading to L.A. and you are the lucky winner of some lottery organised by the airline company. So you get a call in the evening telling you to pack your bags and be at the airport tomorrow as you’ve won. You decide to politely turn down the free ticket offer (after all there’s hotter chicks in New York ;) . At this point the airline company cannot thank you enough: if you take their offer they are losing money by taking you on board – however if you don’t then this seat is now available again for another passenger! This passenger could be another lottery winner like you, a premium customer who would pay lots of money to get on that specific flight and reach L.A. in time for their plastic surgery :D or a regular passenger who will pay a normal price. Or indeed it might be nobody – maybe no one wants to go to L.A. anymore (perhaps they found out Pamela Anderson moved her boobies somewhere else :p). The crucial point is still the fact that by not boarding the plane you increase the probability of the airline company making money out of that seat!

Now imagine this transposed to the ads world: imagine (and it’s not that far from reality) that because of the number of servers in their infrastructure, bandwidth etc., the advertising company whose ads you’re blocking can only serve 1000 ads at any moment – if you’re part of those 1000 then the company is losing money; however if you turn your ad blocker on you are giving up your seat in that 1000 and therefore increasing the company’s chance of making more money!

So if you really hate advertising, next time you see an ad from Google don’t block it: use the scrolling arrows and make them deliver you a few ads you’re not gonna click on! Or if you see the double-underlined keywords from Vibrant Media, Kontera, Echo Topic or whoever else operates in this space nowadays, don’t add their domain name to your ad blocker – hover instead over every keyword and make them serve you as many ads as possible on that page. You’re putting a hole in their pocket by doing so!



27.08.10

How good is your hosting?

Posted in Tech at 9:25 am by Liv About Liviu Tudor

I’ve seen so many adverts online lately for various hosting packages, co-location offers, data centres and so on, each one of them claiming to be the best there is, thus raising the question: which one is the best really? While beauty is always in the eye of the beholder, the answer is always relative to the requirements and, even more, it is influenced by a multitude of factors. Therefore I’m not even gonna attempt to answer this question, but instead flag a certain aspect that gets left out a lot of times: connection rate.

You have become accustomed, I’m sure, to the various hosting packages ranging from “bronze” to “platinum” or from “small business package” to “enterprise” – the high end of the offers always boasting a lot of disk space and huge bandwidth. As you probably gathered from some of my previous blog entries, this is something that I am interested in, and as such I’m only going to concentrate on the bandwidth aspect.

An average “top end” hosting package would probably offer you 100 Mbps or even in some cases up to 1 Gbps. For your average website, assuming a page size around 10 KB with 3 images averaging 40 KB each and a stylesheet worth about 10 KB (so a total size of around 140-150 KB), this means around 100 * 1,024 / (150 * 8) = approx 85 page views per second. This is probably acceptable for most sites, even more so as figures are approximate and based on 100 Mbps lines. So for your average site the bandwidth requirements are covered and the servers are ticking along nicely.
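The same sum as a small Python sketch, in case you want to try it with your own page weight. The function name is made up, and it keeps the 1 Mb = 1,024 Kb convention used in the formula above.

```python
def page_views_per_second(link_mbps, page_kb):
    """Pages per second a link can carry, taking 1 Mb = 1,024 Kb and 1 KB = 8 Kb."""
    link_kbits = link_mbps * 1024
    page_kbits = page_kb * 8
    return link_kbits / page_kbits


page_kb = 10 + 3 * 40 + 10  # HTML + three images + stylesheet = 140 KB
print(round(page_views_per_second(100, page_kb)), "page views/s at", page_kb, "KB")
print(round(page_views_per_second(100, 150)), "page views/s at 150 KB")
```

This is a theoretical ceiling that ignores protocol overhead and caching, so treat the result as an upper bound rather than a promise.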

What if your servers are not serving “simple” web pages though; what if you are a service provider (e.g. ads, visitor tracking etc.)? You will most likely have lots of requests per second but hardly serve much “content” for each request. For instance, if your business is tracking user behaviour online you would base a lot of that on tagging various sites with the standard 1x1 transparent pixel and on cookies. As such your standard reply to each request is probably around 1 KB (probably even less than that but I’m being generous ;) ). On a 100 Mbps link this means you could in theory serve 100 * 1,024 / 8 = 12,800 requests per second! I’m gonna approximate that to 10,000 requests per second just in case my 1 KB per request was not that accurate. Now first of all, if you’re thinking of taking that much traffic I’m guessing you’ve got the servers to support it – you cannot take this traffic on one single server: to process 10,000 requests per second you either have to take 0.1 msec per request, or your server needs enough cores to sustain around 100-200 truly concurrent threads (which gives each request 10-20 msec)… Possible, but unlikely for most of us running our software on clustered commodity hardware. Anyway, to get to the point: whether you’ve got some giant machine or a cluster that can process 10,000 requests a second, how does your firewall cope?
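A quick Python sketch of the two sums in this paragraph: the theoretical request rate for a link and the time budget each request gets. The function names are illustrative, and the rate takes 1 Mb as 1,024 Kb, which gives 12,800 requests per second for 1 KB replies (with 1 Mb = 1,000 Kb it would be 12,500).

```python
def requests_per_second(link_mbps, response_kb):
    """Theoretical requests per second on a link, taking 1 Mb = 1,024 Kb."""
    return link_mbps * 1024 / (response_kb * 8)


def time_budget_ms(target_rps, concurrent_workers=1):
    """Milliseconds each request may take if the workers run truly in parallel."""
    return 1000 * concurrent_workers / target_rps


print(requests_per_second(100, 1), "requests/s on a 100 Mbps link")
for workers in (1, 100, 200):
    print(workers, "worker(s):", time_budget_ms(10_000, workers), "ms per request")
```

A single worker has to turn each request around in a tenth of a millisecond, whereas 100 to 200 truly parallel workers buy you 10 to 20 milliseconds each.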

Don’t be surprised to have everything ready in terms of application architecture and implementation, server hardware and security, only to find out that the firewall provided by your ISP only copes with “normal” traffic (e.g. 100 to 200 users connected simultaneously, downloading content that can fill the 100 Mbps bandwidth). There are lots of low-end routers now that can cope even with 1 Gbps of bandwidth but fall to their knees when the number of connections goes past 2,000-3,000 per second. To cope with that number of connections you will find you need the high-end routers – pretty much ISP-level routers – and you will find in most cases your “standard” hosting package probably doesn’t include one. In fact it won’t even be mentioned – I have yet to find a hosting company that will offer a hosting package with X GB disk space, Y Mbps bandwidth and Z concurrent connections! Your average company doesn’t worry about that and as such neither do ISPs. For the rare cases like the one described above you might find that your ISP says “sorry, no can do”, or they will tell you that this is not part of their standard package and offer you a similar “specialised” package which offers the same bandwidth and everything else and can also cope with that volume of connections per second. And quite likely costs you another 5-10k a year :D That added cost is in most cases there to recover the cost of the high-end router mentioned.

You might wonder why companies do not offer this option up front. Well, don’t forget that these routers have processors, memory and an OS themselves, and they do a lot of processing with these; however, whereas the standard processing you are used to in computers involves crunching numbers and database transactions, a router processes a lot of IP packets: it has to decide whether a packet is part of an existing session, where it needs to be sent, what the best routes to get there are, whether it matches the security policies and ACLs set, and so on. More traffic means more CPU activity, so to cope with lots of packets you need a reasonable processor; more concurrent sessions, however, mean the router has to maintain larger in-memory structures which enable it to match packets to existing sessions. Since these structures are growing larger, that means more lookup operations, and that means a faster CPU again! A hosting company will provide a regular router which copes with standard traffic for the same reason they offer basic hosting packages: what is the point in offering lots of disk space, CPU and bandwidth when a lot of clients don’t use that much? It’s nothing short of obscene having a quad-core server running just an instance of Apache serving simple static pages (even though it is such a common occurrence nowadays!). To provide more power, disk etc. means increasing the costs, which is not justified for those clients that don’t need the extras. Similarly, to offer a high-end router from the start means a hefty price increase for something that most clients won’t use (apart from you!), and as such it doesn’t make commercial sense.
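To make the “matching packets to existing sessions” part a bit more concrete, here is a toy Python sketch of a session table keyed by the usual 5-tuple. It is purely an illustration under my own assumptions (the class, its names and the capacity limit are all made up) and says nothing about how any real router or firewall is implemented.

```python
from collections import namedtuple

# A flow is identified here by protocol plus source/destination address and port.
FlowKey = namedtuple("FlowKey", "proto src_ip src_port dst_ip dst_port")


class SessionTable:
    """Toy session table: one in-memory entry per tracked connection."""

    def __init__(self, max_sessions):
        self.max_sessions = max_sessions
        self.sessions = {}  # FlowKey -> packet count

    def handle_packet(self, key):
        """Classify a packet as part of an existing session, a new one, or dropped."""
        if key in self.sessions:
            self.sessions[key] += 1
            return "existing"
        if len(self.sessions) >= self.max_sessions:
            return "dropped"  # table full: no room to track another session
        self.sessions[key] = 1
        return "new"


table = SessionTable(max_sessions=2)
for port in (40001, 40002, 40001, 40003):
    key = FlowKey("tcp", "203.0.113.7", port, "198.51.100.1", 80)
    print(port, table.handle_packet(key))
```

Every tracked session costs memory and a lookup per packet, which is why a box sized for a couple of hundred simultaneous users can run out of room long before the link runs out of bandwidth.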

So before signing up for a new hosting package with a company and starting to budget around the figure they give you, it might be worth reviewing your app’s connections per second and bandwidth requirements, as well as the worst-case traffic spikes, and communicating those to your hosting provider – quite likely the price will change, but at least you can address that budget change very early in the project and deliver your solution on time and with no nasty surprises.

