Tracking Users Online — Part 3

I’ve been lucky enough to get some time on my hands to put together a small release of the PixelServer project, and this is the post to accompany it.

If you’re familiar with the earlier posts in this series (see the previous entry, by the way), you know that the code is hosted on GitHub at https://github.com/liviutudor/PixelServer.

This is a minor release, so the version number went up from 1.0.0 to 1.0.1. The release tag is therefore pixelserver-1.0.1, and you can fetch this release from GitHub at https://github.com/liviutudor/PixelServer/releases/tag/pixelserver-1.0.1

One thing to note about this release, and about the ones to come: the project uses the Spring framework, and for most modules it uses version 4.0.5.RELEASE. As it turns out, starting with version 4.0, Spring's test support relies on version 3.0 of the Servlet API. Since this project uses Servlet API 2.5, I’ve decided to stay on version 3.2.3 just for spring-test. This way, the project can still be deployed in Tomcat 6 or other containers that do not support Servlet API 3.0.

Also, as noted in the README file on GitHub, starting with this version the build requires JDK 7 or above, since it uses the try-with-resources syntax for loading the pixel image. My recommendation is still that you use Java 8 for this project, but if you can’t, you need at least JDK 7 from this version onwards.
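To illustrate why JDK 7 is the minimum, here is a minimal sketch of loading image bytes with try-with-resources. It is not the code from the PixelServer repository: the class name PixelLoader and the method readAll are hypothetical names I made up for this example, and the real project may load the pixel differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only; not taken from the PixelServer sources.
final class PixelLoader {
    private PixelLoader() {
    }

    /** Reads the whole stream into a byte array and closes it, even if reading fails. */
    static byte[] readAll(InputStream source) throws IOException {
        // try-with-resources (Java 7+): both resources are closed automatically,
        // in reverse order of declaration, when the block exits.
        try (InputStream in = source;
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }
}
```

On JDK 6 the same logic would need an explicit finally block to close the stream, which is exactly the boilerplate this syntax removes.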

In This Version

Apart from introducing some unit tests (always good to actually ensure your changes don’t break anything, right? 😉), this release introduces only one minor change, related to content caching. Below I talk a bit about why it was needed.

If you recall, in the previous version (1.0.0) we simply served a 1×1 transparent pixel. That allows us to sit on any web page without breaking the layout or interfering with the colour scheme. The problem is that once we serve the image bytes, the browser can decide to cache them, that is, store the data locally. Then, every time our pixel tag is encountered, the browser saves on page loading time by using the locally saved (cached) data rather than downloading the pixel again from our server.

That of course is undesirable if you are tracking users online: once the browser caches your pixel, it loads the pixel locally instead of sending a request to your server every time a user views the page, so your server will not know about this visit (or this user!).

To prevent this, the servlet needs to inform the browser that the content it’s sending shouldn’t be cached. This is achieved via a bunch of HTTP headers. I say a bunch because, despite all the standards out there, each browser seems to have its own interpretation. To be on the safe side, a servlet needs to send all the headers for all the browsers, so that no visit, from any browser, is served from the cache.

The two headers you will typically hear of when it comes to content caching are:

  • Expires: -1 declares the content as already expired at the time it is served to the browser (thus forcing the browser not to cache it). The value -1 is not a valid date, and clients are expected to treat an invalid Expires value as a time in the past.
  • Cache-Control: no-cache tells the browser not to reuse a stored copy without first checking back with the server. However, some browsers seem to rely on a slightly different directive: Cache-Control: must-revalidate. To cater for both cases, we combine them into: Cache-Control: no-cache,must-revalidate

We are adding one extra header in our application to these two: Pragma: no-cache. This is an HTTP/1.0 header which HTTP/1.1 superseded with Cache-Control. However, to cater for older HTTP clients, it’s best to send this header too.

Once we send these three headers, the browser will not cache the content in most cases (there are probably some old clients, accounting for under 0.1% I dare to guess, which might still ignore these headers, but we’re not going to be concerned with those). This means that nearly every time a page carrying our pixel is viewed, our server gets a call and we are able to track the user.
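The three headers described above can be sketched in Java as follows. This is only an illustration under my own assumptions, not the code from the repository: the class NoCacheHeaders and its asMap method are hypothetical names, and I return a plain map so the sketch compiles on JDK 7 without the Servlet API on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not taken from the PixelServer sources.
final class NoCacheHeaders {
    private NoCacheHeaders() {
    }

    /** Returns the three anti-caching headers discussed in this post, in a stable order. */
    static Map<String, String> asMap() {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        // An invalid date value, which clients treat as already expired.
        headers.put("Expires", "-1");
        // Both directives combined, to cover browsers that look for either one.
        headers.put("Cache-Control", "no-cache,must-revalidate");
        // HTTP/1.0 fallback for older clients.
        headers.put("Pragma", "no-cache");
        return Collections.unmodifiableMap(headers);
    }
}
```

In a servlet, each entry of this map could be passed to HttpServletResponse.setHeader(name, value) before the image bytes are written to the response.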

As usual, keep an eye on GitHub for future versions and improvements of this project, and of course on my blog for the post that explains the changes in each one.