Careful with Naming in Your Platform


For those of you who keep an eye on my blog, I write quite a bit about technology and software engineering. That’s because by trade I am a software engineer and quite passionate about a few areas in this field. I write code, and as such I blog a lot about coding. I have written a lot of lines of code in my life and encountered various things (good and bad), and sometimes I feel I need to share some of those experiences and the knowledge I’ve gained going through them. Today is such a day: I recently looked over some code, doing a sort of technical audit for a friend, and found myself confused at times by something that is overlooked quite often: naming conventions!

And I’ve realised that naming is often perceived as such a small matter that no one really cares about it, and this often leads to confusion and errors. Whether it’s the naming of variables in code, of fields in databases or reports, or even of system components, it is hardly ever treated as a serious matter. “Just get it done and deploy it so we can move on to the next piece of functionality” seems to be the approach. And so a component called “email validation engine” makes its way into production and code ends up written around it, only for it to surface later (typically after some production outage) that the component doesn’t actually validate email addresses; it merely checks that the string provided “looks like an email address and has an @ and something which looks like a domain”. By that point you find you have a lot of wrong email addresses for your customers and no way of contacting them… You get the picture!
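To make the gap between name and behaviour concrete, here is a minimal Java sketch. The class and method names (EmailChecks, looksLikeEmailAddress) are my own inventions for illustration, not taken from any real system; the only point is that the name promises no more than the code actually delivers.

```java
// Hypothetical sketch: all names below are invented for illustration.
final class EmailChecks {

    private EmailChecks() {
    }

    /**
     * Returns true if the text has the rough shape of an email address:
     * exactly one '@' with something before it, and a domain part that
     * contains a dot. It does NOT prove that the mailbox exists or that
     * mail can be delivered to it.
     */
    static boolean looksLikeEmailAddress(String text) {
        if (text == null) {
            return false;
        }
        int at = text.indexOf('@');
        // Require exactly one '@' that is not the first character.
        if (at <= 0 || at != text.lastIndexOf('@')) {
            return false;
        }
        String domain = text.substring(at + 1);
        int dot = domain.indexOf('.');
        // The domain must contain a dot and must not start or end with one.
        return dot > 0 && !domain.endsWith(".");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmailAddress("jane@example.com"));
        System.out.println(looksLikeEmailAddress("jane@nowhere.invalid"));
        System.out.println(looksLikeEmailAddress("not-an-email"));
    }
}
```

Note that the second address passes the check even though no mail will ever reach it. With a name like validateEmail that would be a nasty surprise; with looksLikeEmailAddress it is exactly what the caller was told to expect.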

I’ve decided as such to share a few of my experiences when it comes to naming “things” in software engineering — hopefully these will help others avoid mistakes made in the past.

First of all, as a coder, one of the things I’ve seen other coders be terrible at is naming variables. I’m not a “code nazi” who insists that when iterating through an array with a piece of code like this:

for( int i = 0; i < array.length; i++ ) {
   // do something with array[i]
   // ...
}

one should name the variable “i” something like “indexInArray” — that is rubbish as the name “i” is often used in maths for indexing series and so on. Also since the variable is only short-lived it’s easy to figure out its purpose by looking at the code. In other words, when it comes to local variables, while it’s nice to have a self-explaining name, it doesn’t matter that much if that is not the case since that naming convention doesn’t affect the outside world (and by “outside” I mean the other pieces of code calling your method.)

I am going to address the naming of variables and data members that are referenced in more than one place throughout the code. Here’s an example: a class which encapsulates data about a visitor to a website. I’m sure you’ll agree there is more to an online visit than a date/time stamp and an IP address, but I’m only going to concentrate on those 2 for now, so assume any other data is also present in this class:

public class OnlineVisitDetails {
 private Date   timeOfVisit;
 private String clientIpAddress;
 
 public OnlineVisitDetails() {
  this(null, null);
 }
 
 public OnlineVisitDetails(Date timeOfVisit, String clientIpAddress) {
  this.timeOfVisit = timeOfVisit;
  this.clientIpAddress = clientIpAddress;
 }
 
 public final Date getTimeOfVisit() {
  return timeOfVisit;
 }
 
 public final void setTimeOfVisit(Date timeOfVisit) {
  this.timeOfVisit = timeOfVisit;
 }
 
 public final String getClientIpAddress() {
  return clientIpAddress;
 }
 
 public final void setClientIpAddress(String clientIpAddress) {
  this.clientIpAddress = clientIpAddress;
 }
}

So we have our Java bean with its 2 properties, clientIpAddress and timeOfVisit. It follows the standard encapsulation techniques and hides the underlying data by providing getters and setters to the outside. We think the names are pretty explicit: one property gives us the IP address the HTTP request came from and the other gives us the date and time the visit took place. We then go ahead and release this into the wild, and maybe we’re nice enough to provide some JavaDoc explaining the meaning of each field so other developers can use this class easily.

Fast forward a few days/weeks/months, and someone notices an oddity in some of the data this class encapsulates: we often get a lot of visits from the same IP address in a short amount of time. Looking into it, this is definitely not a DDoS attack; these are valid requests. Your investigation reveals that in these cases we are dealing with users behind firewalls (corporate or at home), where all the users share the same external IP address, which is what you record in clientIpAddress. Since you need to be able to distinguish between these users/visits, you do some reading and find the X-Forwarded-For HTTP header (a de facto convention set by proxies rather than part of the core HTTP spec), which, when an intermediary supplies it, can give you the user’s address on the internal LAN, thus providing another way of differentiating between visits/visitors.

Armed with this piece of knowledge, you go and add a new field to your class and decide to call it realIpAddress, because after all it is the real IP address assigned to the hardware network device. In other words, if the user runs ifconfig or similar, they will see this IP address assigned to their device. Sweet, so you change your code to look like this now:

public class OnlineVisitDetails {
 private Date   timeOfVisit;
 private String clientIpAddress;
 private String realIpAddress;
 
 public OnlineVisitDetails() {
  this(null, null, null);
 }
 
 public OnlineVisitDetails(Date timeOfVisit, String clientIpAddress, String realIpAddress) {
  this.timeOfVisit = timeOfVisit;
  this.clientIpAddress = clientIpAddress;
  this.realIpAddress = realIpAddress;
 }
 
 public final Date getTimeOfVisit() {
  return timeOfVisit;
 }
 
 public final void setTimeOfVisit(Date timeOfVisit) {
  this.timeOfVisit = timeOfVisit;
 }
 
 public final String getClientIpAddress() {
  return clientIpAddress;
 }
 
 public final void setClientIpAddress(String clientIpAddress) {
  this.clientIpAddress = clientIpAddress;
 }
 
 public final String getRealIpAddress() {
  return realIpAddress;
 }
 
 public final void setRealIpAddress(String realIpAddress) {
  this.realIpAddress = realIpAddress;
 }
}

You simply add the property realIpAddress and provide ample JavaDoc for it to show the differences. And release the code into the wild again. And then shortly after you get a huge number of emails complaining that your platform supplies the wrong data! What happened?

Well to be honest, YOU happened! Or more to the point, your naming happened!

It’s all well and good specifying in your JavaDoc the difference between clientIpAddress and realIpAddress, and the names totally make sense to you, but to the outside world they don’t! Developers will see a property called realIpAddress and assume it is the actual IP address the request comes from (the “real” IP address the traffic comes from), and interpret clientIpAddress as the IP address of the client, whether it is talking to the website directly or through a proxy! In other words, they can interpret your naming the other way around. Now imagine writing code based on these wrong assumptions: you will end up with a lot of issues!

Now imagine you had named your realIpAddress property something like deviceInternalAddress, internalLanIPAddress or behindTheFirewallAddress! The name might not be as self-explanatory as you think realIpAddress is, but precisely because it’s not, the user will go and check the JavaDoc thoroughly, think “hmm, ok, that sort of makes sense” and keep on using clientIpAddress where they should.
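Here is a rough sketch of what that could look like. The class name VisitDetailsSketch and the helper firstForwardedAddress are hypothetical, made up for this post; the helper assumes the usual comma-separated form of X-Forwarded-For, where the left-most entry is the address furthest from your server. Treat it as an illustration of the naming, not as a production-ready parser.

```java
import java.util.Date;

// Hypothetical sketch: class and helper names are invented for illustration.
final class VisitDetailsSketch {
    private final Date timeOfVisit;

    /** Address the HTTP connection came from (the proxy or firewall, if there is one). */
    private final String clientIpAddress;

    /** Left-most X-Forwarded-For entry; may be null, and is never verified. */
    private final String deviceInternalAddress;

    VisitDetailsSketch(Date timeOfVisit, String clientIpAddress, String deviceInternalAddress) {
        this.timeOfVisit = timeOfVisit;
        this.clientIpAddress = clientIpAddress;
        this.deviceInternalAddress = deviceInternalAddress;
    }

    /**
     * Picks the left-most entry of a header value such as "10.0.0.7, 203.0.113.9".
     * Returns null when the header is absent or its first entry is blank.
     * The sender controls this value, so it can be missing or plain wrong.
     */
    static String firstForwardedAddress(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        int comma = headerValue.indexOf(',');
        String first = (comma < 0 ? headerValue : headerValue.substring(0, comma)).trim();
        return first.isEmpty() ? null : first;
    }

    Date getTimeOfVisit() {
        return timeOfVisit;
    }

    String getClientIpAddress() {
        return clientIpAddress;
    }

    String getDeviceInternalAddress() {
        return deviceInternalAddress;
    }

    public static void main(String[] args) {
        String internal = firstForwardedAddress("10.0.0.7, 203.0.113.9");
        VisitDetailsSketch visit = new VisitDetailsSketch(new Date(), "203.0.113.9", internal);
        System.out.println(visit.getClientIpAddress() + " / " + visit.getDeviceInternalAddress());
    }
}
```

Nobody reading getDeviceInternalAddress() is going to mistake it for the address the request arrived from, which is the whole point.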

When using names like “realIpAddress” or “clientIpAddress”, we already get an idea in our head of what that might mean to us, and we write code based on that (wrong) assumption. It’s only when we don’t know that we pay attention to the JavaDoc, or when the names are obscure enough to require a bit of investigation.

Now here’s another example which I’ve seen creating chaos in a company: naming of a field in the database.

They started with a structure similar to the one above: a table which records details about a visit to the website, and amongst many other fields there are 2 fields logging the IP address and the date/time stamp of the visit. These guys were actually smart enough to properly separate clientIpAddress and internalLanAddress (in fact the latter was called internalHardwareDifferentiator in their database, a name good enough to prevent anyone from using it in their reports as an IP address), so we won’t concentrate on that bit. Instead, it’s the timeOfVisit field which caused total chaos for a couple of months in this company.

And this is why: initially, the company had only a bunch of servers logging this data in one data center, on the East Coast of the USA as it happens. At this point times were logged in EST, as that made perfect sense. After a while, as the company expanded, they found themselves with a data center on the West Coast of the USA as well. At this point, amongst many other changes they made to their platform, they decided to roll out logging in UTC across all of their servers.

They went ahead and put the changes in, keeping the field name as timeOfVisit, but now they knew that this was actually UTC time. They even went back and updated “old” data to reflect UTC times. They also took into consideration the fact that their customers were now located on both coasts, and allowed the timezone to be customized per customer, so that a daily report ran from 00:00:00 to 23:59:59 in the customer’s timezone (rather than in UTC, which would otherwise see customers on the East Coast running a “daily” report from 7:00:00pm EST to 6:59:59pm EST the next day).
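For what it’s worth, a per-customer daily window like that can be sketched in a few lines with java.time. The class DailyReportWindow and its method names are hypothetical, and I have no idea how this company actually implemented it; I’ve also used a half-open interval (start inclusive, end exclusive) instead of 23:59:59, purely as my own design choice so that visits in the last second of the day are not lost.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

// Hypothetical sketch: class and method names are invented for illustration.
final class DailyReportWindow {
    final Instant startInclusive; // first instant of the customer's day
    final Instant endExclusive;   // first instant of the following day

    private DailyReportWindow(Instant startInclusive, Instant endExclusive) {
        this.startInclusive = startInclusive;
        this.endExclusive = endExclusive;
    }

    /** Builds the window covering one calendar day in the customer's timezone. */
    static DailyReportWindow forDay(LocalDate day, ZoneId customerZone) {
        Instant start = day.atStartOfDay(customerZone).toInstant();
        Instant end = day.plusDays(1).atStartOfDay(customerZone).toInstant();
        return new DailyReportWindow(start, end);
    }

    /** True if a visit logged in UTC falls inside this customer's day. */
    boolean contains(Instant timeOfVisitUtc) {
        return !timeOfVisitUtc.isBefore(startInclusive) && timeOfVisitUtc.isBefore(endExclusive);
    }

    public static void main(String[] args) {
        DailyReportWindow window =
                forDay(LocalDate.of(2015, 1, 15), ZoneId.of("America/New_York"));
        System.out.println(window.startInclusive + " .. " + window.endExclusive);
    }
}
```

Asking the zone for the start of each day, rather than adding 24 hours, keeps the window correct on the days when daylight saving time makes a day 23 or 25 hours long.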

And they went live with these changes. Shortly after, their customers wanted more statistics about the activities their visitors perform on the website at various times of day, based on the visitor’s timezone. Simple things like “what time of (their) day do my visitors access the website most?” were difficult to report on: they had to infer the visitor’s timezone from the IP address (which gives the geolocation and hence the timezone), daylight saving time and the like had to be taken into account, and a report of this kind would generate a huge load on their servers. To simplify this, they decided to log the visitor’s local time alongside the existing timeOfVisit, which would mean a simple field lookup for reports like the one mentioned above.
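The conversion itself is the easy part once you have a timezone for the visitor. A minimal sketch, assuming the zone has already been worked out from the IP address (that lookup is not shown, and the class and method names here are hypothetical):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical sketch: names are invented; the IP-to-timezone lookup is assumed
// to have happened elsewhere and is not shown here.
final class VisitorLocalTime {

    private VisitorLocalTime() {
    }

    /**
     * Converts the UTC instant of a visit into the wall-clock date/time the
     * visitor would have seen. The zone rules take care of daylight saving.
     */
    static LocalDateTime wallClockAt(Instant timeOfVisitUtc, ZoneId visitorZone) {
        return LocalDateTime.ofInstant(timeOfVisitUtc, visitorZone);
    }

    public static void main(String[] args) {
        Instant visit = Instant.parse("2015-07-01T18:30:00Z");
        System.out.println(wallClockAt(visit, ZoneId.of("Europe/London")));
        System.out.println(wallClockAt(visit, ZoneId.of("America/Los_Angeles")));
    }
}
```

Doing this once at logging time and storing the result is what turns the “what time of their day” report into a plain field lookup.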

So they went back in their database and added a field called localTimeOfVisit — and again, updated data from the past to reflect the date/time in visitor timezone. And this is where it started going haywire for them!

A few weeks into this, they rolled out a feature in their reporting system which allowed clients to create ad-hoc reports through a drag-and-drop interface in which they could use any field. That was a highly acclaimed move with their customers, as it allowed them to look at their data whichever way they wanted, and everyone celebrated this release.

This was short-lived though, as their clients again started to complain about discrepancies and wrong data in the reporting system. It turned out it was all down to this darn localTimeOfVisit field: the name meant different things to different people. To some customers who had a bit of an idea of the data, it meant “date/time of visit in UTC (because I know you guys store data in UTC)”; to others it meant “date/time of visit in MY local timezone”; to others it meant what it was supposed to mean (date/time of visit in the online visitor’s local timezone); while to others it simply caused confusion as to “which one of the two, timeOfVisit and localTimeOfVisit, do I need to use?”.

Due to the high number of customers using this data, the company had to spend 3-4 months educating customers about the meaning of these fields and how to use them. They even went ahead and published some webinars on how to use their reporting platform when dealing with dates/times! Overall, it caused havoc and slowed down their operation hugely! Until someone went into the database and changed the field name from localTimeOfVisit to timeReportedByVisitor. At that point the complaints stopped, and customers started understanding the meaning of that field and changed their reports accordingly.

If only they had named the field properly in the first place! More than 3-4 months were spent dealing with a poor name like that!

Bottom line: naming doesn’t matter…until it does! It’s a small aspect, for sure, when you’re in a rush to get the next version of your product out there. But if you’re dealing with frameworks and platforms which are public and could end up being used by anyone, either make the names as cryptic as possible, so developers have to call you every time they decide to use a field or call a method in your framework, or put a bit of thought into them. Or you can just call all of your fields “fieldOne”, “fieldTwo”… “fieldN”, put a lot of JavaDoc around them, ship it and hope for the best! 🙂