Arrays in Java

Posted by & filed under , , , .

I seem to be on a roll today so I thought I’d mention this little tip that I have come across in the past when using arrays — more specifically bi-dimensional arrays. We think of such arrays (in Java and other languages as well!) quite often as a bit of a Scrabble board where we store data. So for instance, in the case of a 5×5 int array, most peopleĀ  would represent this in their mind as a Scrabble board with 5 rows and 5 columns where we can store integer numbers. On a conceptual level that’s fine, however, the implications regarding the memory storage are a bit different — and it’s worth keeping them in mind when dealing with large arrays (for small size arrays — e.g. 5×5 as in the above example) these implications will probably go unnoticed.

For instance, consider that you need to generate 1 million random integers and store them in memory — for whatever reason. For each integer, you also need to store let’s say the “magnitude” — number of digits for this number. In most cases what you would do is have a bidimensional array with 1 million rows and 2 columns, the first column stores the number itself while the second one stores the magnitude:

int arr[][] = new int[1000000][2];

I bet you 100/1 that most of you would do that because it comes to us naturally — and it matches the mental representation: Scrabble board with 2 columns.

Now let’s remember one important aspect about arrays in Java — they are in fact objects! Does that mean anything looking back at the above code? Because in the light of that the above code reads “please create 1,000,000 objects, each one of these objects is an array with 2 integers”. Now are you getting worried about your memory consumption? Because at 1,000,000 objects, no matter how little the JVM assigns to the object “stub” code, it will still be more than an int (4 bytes). If I’m guessing say 100 bytes for the array stub code, we’re looking at 100 x 1,000,000= 100,000,000 bytes just for the arrays; then each of them will probably be allocated another 2 x 4 bytes to store the actual integers — so altogether around 108,000,000 bytes.

However, this implementation matches our natural mental represantion — we write horizontally and from left to write, so it’s only natural that we store the magnitude “on the right” of the integer value, and this means 1,000,000 rows with 2 columns. What if we turn the array around though — I know, it messes with our mental representation! — and instead we now say we will store the magnitude underneath the value (on our Scrabble board)? This now means 2 rows, each one with 1,000,000 columns — a bit arse about tit really, right? But now look at the Java code for this:

int arr[][] = new int[2][1000000];

So we now create just 2 objects (!!!) and each object is an array with 1,000,000 int! So the JVM now has to assign memory for 2 objects only and for each objects it has to assign memory for an array of integers with 1,000,000 entries. So assuming again 100 bytes for the stub code for the array object, we’re looking at 2 x 100 = 200 bytes + 2 x 8,000,000 = 16,0000,200 bytes of memory.

To sum it up — worth turning the arrays upside down every now and then if you’re dealing with large data!