StringBuilder — Memory Consumption


You have probably seen my initial post about creating StringBuilders in Java, and the follow-up that looked at the timings involved. As promised, I have finally had some time to look at the memory consumption of the 2 ways of creating a StringBuilder in Java.

I have used code similar to the previous post, where we run and measure 2 methods: one that creates a StringBuilder using the default constructor and then appends to it, and one that creates and initializes the StringBuilder in one go. Just as before, we execute each method a number of times, and at the end we measure the free memory left. There are, however, a few things worth noting here:

  • First of all, we execute these methods one after another, so by the time the second method runs we have already “wasted” some memory running the first one, and the measurements for the second method will not be precise. To limit this as much as possible, I have introduced a call to System.gc() at the end of each cycle of method calls. Since this only suggests a garbage collection, I also pause the main application thread for a couple of seconds to give the garbage collector a chance to run. This is not a precise method, as the GC is still not guaranteed to run, but in most cases it will do the right thing, and thanks to the way we measure the results (see below) we can filter out the odd non-run of the GC.
  • Secondly, because the GC cannot be forced, as stated above, it may occasionally not run even though we gave it a couple of seconds while the application was “asleep”. Because of this we need to run the tests multiple times and average the values. There are 2 ways to compute an “average” value: the arithmetic mean or the median. I suspect most people reach for the arithmetic mean when talking about averages (I know I did until relatively recently! 🙂). However, the mean is very easily skewed if a single value is way off from the rest of the series. Since that is exactly what we are trying to eliminate, I’ve decided to base my conclusions on the median value. (I have still included the arithmetic mean for comparison in the attached Excel sheet; in particular you will notice that for the “1mil…” tabs the 2 values differ by quite a few orders of magnitude.)
  • Also, because the JVM expands the heap as needed, Runtime.freeMemory() can report misleading values while the heap is growing, so we need to prevent the heap from expanding at all and keep it at a fixed size. This can be done via the -Xms and -Xmx command-line parameters: simply specify the same value for the initial heap size (-Xms) and the maximum heap size (-Xmx). In my tests I settled on 512M (-Xms512M -Xmx512M), and I expect this is behind the effect I’m going to talk about when TRIES=1,000,000. Feel free to tweak this for your own setup.
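
To make the mean-versus-median point concrete, here is a minimal sketch that computes both for a short series of free-memory readings. The class name AverageDemo and the readings are made up for illustration (they are not taken from the attached sheet), and the methods assume a non-empty array.

```java
import java.util.Arrays;

/** Compares the arithmetic mean and the median of a series of free-memory readings. */
final class AverageDemo {

    /** Arithmetic mean: the sum of the values divided by their count. */
    static double mean(long[] values) {
        double sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Median: the middle value of the sorted series (mean of the two middle values for an even count). */
    static double median(long[] values) {
        long[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up readings in bytes; the last one mimics a run where the GC did not kick in.
        long[] freeMemory = {500_000_000L, 500_100_000L, 499_900_000L, 500_050_000L, 350_000_000L};
        System.out.printf("mean   = %.0f%n", mean(freeMemory));
        System.out.printf("median = %.0f%n", median(freeMemory));
    }
}
```

With these invented readings the single outlier drags the mean down to 470,010,000 bytes, while the median stays at 500,000,000 bytes, which is why the median is the safer summary when an occasional GC non-run is expected.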

To make the results easier to process (and graph!), I have decided to simply dump the values to the console in CSV format so I can copy and paste them into Excel. The code therefore looks like this now:

package liviutudor;
 
/**
 * Tests different ways of creating StringBuilder's and measures the memory consumption.
 * 
 * @author Liviu Tudor http://liviutudor.com
 */
public class StringBuilderMemoryTest {
	public static final String	CHARS 	= "0123456789012345678901234567890123456789";
	public static final int		TRIES 	= 10000;
	public static final int		RUNS	= 100;
	private static int			FLAG	= 3;	//used randomly to pretend we use the stringbuilder values
 
	public static void main(String[] args) {
		long memWOP[] = new long[RUNS];	//free memory after the no-params (default constructor) cycle
		long memWP[]  = new long[RUNS];	//free memory after the with-params (String constructor) cycle
 
		for( int runs = 0; runs < RUNS; runs++ ) {
			for( int i = 0; i < TRIES; i++ ) {
				createNoParams();
			}
			memWOP[runs] = Runtime.getRuntime().freeMemory();
 
			for( int i = 0; i < TRIES; i++ ) {
				createParams();
			}
			memWP[runs] = Runtime.getRuntime().freeMemory();
 
			System.gc();	//only a request: the JVM may ignore it
			sleep();
 
			//following is just a hack so FLAG is not treated as a constant
			FLAG = (FLAG + 1) % 2;
 
			//CSV line: free memory after no-params cycle, free memory after with-params cycle
			System.out.println( String.format( "%d, %d", memWOP[runs], memWP[runs]) );
		}
		}
	}
 
	/**
	 * Creates a stringbuilder using default constructor.
	 * Then appends chars to it.
	 */
	public static void createNoParams() {
		StringBuilder s = new StringBuilder();
		s.append( CHARS );
		//do something with this
		if( FLAG > 10 )	//this will never happen
			System.out.println( s.toString() );
	}
 
	/**
	 * Creates a stringbuilder using a String.
	 */
	public static void createParams() {
		StringBuilder s = new StringBuilder(CHARS);
		//do something with this
		if( FLAG > 10 )	//this will never happen
			System.out.println( s.toString() );
	}
 
	/** 
	 * Sleep for a few seconds (to give GC time to kick in).
	 */
	public static void sleep() {
		try {
			Thread.sleep( 2000 );
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
	}
}
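
Note that the listing above requests a GC once per run, after both methods have executed, so the second reading still includes whatever the first method left behind. As a rough sketch of one possible variation (not the code behind the attached results), the GC request could be made before each method's cycle, and used memory (total minus free) measured instead of free memory alone. The class GcBetweenCyclesSketch, the sink field, and the helper methods are hypothetical names introduced here, and the lambdas assume Java 8 or later.

```java
/** Hypothetical variation of the listing above; not the code behind the attached results. */
final class GcBetweenCyclesSketch {
    static final String CHARS = "0123456789012345678901234567890123456789";
    static final int TRIES = 10_000;

    /** Written on every iteration so the JIT is less likely to optimize the allocations away. */
    static volatile Object sink;

    /** Heap bytes currently in use: total heap size minus the free part of it. */
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Requests a GC and then pauses; the JVM is free to ignore the request. */
    static void requestGc(long pauseMillis) {
        System.gc();
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    /** Runs the task TRIES times; returns the change in used heap (negative if a GC ran meanwhile). */
    static long measure(Runnable task, long pauseMillis) {
        requestGc(pauseMillis); // start each cycle from a (hopefully) clean heap
        long before = usedMemory();
        for (int i = 0; i < TRIES; i++) {
            task.run();
        }
        return usedMemory() - before;
    }

    public static void main(String[] args) {
        long noParams = measure(() -> sink = new StringBuilder().append(CHARS), 2000);
        long params = measure(() -> sink = new StringBuilder(CHARS), 2000);
        System.out.println(noParams + ", " + params); // same CSV shape as the original listing
    }
}
```

This doubles the sleep time per run, and the GC may still decline to run, so the readings would need the same repeat-and-take-the-median treatment described earlier.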

I am attaching the Excel sheet that I generated using this program as well; some of you might want to try this in a different execution environment and might find the sheet useful. (If you do, I would appreciate either an email with your findings or some comments on this post.) Each sheet is labelled “TRIES + RUNS”, so for instance “1mil + 10runs” means that TRIES was 1,000,000 and RUNS was 10 for the results shown in that particular sheet.

All results show a common trend (which was to be expected, really): the JVM takes a hit during the first couple of runs, then the JIT compiler kicks in and optimizes the execution of the program (including the memory allocation), so that the free memory is constant at the end of each run. Part of the difference in the initial measurements is also due to the JVM still loading the classes needed for execution, its internal buffers not being filled yet, etc. I’ve included below a screenshot of the measurements for “10k + 10runs”, but as I said, all the other measurements exhibit a similar trend:

Analysing the data overall, a few conclusions emerge:

  • First of all, in my environment, running each function 1,000,000 times per iteration seems to trigger the GC automatically: the difference in the numbers is very small compared to the runs that use other values of TRIES. The GC seems to kick in somewhere between 100,000 (100k) and 1,000,000 (1mil) calls, but I didn’t spend time investigating exactly where. Despite that, I still think there is a valid point to be made about the difference in memory consumption, as shown by the results of the other experiments (10k and 100k).
  • Running the code 10,000 times (which means constructing 10,000 StringBuilder objects) creates a difference in memory of nearly 3 MB, whereas for 100,000 the difference shoots up to about 14 MB. I suspect this also means that somewhere in between there is (at least) one GC run, since the objects created are of similar sizes and I would expect the memory used to grow proportionally. However, I would say that creating 100,000 StringBuilders is below average for a server app under normal “stress”, and with the heap limited to 512 MB, wasting 14 MB is rather significant!
  • It seems (with the exception of the 1mil experiments) that running the cycle 10 times or 100 times doesn’t make much difference, as the JVM settles somewhere around a constant value each time. So I suspect that increasing RUNS even to 1,000,000 would show similar results for the same value of TRIES. One way to interpret this: with a constant number of StringBuilders created between GC runs, you take a similar hit on the memory footprint no matter how many times the GC runs. In a high-throughput system you want the GC to run as rarely as possible, as it freezes the system, so you are more likely to allow more time (and therefore more objects being constructed) between GC runs.
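
The proportionality argument in the second point can be checked with some quick arithmetic. The sketch below uses the rounded figures quoted above (3 MB for 10,000 tries, 14 MB for 100,000) as inputs; they are approximations rather than exact values from the sheet, and the class and method names are made up for illustration.

```java
/** Back-of-the-envelope check of the "memory should grow proportionally" argument. */
final class GrowthCheck {
    static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Average number of bytes accounted for by each iteration. */
    static double bytesPerIteration(double megabytes, int tries) {
        return megabytes * BYTES_PER_MB / tries;
    }

    /** Memory we would expect at targetTries if usage scaled linearly from the base measurement. */
    static double projectedMegabytes(double baseMb, int baseTries, int targetTries) {
        return baseMb * targetTries / baseTries;
    }

    public static void main(String[] args) {
        // Rounded figures from the text: ~3 MB at 10k tries, ~14 MB at 100k tries.
        System.out.printf("10k tries:  %.0f bytes per iteration%n", bytesPerIteration(3, 10_000));
        System.out.printf("100k tries: %.0f bytes per iteration%n", bytesPerIteration(14, 100_000));
        System.out.printf("linear projection for 100k tries: %.0f MB%n",
                projectedMegabytes(3, 10_000, 100_000));
    }
}
```

With those inputs the per-iteration cost drops from roughly 315 bytes to roughly 147 bytes, and a linear projection from the 10k figure gives 30 MB rather than the observed 14 MB. That gap is consistent with the suspicion that at least one GC run happened somewhere along the way.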

The Excel sheet is attached here: StringBuilder Memory Consumption. The complete source code is here: StringBuilderMemoryTest Java code