Cache Ahead using Apache Commons Pool

Posted by & filed under , .

If you have written any code that needs some sort of pooling of resources (which is a form of caching, let’s face it), you will no doubt have come across Apache Commons Pool. (In fact, the DBCP connection pool, which is built on top of it, is pretty much the standard in applications which require database connection pooling.) The framework offers most of what is needed to pool objects of any kind and provides the management required around pooling.

I’m going to look in this post at how to use this framework to create a “cache ahead” pool: a cache/pool where a certain number of instances are created ahead of time and are ready to use when requested. This might sound odd, since the whole idea of pooling is that once an object has been created it is returned to the pool and is ready to be used again. However, you will find that the pool classes don’t pre-populate the pool, for one thing; and secondly, there are cases where the created objects store stateful information and as such cannot be returned to the pool and recycled. In such cases you really do have to keep creating new instances on demand, and the pool classes are not designed for that out of the box.

To put some context around the problem: part of the project I’ve been working on recently requires certain classes to be instantiated on demand. These classes implement a given interface, so they follow a known contract, and they are meant to allow customers to implement “hooks” into certain parts of the application, so they can customize parts of its “normal” run to their needs. (Think of these as scripts which get called at certain execution stages during the normal run of the application.) I thought straight away of using a pool of such instances, since the application is high-throughput and object churn and GC pressure can affect performance. For the same reason, ideally I need the pool to be pre-populated with N instances, so they are ready the moment the program requires them.

Based on the above, I thought straight away of a KeyedObjectPool-based implementation, where the actual class would be the key and each key would pool a set of instances of that Class object. Using something like StackKeyedObjectPool would even allow me to pre-populate the pool. The big problem, however, is that I cannot make assumptions about whether these instances are “recyclable” (that is, I can’t tell whether they can be passivated and activated again or whether they need to be destroyed after each call). I might pre-populate the cache with N instances initially, but if these are not recyclable, after the Nth call my pool will be empty again, and each subsequent request will take an extra hit for the object creation. Now, as I said, I’m trying to lower the object creation/destruction footprint, but even when that’s not achievable, I still need to decrease the time it takes to use these objects in my program, so eliminating the creation time helps greatly. (Agreed, if I have to create an object for each request this is hardly a “pool”; however, using the Commons Pool components I can at least take the creation out of the normal execution path and have some instances prepared for me ahead of time. Not a true pool/cache implementation, but it has similarities.)
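Just to make the problem concrete, the naive approach would look roughly like this. (This is only a sketch: MyHook and HookFactory are hypothetical placeholders for one of those customer “hook” classes and a matching KeyedPoolableObjectFactory, and the calls are assumed to run somewhere that can deal with the checked exceptions.)

// Naive approach: a stock keyed pool, keyed by the Class object, pre-populated by hand.
// MyHook and HookFactory are made-up placeholders, not classes from this project.
KeyedObjectPool<Class<MyHook>, MyHook> pool =
        new StackKeyedObjectPool<Class<MyHook>, MyHook>(new HookFactory());
for (int i = 0; i < 10; i++) {
    pool.addObject(MyHook.class); // pre-create ten idle instances
}

MyHook hook = pool.borrowObject(MyHook.class);
// ... use the hook ...
// If the instance is stateful it cannot go back into the pool, so after the
// tenth call the pool is empty again and every borrowObject() pays the full
// creation cost on the caller's thread.
pool.invalidateObject(MyHook.class, hook);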

Since, as I said, in my case I simply need to create instances of certain classes and I can’t tell upfront how the class hierarchy will evolve, my KeyedPoolableObjectFactory cannot “know” upfront how to activate or passivate these instances, nor whether a given object is valid or not. To overcome this, I’ve implemented a very basic interface, CacheableAhead, which offers just the methods needed for the KeyedPoolableObjectFactory to delegate the activate/passivate/isValid calls to the object itself: rather than the factory deciding about the object, it simply invokes one of these methods, and the object itself decides whether it is valid and performs whatever is required to activate and passivate itself:

public interface CacheableAhead {
    /**
     * Resets the object to the "initial" state. That is, the state the object
     * was in right after it was created (via <code>new</code>).
     */
    void activate();

    /**
     * Prepares the object to be returned to the pool. Needs to free up any
     * resources held at this point. Similar to {@link #destroy()} but
     * potentially less "destructive".
     */
    void passivate();

    /**
     * "Destroys" the object. That is, frees up any resources that the object is
     * taking and prepares it to be released into the great void of garbage
     * collection :D
     */
    void destroy();

    /**
     * Used by the caches/pools to verify whether this object is valid to be
     * re-pooled or whether it needs to be destroyed and chucked away.
     *
     * @return true if the object is valid and can be returned to the pool
     *         (passivated first) or false if this object needs to be thrown
     *         away (after it's been destroyed)
     */
    boolean isValid();
}
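For illustration, a hook that accumulates per-request state (and therefore cannot be recycled) might implement this roughly as follows; ScriptedHook is a made-up example, not a class from the project:

/** Hypothetical hook: it builds up per-request state, so it refuses to be re-pooled. */
public class ScriptedHook implements CacheableAhead {
    private StringBuilder output = new StringBuilder();

    /** Hypothetical "business" method called during the normal run of the application. */
    public void append(String line) {
        output.append(line).append('\n');
    }

    @Override
    public void activate() {
        output = new StringBuilder(); // back to the state right after new
    }

    @Override
    public void passivate() {
        output.setLength(0); // cheap clean-up before going back to the pool
    }

    @Override
    public void destroy() {
        output = null; // nothing expensive to release in this toy example
    }

    @Override
    public boolean isValid() {
        return false; // stateful, so always destroy rather than recycle
    }
}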

With this in mind, we can now look at implementing our KeyedPoolableObjectFactory, which, as I said, will delegate all calls to this interface:

/**
 * Factory class which delegates all the methods to {@link CacheableAhead} for
 * creating instances of classes.
 *
 * @author Liviu Tudor http://about.me/liviutudor
 * @param <V>
 *            Type of class it operates on.
 */
public class CreateInstanceFactory<V extends CacheableAhead> implements KeyedPoolableObjectFactory<Class<V>, V> {
    @Override
    public V makeObject(Class<V> key) throws InstantiationException, IllegalAccessException {
        return key.newInstance();
    }

    @Override
    public void destroyObject(Class<V> key, V obj) throws Exception {
        obj.destroy();
    }

    @Override
    public boolean validateObject(Class<V> key, V obj) {
        return obj.isValid();
    }

    @Override
    public void activateObject(Class<V> key, V obj) throws Exception {
        obj.activate();
    }

    @Override
    public void passivateObject(Class<V> key, V obj) throws Exception {
        obj.passivate();
    }
}

So far so good; we’ve got our basics sorted, so now it’s time to implement the pool/cache which creates instances ahead. The idea behind it is simple: we will have a scheduled thread which wakes up at regular (configurable) intervals, checks all the keys we have in the cache/pool, and for each key ensures there are at least N instances in the pool, creating more where necessary and storing them in the pool, ready for when they are requested. Nothing tricky so far: a simple ScheduledExecutorService which traverses the keys and invokes preparePool() for each one, and voila, we have our instances created.

Here’s the problem though: none of the KeyedObjectPool-based classes in Apache Commons Pool give access to the keys of the pool! Because of this, we need to keep track of all the keys ourselves, in our class. This can be done by maintaining a set of keys which gets updated when borrowObject is called (we are being asked for a new instance, so we add the class to the set of keys) or when clear is called (we are being asked to get rid of some instances, and with them the key). In other words, we have to override these methods and keep an internal set of keys up to date, which the background thread will traverse when it kicks in.

Based on the above, our pool-based class which caches instances ahead looks like this:

/**
 * This is a type of cache which creates a certain number of instances ahead of
 * time for a given key, in order to prevent creation on demand, and also has a
 * background thread which ensures there are at least a certain number of
 * pre-created items in the pool when it runs (and creates more as necessary).
 * Since we don't know the keys to this pool upfront, the pool will start
 * empty; however, once an object is requested for a certain key, the
 * background thread will soon kick in and pre-populate the pool for that key.
 * Users can also force pre-population by calling
 * <code>preparePool(key,true)</code>.
 *
 * @param <T>
 *            Object type produced (and pooled) by this cache/pool
 * @author Liviu Tudor http://about.me/liviutudor
 */
public class CreateInstanceAheadCache<T extends CacheableAhead> extends GenericKeyedObjectPool<Class<T>, T> {
    /** Debug logging. */
    private static final Logger      LOG = LoggerFactory.getLogger(CreateInstanceAheadCache.class);
    /**
     * This is the background thread which is responsible for creating instances
     * when needed.
     */
    private ScheduledExecutorService backgroundTask;

    /**
     * Stores all the keys that were registered via <code>borrowObject</code>
     * or <code>preparePool</code> (and removed via <code>clear(key)</code>).
     * This set is traversed in {@link #prepopulate()} to pre-populate the pool.
     */
    private Set<Class<T>>            keys;

    /**
     * Creates a cache which will use the given factory to create new instances
     * and will maintain a certain number of idle instances in the cache. It
     * also starts a background thread which will "wake up" regularly and
     * ensure there are at least a certain number of instances for each key in
     * the pool.
     *
     * @param factory
     *            Factory used to create new instances in the pool
     * @param minIdle
     *            Minimum number of instances to create in the pool for a key
     * @param wakeUpInterval
     *            Interval in milliseconds for the background thread to wake up
     *            and create additional instances if needed
     */
    public CreateInstanceAheadCache(KeyedPoolableObjectFactory<Class<T>, T> factory, int minIdle,
            long wakeUpInterval) {
        super(factory);
        LOG.debug("Creating cache ahead with minIdle={}, wakeUpInterval={}", minIdle, wakeUpInterval);
        if (minIdle <= 0) {
            throw new IllegalArgumentException("minIdle (" + minIdle + ") cannot be <= 0 !");
        }
        setMinIdle(minIdle);
        if (wakeUpInterval <= 0) {
            throw new IllegalArgumentException("wakeUpInterval (" + wakeUpInterval + ") cannot be <= 0 !");
        }
        // Use a concurrent set so the background thread can iterate over it safely
        // while other threads add/remove keys (a plain synchronizedSet would need
        // explicit locking around the iteration in prepopulate()).
        this.keys = Collections.newSetFromMap(new ConcurrentHashMap<Class<T>, Boolean>());
        LOG.debug("Created keySet {}", this.keys);

        LOG.debug("Parameters correct, creating background thread");
        backgroundTask = Executors.newSingleThreadScheduledExecutor();
        LOG.debug("Scheduling background thread");
        backgroundTask.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                LOG.debug("Running prepopulate on timer");
                prepopulate();
            }
        }, wakeUpInterval, wakeUpInterval, TimeUnit.MILLISECONDS);
        LOG.debug("Cache ahead created.");
    }

    @Override
    public final void close() throws Exception {
        LOG.debug("Closing parent cache");
        super.close();
        LOG.debug("Emptying key set");
        keys.clear();
        LOG.debug("Shutting down thread");
        backgroundTask.shutdownNow();
        LOG.debug("Calling close hook");
        closeHook();
        LOG.debug("Cache closed");
    }

    /**
     * Used as a "hook" for subclasses to get notified when this cache is being
     * closed. Called by {@link #close()} right at the end.
     *
     * @throws Exception
     *             if any errors occur while releasing resources etc.
     */
    protected void closeHook() throws Exception {
    }

    @Override
    public final T borrowObject(Class<T> key) throws Exception {
        LOG.debug("Borrowing object for {}", key);
        if (!keys.contains(key)) {
            keys.add(key);
        }
        LOG.debug("Set of keys so far {}", keys);
        borrowObjectHook(key);
        LOG.debug("Called borrowObjectHook, now forwarding call to superclass");
        return super.borrowObject(key);
    }

    /**
     * Used by {@link #borrowObject(Class)} as a "hook" to notify subclasses of
     * the call received.
     *
     * @param key
     *            Key that was just used in the call to
     *            {@link #borrowObject(Class)}.
     * @throws Exception
     *             Subclasses can throw this to prevent the call to
     *             <code>super.borrowObject(key)</code> from happening.
     */
    protected void borrowObjectHook(Class<T> key) throws Exception {
    }

    @Override
    public final void clear(Class<T> key) {
        LOG.debug("Clearing key {}", key);
        keys.remove(key);
        LOG.debug("Set of keys so far {}", keys);
        clearHook(key);
        LOG.debug("Called clearHook, now forwarding call to superclass");
        super.clear(key);
    }

    /**
     * Used by {@link #clear(Class)} as a "hook" to notify subclasses of the
     * call received.
     *
     * @param key
     *            Key received in the call to {@link #clear(Class)} for which
     *            we're clearing the cache
     */
    protected void clearHook(Class<T> key) {
    }

    @Override
    public final void clear() {
        LOG.debug("Clearing whole cache.");
        keys.clear();
        LOG.debug("Set of keys after clear: {}", keys);
        clearHook();
        LOG.debug("Called clearHook(void), now forwarding call to superclass");
        super.clear();
    }

    /**
     * Used by {@link #clear()} as a "hook" to notify subclasses of the calls
     * received.
     */
    protected void clearHook() {
    }

    @Override
    public final synchronized void preparePool(Class<T> key, boolean populateImmediately) {
        if (!keys.contains(key)) {
            keys.add(key);
        }
        super.preparePool(key, populateImmediately);
    }

    /**
     * Used internally to go through all the keys in the cache and pre-populate
     * the pool for each of them.
     */
    protected final void prepopulate() {
        LOG.debug("Running prepopulate");
        for (Class<T> t : keys) {
            LOG.debug("Prepopulating {}", t);
            prepopulateHook(t);
            LOG.debug("Called hook, prepopulating");
            preparePool(t, true);
        }
        LOG.debug("Finished prepopulating");
    }

    /**
     * Used as a hook when {@link #prepopulate()} kicks in: for each key, it
     * will call this method and pass in the key prior to creating the
     * required number of instances.
     *
     * @param key
     *            Key for which we are about to prepopulate the cache.
     */
    protected void prepopulateHook(Class<T> key) {
    }
}
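To show how the pieces fit together, here is roughly how I would wire it up, using the hypothetical ScriptedHook from earlier. (This is just a sketch: it assumes it runs somewhere that can throw Exception, and the numbers are only examples. Note also that I switch on testOnReturn here so that isValid() actually gets consulted when instances come back; the cache class itself doesn’t do that for you.)

// Keep at least 500 idle ScriptedHook instances per key and top the pool up once a second.
CreateInstanceAheadCache<ScriptedHook> cache =
        new CreateInstanceAheadCache<ScriptedHook>(
                new CreateInstanceFactory<ScriptedHook>(), 500, 1000L);
// Ask the pool to consult isValid() before re-pooling returned instances.
cache.setTestOnReturn(true);

// Optionally force the first fill rather than waiting for the background thread.
cache.preparePool(ScriptedHook.class, true);

ScriptedHook hook = cache.borrowObject(ScriptedHook.class);
try {
    hook.append("customer specific logic runs here"); // hypothetical call
} finally {
    // The instance is either passivated back into the pool or destroyed,
    // depending on what isValid() says.
    cache.returnObject(ScriptedHook.class, hook);
}

// On shutdown, stop the background thread and release everything.
cache.close();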

This now allows us to create instances ahead of time, which, as I said, helps in a high-throughput system, and also to pool them if we need to. To make this class work best, though, you will need to tune the number of instances created ahead and the wake-up interval of the background thread. Ideally, I guess, I should provide some JMX measurements around this so you can get an idea of how many objects you are creating per second and so on; I might plug that in one day, but for my case it turns out that simply having a large number of items created upfront (e.g. 500), with a timer that kicks in once a second, is enough, so play with these values and see what works best for your case.

Attached is the project with the sources and unit tests for the classes above. Please note that this uses Commons Pool 1.6.0 and JUnit 4.8 (plus SLF4J and a few other “standard” libraries): cacheahead.tar.bz2