Sunday Monday Tuesday Wednesday Thursday Friday Saturday
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24
25 26 27 28
29 30 31 1 2 3 4

Saturday, December 22, 2012

Weak, Weaker, Weakest, Harnessing The Garbage Collector With Specialist References

When and when not to use specialist references in Java

Weak, Soft and Phantom references are dangerous and powerful. If they are used the wrong way they can destroy JVM performance; however, when used the correct way they can substantially enhance performance and program clarity.

Weak and Soft references are the more obvious of the three. They are pretty much the same thing actually! The idea is simply that they be used to access an object but will not prevent that object being reclaimed by the garbage collector:


Object y=new Object();
// y is a hard reference to the object
// and so that object cannot be reclaimed.

Obejct x=WeakReference<Object>(y);
// now x is a weak reference to the object
// (not to y - as y is just a variable).
// The object still cannot be reclaimed 
// because y is still a hard reference to it.

y=null;
// now there is only a weak reference to 
//the object, it is eligible for garbage collection.

if(x.get()==null){
    System.out.println("The object has gone away");
}else{
    System.out.println("The object is " + x.get().toString());
}

Have you spotted the deliberate mistake? It is an easy one to miss and it will probably not show up in unit testing. It is exactly the sort of issue which makes me say:

Only Use Weak/Soft References If You Absolutely Have To
And Probably Not Even Then.
When the JVM is under memory pressure it might reclaim the object between the first and second invocations of the get method in the weak reference. This will result in the program throwing a null pointer exception when the toString method is invoked on null. The correct form for the code is:

Object x=x.get();
// Now we have null xor a hard reference to
// the object
if(z==null){
    System.out.println("The object has gone away");
}else{
    System.out.println("The object is " + z.toString());
}

So they are mad, bad and dangerous to be with; why do we want them?

We have not fully touched on why they are really, really dangerous yet. To do that we need to see why we might want them and why we might need them. There are two common situations in which weak and soft references might seem like a good idea (we will look at the difference between soft and weak in a little bit). The first of these is in some form of RAM cache.

It works like this: We have some data, for example customer details, which is stored in a database. We keep looking it up and that is slow. What we can do is cache that data in RAM. However, eventually the RAM will fill up with names and addresses and the JVM throw an OutOfMemoryError. The solution is to store the names and addresses in objects which are only weakly reachable. Something like this:


ConcurrentHasMap>String,WeakReference>CustomerInfo<< cache=new ConcurrentHashMap><();
...
CustomerInfo currentCustomer=cache.get(customerName);
if(currentCustomer==null){
    currentCustomer=reloadCachesEntry(customerName);
}
This innocent little pattern is quite capable of bringing a monster sized JVM to its knees. The pattern is using the JVM's garbage collector to manage an in-memory cache. The garbage collector was never designed to do that. The pattern abuses the garbage collector by filling up the memory with weakly reachable objects which run the JVM out of heap space. When the JVM gets low in memory, it has to traverse all the reference, weak, soft and otherwise, in its heap and reclaim RAM. This is expensive and shows up as a processing cost. It is even worse on very big JVM instances with a lot of processor cores because the garbage collector may well end up having to perform a 'stop the world' full cycle and hence reduce performance down to single core levels!

I am not saying in memory cache technology is a bad idea. Oh no - it is a great idea. However, just throwing it against the garbage collector and not expecting trouble is a very poor design choice.

Weak vs Soft what is the difference? Well, there is much really. On some JVMs (the client hostspot JVM for example - but that might change at any time) weak reference are marked for preferential garbage collection. In other words, the garbage collector should make more effort to reclaim memory from the object graph to which they refer (and no soft or hard references refer) than for other memory. Soft references do not have this idea to them. However, this is just an optional idea on some JVMs and cannot be relied upon, and it is a bad idea anyway. I would suggest using either soft or weak references all the time and stick with it. Pick which ever you like the sound of. I prefer the name WeakReference, so tend to use that.

There is one other difference; an object which is referenced to by a soft reference and a weak reference, but not a hard reference, can have the situation where it can still be acquired from the .get() method of the weak reference but not that of the soft reference. The reverse is not possible not the other way around. Code that relies on this behaviour is probably wrong headed.

Good uses for weak references do exist. What weak references are great for it keeping track of objects which are being used else where. An example is from Sonic Field (an audio processing package). In this example, 'slots' in files contain audio data and are associated with objects in memory. This model does not use the weak references to refer to in-memory copies of the data. In memory objects use the slots. Weak references are used to allow the file management system to reuse slots.

The code using slots does not need (and should not need to) be concerned with the management of disk space. It is the concern of the file manager to do that. The file manager has weak references to the objects using the slots. When a new slot is requested, the file manager checks for any existing slots referred to via weak references which have been reclaimed (and hence return null from the get method). If it finds such a reference, it can reuse the slot.

Automatic notification of reclamation

Sometimes we might want to be told when a weak or soft (or the other sort - phantom) reference has been reclaimed. This can be done via the en-queuing system. We can do this using a reference queue:


WeakReference(T referent, ReferenceQueue<? super T> q)  
We do something like this:

ReferenceQueue junkQ = new ReferenceQueue<>();
....
WeakReference<FileSlot> mySlot=new WeakReference<>(aSlot);
....
// In a different thread - make sure it is daemon!
WeakReference<FileSlot> isDead;
while(true){
    isDead = junkQ.remove();
    // Take some action based on the fact it is dead
    // But - it might not be dead - see end of post :(
...
}
But, remember, by the time weak reference ends on the junkQ calling .get() on it will return null. If you will have to store information to allow what ever action you are interesting it to happen somewhere else (like a ConcurrentHashMap where the reference is the key),

So What Is A Phantom Reference?

Phantom references are the one sort which, when you need them, you really need them. But on the face of it, they seem utterly useless. You see, whenever you invoke .get() on a phantom reference, you always get null back. It is not possible to use a phantom reference to get to the object to which it refers - ever. Well - that is not quite true. We can achieve this via JNI sometimes but we should never do so.

Consider the situation where you allocate native memory in JNI associated with a Java object. This is the sort of model which the DirectBuffers in the noi package of the JDK use. It is something I have used repeatedly in large commercial projects.

So, how do we reclaim that native memory? In the case of file like systems, it is possible to say that the memory is not reclaimed until the file is closed. This places the responsibility of resource management on the shoulders of the programmer; which is exactly what the programmer expects for things like files. However, for lighter weight objects, we programmers do not like to have to think about resource management - the garbage collector is there to do it for us.

We could place code in a finalizer which calls into the JNI code to reclaim the memory. This is bad (as in lethal) because JVMs make almost guarantee that they will call finalizers. So, don't do that! But, phantom references come to the rescue! First we need to understand 'phantom reachable': A phantom reference will only become enqueued if the thing to which it refers cannot be reach via any other sort of reference (hard, weak or soft). At this point the phantom reference can be enqueued. If the object had a finalizer, then it will either have been ignored or run; but it will not have 'brought the object back to life'. Phantom reachable objects are 'safe' for JNI native code (or any other code) to reclaim resources against.

So our code with phantom references can look like this:


ReferenceQueue<FileSlot> junkQ = new ReferenceQueue<>();
....
Phantom<FileSlot> mySlot=new Phantom<>(aSlot);
....
// In a different thread - make sure it is daemon!
Phantom<FileSlot> isDead;
while(true){
    isDead=junkQ.remove();
    long handle=lookUpHandle(isDead);
    cleanNativeMemory(handle);
}
In this pattern we keep a handle which the native code can use to find and reclaim resources in a structure (another hashmap probably) in Java. When we are absolutely sure that Java object cannot be brought back to life - it is phantom reachable (i.e. a ghost - we can then safely reclaim the native resource. If your code does other 'naughty' things using sun.misc.unsafe (for example) this trick might be of use as well. For a full example which uses this technique - check out this post.

One final thought about phantom references. It is technically possible to implement the same pattern as above using weak references. However, that is not the purpose of weak references and such a pattern would be abusing them. Phantom references makes an absolute guarantee that an object really is dead and so resource can be reclaimed. For just one example, it is theoretically possible for a weak reference to be enqueued and then the object be brought back to life by its finalizer because the finalization queue is running slower than the weak reference queue. This sort of edge case horror story cannot happen with phantom references.

There is one little problem, which is a weakness of the JVM design. That is that the JNI global weak reference type has an undefined relationship with phantom references. Some people suggest that you can use a global weak reference even to get to am object even when it is enqueued as a phantom reference. This is a quirk of one particular implementation of the JVM and should never be used.

By: Dr Alexander J Turner: I would like to thank Attila for contacting me and organising this interesting project. Feel free to pop over to my blog at Nerds-Central at any time :)

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on! Want to write for the blog? We are looking for contributors to fill all 24 slot and would love to have your contribution! Contact Attila Balazs to contribute!

No comments:

Post a Comment