Self-healing applications: are they real?

This post is an example about an application where the first solution to each and every IT problem – “have you tried turning it off and on again” – can actually do more harm than good. Instead, we have an application that can literally heal itself: it fails at the beginning, but starts running smoothly after some time. To give an example of such application in action, we recreated it in the most simple form possible, gathering inspiration from what is now a five-year old post from the Heinz Kabutz’ Java Newsletter:

package eu.plumbr.test;

public class HealMe {
private static final int SIZE = (int) (Runtime.getRuntime().maxMemory() * 0.6);

public static void main(String[] args) throws Exception {
for (int i = 0; i < 1000; i++) {

private static void allocateMemory(int i) {
try {
byte[] bytes = new byte[SIZE];

byte[] moreBytes = new byte[SIZE];

System.out.println("I allocated memory successfully " + i);

} catch (OutOfMemoryError e) {
System.out.println("I failed to allocate memory " + i);


The code above is allocating two bulks of memory in a loop. Each of those allocation is equal to 60% of the total available heap size. As the allocations occur sequentially in the same method, then one might expect this code to keep throwing java.lang.OutOfMemoryError: Java heap space errors and never successfully complete the allocateMemory() method.

Let us start with the static analysis of the source code:

  1. From the first fast examination, this code really cannot complete, because we try to allocate more memory than is available to JVM.
  2. If we look closer we can notice that the first allocation takes place in a scoped block, meaning that the variables defined in this block are visible only to this block. This indicates that the bytes should be eligible for GC after the block is completed. And so our code should in fact run fine right from the beginning, as at the time when it tries to allocate moreBytes the previous allocation bytes should be dead.
  3. If we now look into the compiled classfile, we will see the following bytecode:

    private static void allocateMemory(int);
    0: getstatic #3 // Field SIZE:I
    3: newarray byte
    5: astore_1
    6: getstatic #4 // Field java/lang/System.out:Ljava/io/PrintStream;
    9: aload_1
    10: arraylength
    11: invokevirtual #5 // Method java/io/PrintStream.println:(I)V
    14: getstatic #3 // Field SIZE:I
    17: newarray byte
    19: astore_1
    20: getstatic #4 // Field java/lang/System.out:Ljava/io/PrintStream;
    23: aload_1
    24: arraylength
    25: invokevirtual #5 // Method java/io/PrintStream.println:(I)V
    ---- cut for brevity ----

    Here we see, that on offsets 3-5 first array is allocated and stored into local variable with index 1. Then, on offset 17 another array is going to be allocated. But first array is still referenced by local variable and so the second allocation should always fail with OOM. Bytecode interpreter just cannot let GC clean up first array, because it is still strongly referenced.

Our static code analysis has shown us that for two underlying reasons, the presented code should not run successfully and in one case, it should. Which one out of those three is the correct one? Let us actually run it and see for ourselves. It turns out that both conclusions were correct. First, application fails to allocate memory. But after some time (on my Mac OS X with Java 8 it happens at iteration #255) the allocations start succeeding:

java -Xmx2g eu.plumbr.test.HealMe
I failed to allocate memory 0
I failed to allocate memory 1

… cut for brevity ...

I failed to allocate memory 254
I failed to allocate memory 255
I allocated memory successfully 256
I allocated memory successfully 257

Self-healing code is a reality! Skynet is near…

In order to understand what is really happening we need to think, what changes during program execution? The obvious answer is, of course, Just-In-Time compilation can occur. If you recall, Just-In-Time compilation is a JVM built-in mechanics to optimize code hotspots. For this, the JIT monitors the running code and when a hotspot is detected, JIT compiles your bytecode into native code, performing different optimizations such as method inlining and dead code elimination in the process.

Lets see if this is a case by turning on the following command line options and relaunching the program

-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation

This will generate a log file, in our case named hotspot_pid38139.log, where 38139 was the PID of your java process. In this file the following line can be found:

<task_queued compile_id='94' method='HealMe allocateMemory (I)V' bytes='83' count='256' iicount='256' level='3' stamp='112.305' comment='tiered' hot_count='256'/>

This means, that after executing “allocateMemory” methods 256 times, C1 compiler has decided to queue this method for C1 tier 3 compilation. You can get more information about tiered compilation’s levels and different thresholds here. And so our first 256 iterations were run in interpreted mode, where bytecode interpreter, being a simple stack machine, cannot know in advance, if some variable, bytes in this case, will be used further on or not. But JIT sees the whole method at once and so can deduce than bytes will not be used anymore and is, in fact, GC eligible. Thus the garbage collection can eventually take place and our program has magically self-healed. Now, I can only hope none of the readers should actually be responsible for debugging such a case in production. But in case you wish to make someone’s life a misery, introducing code like this to production would be a sure way to achieve this.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Performance tuning – measure don’t guess

In my performance tuning career I have given the advice to measure and not guess more often than I can recall. And in many cases the target of this advice has given up after looking at the monolithic 500,000 LOC legacy application they are working.

In the post we are about to share some simple tools and concepts how to setup the initial measurement, so you can have the baseline to start with more fine-grained performance tuning.

Setting the goal

Pretty much any non-trivial application can be optimized forever. Both in bad and in good way. Bad examples include tweaking random parts of the applications and then praying for the best. Whatever might be the reason – I have seen tens and tens of developers “being pretty sure that exactly this line of code needs attention” without basing their gut feeling to any measurements.

The second category is almost as dangerous, as you can spend hundreds of man-years tuning your application “to be ready for the prime time”. The definition for prime time might vary, but are you really sure you are going to accommodate terabytes of media, tens of millions of records in your database and have to serve hundreds of thousands of concurrent users with less than 100ms latency? Unless you are aiming to put Google out of business, you most likely will not. Worse yet, spending so much time preparing for the primetime is a darn good way to assure the prime time never arrives. Instead of spending this time for squeezing out the extra 1ms in latency nobody is likely to notice, maybe the same time should have spent in ironing out this layout bug annoying the end users on Safari instead?

So how to set a meaningful target? For this you need to understand the business you are in. Real-time strategy games and First Person Shooters tend to have different requirements than online shopping carts. As last time I checked, Java was not going strong in the gaming industry, lets expect you are dealing with a typical Java EE application with web based front-end.

In this case, you should start now segmenting your application into different categories, which, based on the research are similar to the following:

  • 0.1 seconds gives the feeling of instantaneous response — that is, the outcome feels like it was caused by the user, not the computer.
  • 1 second keeps the user’s flow of thought seamless. Users can sense a delay, and thus know the computer is generating the outcome, but they still feel in control of the overall experience and that they’re moving freely rather than waiting on the computer. This degree of responsiveness is needed for good navigation.
  • 10 seconds keeps the user’s attention. From 1–10 seconds, users definitely feel at the mercy of the computer and wish it was faster, but they can handle it. After 10 seconds, they start thinking about other things, making it harder to get their brains back on track once the computer finally does respond.

So you might have functionality such as adding products to the shopping carts or browsing through the recommended items which you need to squeeze into the “instant” bucket to provide the best overall user experience resulting in better conversion rates. Initially I am pretty sure your application is nowhere near this criteria, so feel free to replace the 100ms and 1,000ms threshold with something you actually can achieve, such as 300 and 1,500ms for example.

On the other hand, you most likely have some operations which are expensive enough, such as the products search or user account registration which might fall into the second bucket.

And last, you have the functionality which you can toss into the last bucket, such as generating PDF from the invoice. Note that you also might end up with fourth category – for example if you are generating the bills of delivery for your warehouse, your business might be completely OK if batch processing the daily bills takes 20 minutes.

Pay attention that business persons have a tendency of tossing everything into the “instant” bucket. Now it is your task to explain the effort required and ask “if you had only three operations which are completing under our threshold, what should those be”.

Now you should have your application functionality categorized into different categories, such as the following:

Requirement Category
Browse products instant
Add product to a shopping cart instant
Search product seamless
Register new account attention
Generate PDF invoice attention
Generate daily bills of delivery slow

Understanding the current situation

Your next goal is to understand where you will perform your measurements. For end users, the experience is complete when the page has rendered the result of the operation in their browser, which takes into account the network latency and browser DOM operations for example.
As this will be more complex to measure, let us assume your site has been well optimized against the Google Page Speed or YSlow recommendations and for simplicity’s sake lets focus on elements directly under your control.

On most cases, the last piece of infrastructure still under your control will be your web server, such as the Apache HTTPD or nginx. If you have logging enabled, you will have access to something similar to the following in your nginx access.log: - - [04/Dec/2013:12:16:11 +0200]  "POST /register HTTP/1.1" 301 184 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 0.428 - - [04/Dec/2013:12:16:12 +0200] "POST /searchProduct HTTP/1.1" 200 35 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 3.008 - - [04/Dec/2013:12:16:12 +0200] "GET /product HTTP/1.1" 302 0 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 0.623 - - [04/Dec/2013:12:16:13 +0200] "GET /product HTTP/1.1" 200 35 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 0.828 - - [04/Dec/2013:12:16:13 +0200] "GET /product HTTP/1.1" 200 35 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 1.038 - - [04/Dec/2013:12:16:13 +0200] "GET /product HTTP/1.1" 200 35 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 0.627 - - [04/Dec/2013:12:16:14 +0200] "GET /addToShoppingCart HTTP/1.1" 200 35 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 2.808 - - [04/Dec/2013:12:16:14 +0200] "GET /purchase HTTP/1.1" 302 0 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 3.204 - - [04/Dec/2013:12:16:16 +0200] "GET /viewPDFInvoice HTTP/1.1" 200 11562 "" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0" 3.018

Lets investigate now what we have found in the log. The snippet we have extracted contains actions of one user completing a full transaction in our web site. The user has created an account, searched for products, browsed through the results, found a product he has liked, added it to a shopping cart, completed the purchase and generated a PDF of the invoice. Those actions can be detected in the “POST /searchProduct HTTP/1.1” column. Next important part is the last column containing the total request time it took for a particular request to complete. In the case of /searchProduct it took 3.008 seconds to complete the search.

Note that by default nginx does not have the request time logging enabled, so you might need to to modify your log_format by adding $request_time to the pattern.

Now, with a little bit of grep/sed/excel magic, you will have something similar to the following at your fingertips:

Requirement Category Mean 90%
Browse products instant 0.734 0.902
Add product to a shopping cart instant 2.422 3.490
Search product seamless 2.800 3.211
Register new account attention 0.428 0.480
Generate PDF invoice attention 3.441 4.595
Generate daily bills of delivery slow

Picking the target for optimization

From here you will get your list of optimization targets. The suspects are obvious – they fall in the categories where you violate the agreement, in the above case you have problems with three requirements – Browse products, Add product to a shopping cart and Search product all violate your performance requirements.

This article does not focus on actual optimisation tasks, but in order to give food for thought – what should you do next, knowing that adding products to the shopping cart takes way too much time?

Next steps are definitely more application specific than the prior tasks. Also, the tooling to be used can vary – you might now decide to go with APM or monitoring products. But if you do not have the budget or just do not wish to fight with procurement office, a simple solution would include adding logging trail in the component boundaries by using AOP means.

For example, you can add something like the following in your application code to monitor the time spent in the service layer.

public class TimingAspect {

@Around("execution(* com.myshop..*Service+.*(..))")
public Object time(ProceedingJoinPoint joinPoint) throws Throwable {
StopWatch.WatchStatus watch = StopWatch.getInstance().start(joinPoint.getSignature().getName());
final Object result = joinPoint.proceed();
return result;

Decomposing the request time further gives you the required insight to understand where the bottlenecks are hiding. More often than not you will quickly discover a violating query, faulty cache configuration or poorly decomposed functionality enabling you to net your first quick wins in a matter of days or even hours.

This post is written by @iNikem from Plumbr and is part of the Java Advent Calendar. The post is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!