Adopt OpenJDK & Java community: how can you help Java!


I want to take the opportunity to show what we have been doing in the last year, and what we have done so far as members of the community. Unlike previous years, I have decided to keep this post less technical, compared both to past years and to the other posts on Java Advent this year.


This year marks the fourth year since the first OpenJDK hackday was held in London (supported by the LJC and its members) and since the Adopt OpenJDK program was started. Four years is a small number in the face of 20 years of Java, and the same goes for the size of the Adopt OpenJDK community, which forms a small part of the Java community (9+ million users). Although this post is non-technical in nature, its message is fairly important for the future growth and progress of our community and the next generation of developers.

Creations from the community

Over the past months a number of members of our community have contributed their good work and passed it on to us. In no specific order, I have listed those I can recall from memory. I know there are more, and you can help us by sharing them with us (we will list them here). So here are some that we can talk about and be proud of, while thanking those who were involved:

  • Getting Started page – created to enable two-way communication with the members of the community; the channels include a mailing list, an IRC channel, a weekly newsletter, a Twitter handle, and other social media channels and collaboration tools.
  • Adopt OpenJDK project: jitwatch – a great tool created by Chris Newland; it is one of its kind, ever growing in features, and helps developers fine-tune the performance of Java/JVM applications running on the JVM.
  • Adopt OpenJDK: GSK – a community effort gathering knowledge and experience from hackday attendees and OpenJDK developers on how to work with OpenJDK, from building it to creating your own version of the JDK. Many JUG members have been involved in the process, and it is now an e-book available in several languages (5 languages, with 2 to 3 more in progress).
  • Adopt OpenJDK vagrant scripts – a collection of Vagrant scripts initially created by John Patrick from the LJC, later improved by community members who added more scripts and refactored existing ones. These scripts build OpenJDK projects in a virtualised environment (VirtualBox), making building and testing OpenJDK, as well as running and testing Java/JVM applications, easier, more reliable, and isolated.
  • Adopt OpenJDK docker scripts – a collection of Docker scripts created with the help of the community, now also receiving contributions from a number of members like Richard Kolb (SA JUG). Just like the vagrant scripts mentioned above, the docker scripts have similar goals, and need your DevOps foo!
  • Adopt OpenJDK project: mjprof – mjprof is a Monadic jstack analysis tool set. It is a fancy way to say it analyzes jstack output using a series of simple composable building blocks (monads). Many thanks to Haim Yadid for donating it to the community.
  • Adopt OpenJDK project: jcountdown – a tool built by the community to encourage users to move to the latest and greatest Java! Many thanks to all those involved, as you can see from the commit history.
  • Adopt OpenJDK CloudBees Build Farm – thanks to the folks at CloudBees for helping us host our build farm on their CI/CD servers. Initially started by Martijn Verburg, and with the help of a number of JUG members it has reached the point where major Java projects are built against different versions of the JDK, including the JDKs themselves (versions 1.7, 1.8, 1.9, Jigsaw and Shenandoah). This project has also helped support the Testing Java Early project and the Quality Outreach program.

These are just a handful of such creations and contributions from members of the community, and some of these projects would certainly welcome your help. One more thing we could do well as a community is celebrate our victories and successes, and especially credit those who have been involved, whether as individuals or as a community, so that our next generation of contributors feels inspired and encouraged to do more good work and share it with us.

Contributions from the community

We want to contribute

In a recent tweet and in posts to various Java/JVM and developer mailing lists, I asked the community to come forward and share their contribution stories, or those of others, with our community. The purpose was two-fold: to share them with the community, and to write this post (which in turn is shared with the community). I was happy to see a handful of messages sent to me and the mailing lists by community members. I'll share some of these with you (in the order I received them).

Sebastian Daschner:

I don’t know if that counts as contribution but I’ve hacked on the
OpenJDK compiler for fun several times. For example I added a new
thought up ‘maybe’ keyword which produces randomly executed code:

Thomas Modeneis:

Thanks for writing, I like your initiative, its really good to show how people are doing and what they have been focusing on. Great idea.
From my part, I can tell about the DevoxxMA last month, I did a talk on the Hacker Space about the Adopt the OpenJDK and it was really great. We had about 30 or more attendees, it was in a open space so everyone that was going to any talk was passing and being grabbed to have a look about the topic, it was really challenging because I had no mic. but I managed to speak out loud and be listen, and I got great feedback after the session. I’m going to work over the weekend to upload the presentation and the recorded video and I will be posting here as soon as I have it done! 🙂

Martijn Verburg:

Good initiative.  So the major items I participated in were Date and Time and Lambdas Hackdays (reporting several bugs), submitted some warnings cleanups for OpenJDK.  Gave ~10 pages of feedback for jshell and generally tried to encourage people more capable than me to contribute :-).

Andrii Rodionov:

Olena Syrota and Oleg Tsal-Tsalko from Ukraine JUG: contributing to the JSR 367 test code-base, promoting ‘Adopt a JSR’ and the JSON-B spec at JUG UA meetings, and also at the JavaDay Lviv conference.


Contributors gathering together

As you have seen, out of a community of 9+ million users only a handful came forward to share their stories. I can, however, point you to another list of contributors who have been paramount with their contributions to the Adopt OpenJDK GitBook: take a look at the list of contributors and the committers on the git repo. They have contributed not just to the book but to Java and the OpenJDK community, especially those who helped translate the book into multiple languages. And there are a number who haven't come forward to add their names to the list, even though they have made valuable contributions.
Super heroes together

From this I can say that contributors can be like unsung heroes, either due to their shy or low-profile nature, or because they just don't get noticed by us. So it is only fair to encourage them to come forward, or to share their contributions with the community, however simple or small those may be. In addition to the above list I would like to add a number of them (again, apologies if I have missed your name or any of your contributions). These names are in no particular order, but their contributions have been invaluable:

  • Dalibor Topic (OpenJDK Project Lead) & the OpenJDK team
  • Mario Torre & the RedHat OpenJDK team
  • Tori Wieldt (Java Community manager) and her team
  • Heather Vancura & the JCP team
  • NightHacking, vJUG and RebelLabs (and the great people behind them)
  • Nicolaas & the team at Cloudbees
  • Chris Newland (JitWatch developer)
  • Lucy Carey, Ellie & Mark Hazell (Devoxx UK & Voxxed)
  • Richard Kolb (JUG South Africa)
  • Daniel Bryant, Richard Warburton, Ben Evans, and a number of others from LJC
  • Members of SouJava (Otavio, Thomas, Bruno, and others)
  • Members of Bulgarian JUG (Ivan, Martin, Mitri) and neighbours
  • Oti, Ludovic & Patrick Reinhart
  • and a number of other contributors who for some reason I can’t remember…

I have named them for their contributions to the community: helping organise hackdays during the week and at weekends, running workshops and hands-on sessions at conferences, giving lightning talks, speaking at conferences, allowing us to host our CI and build-farm servers, travelling to different parts of the world holding the Java community flag, writing books, giving Java and advanced-level training, giving feedback on new technologies and features, and innumerable other activities that support and push forward the Java/JVM platform.

How can you make a difference? And why?

Make a difference

You can make a difference by doing something as simple as clicking the like button (on Twitter, LinkedIn, Facebook, etc.) or responding to a message on a mailing list by expressing your opinion about something you see or read: why you think about it that way, or how it could be different.

The answer to the question “And why?” is simple: because you are part of a community, ‘you care’, and you want to share your knowledge and experience with others, just like those above who have spared free moments of their valuable time for us.

Is it hard to do? Where to start? What needs the most attention?

The answer is: it's not hard to do. If so many have done it, you can as well. Where to start, and what can you do? I have written a page on this topic, and it's worth reading before going any further.

There is a dynamic list of topics worth considering when thinking of contributing to OpenJDK and Java, which I have recently filtered down to a few topics (in order of precedence).

We need you!

With that I would like to close by saying:


Not just “I”, but we as a community need you.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

JIT Compiler, Inlining and Escape Analysis

Just-in-time (JIT)

The just-in-time (JIT) compiler is the brain of the Java Virtual Machine. Nothing in the JVM affects performance more than the JIT compiler.

For a moment, let's step back and look at examples of compiled and interpreted languages.

Languages like Go, C and C++ are called compiled languages because their programs are distributed as binary (compiled) code targeted at a particular CPU.

On the other hand, languages like PHP and Perl are interpreted. The same program source code can run on any CPU as long as the machine has the interpreter. The interpreter translates each line of the program into binary code as that line is executed.

Java attempts to find a middle ground. Java applications are compiled, but instead of being compiled into a specific binary for a specific CPU, they are compiled into bytecode. This gives Java the platform independence of an interpreted language. But Java doesn't stop there.

In a typical program, only small sections of the code are executed frequently, and the performance of an application depends primarily on how fast those sections execute. These critical sections are known as the hot spots of the application.
The more often the JVM executes a particular code section, the more information it has about it. This allows the JVM to make smart, optimized decisions and compile small hot sections into CPU-specific binary code. This process is called just-in-time (JIT) compilation.

Now let’s run a small program and observe JIT compilation.

public class App {
  public static void main(String[] args) {
    long sumOfEvens = 0;
    for (int i = 0; i < 100000; i++) {
      if (isEven(i)) {
        sumOfEvens += i;
      }
    }
    System.out.println(sumOfEvens);
  }

  public static boolean isEven(int number) {
    return number % 2 == 0;
  }
}

#### Run
javac App.java && \
java -server \
     -XX:-TieredCompilation \
     -XX:+PrintCompilation \
     -XX:CompileThreshold=100000 App

#### Output
87    1             App::isEven (16 bytes)

The output tells us that the isEven method was compiled. I intentionally disabled tiered compilation to see only the most frequently compiled code.

JIT-compiled code gives a great performance boost to your application. Want to check? Write a simple benchmark.
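The benchmark is left as an exercise, but a crude sketch is below; the class name, iteration counts and warm-up loop are my own choices, and for trustworthy numbers a harness such as JMH is the right tool.

```java
public class JitWarmupDemo {

  public static boolean isEven(int number) {
    return number % 2 == 0;
  }

  // Sums the even numbers below n and returns the elapsed time in nanoseconds.
  static long timedRun(int n) {
    long start = System.nanoTime();
    long sum = 0;
    for (int i = 0; i < n; i++) {
      if (isEven(i)) {
        sum += i;
      }
    }
    if (sum < 0) throw new AssertionError(); // keep the loop's result live
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    long cold = timedRun(1_000_000);           // first run: mostly interpreted
    for (int i = 0; i < 50; i++) {
      timedRun(1_000_000);                     // warm-up: let the JIT kick in
    }
    long warm = timedRun(1_000_000);           // now running JIT-compiled code
    System.out.println("cold=" + cold + "ns warm=" + warm + "ns");
  }
}
```

On a typical HotSpot JVM the warm run is noticeably faster than the cold one, though the exact numbers vary per machine and JVM version.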


Inlining

Inlining is one of the most important optimizations the JIT compiler makes. Inlining replaces a method call with the body of the called method, to avoid the overhead of method invocation.

Let’s run the same program again and this time observe inlining.

#### Run
javac App.java && \
java -server \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintInlining \
     -XX:-TieredCompilation App

#### Output
@ 12   App::isEven (16 bytes)   inline (hot)

Inlining again will give a great performance boost to your application.
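To make the transformation concrete, the sketch below shows by hand what inlining effectively does to our loop; this is my illustration of the idea, not actual compiler output.

```java
public class InliningDemo {

  public static boolean isEven(int number) {
    return number % 2 == 0;
  }

  public static void main(String[] args) {
    // Before inlining: every iteration pays the cost of a method invocation.
    long sumBefore = 0;
    for (int i = 0; i < 100000; i++) {
      if (isEven(i)) {
        sumBefore += i;
      }
    }

    // After inlining: the JIT has substituted the body of isEven at the call
    // site, so the check happens with no invocation overhead.
    long sumAfter = 0;
    for (int i = 0; i < 100000; i++) {
      if (i % 2 == 0) {
        sumAfter += i;
      }
    }

    System.out.println(sumBefore == sumAfter); // prints "true": same result, fewer calls
  }
}
```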

Escape Analysis

Escape analysis is a technique by which the JIT compiler can analyze the scope of a new object's uses and decide whether to allocate it on the Java heap or handle the object's members directly via scalar replacement. (An earlier version of this post wrongly said "on the method stack".) It also eliminates locks for all non-globally-escaping objects.

Let’s run a small program and observe garbage collection.

public class App {
  public static void main(String[] args) {
    long sumOfArea = 0;
    for (int i = 0; i < 10000000; i++) {
      Rectangle rect = new Rectangle(i + 5, i + 10);
      sumOfArea += rect.getArea();
    }
    System.out.println(sumOfArea);
  }

  static class Rectangle {
    private int height;
    private int width;

    public Rectangle(int height, int width) {
      this.height = height;
      this.width = width;
    }

    public int getArea() {
      return height * width;
    }
  }
}
In this example, Rectangle objects are created and reachable only within the loop. They are characterised as NoEscape, so their members can be handled directly (scalar replacement) instead of the objects being allocated on the heap. Specifically, this means that no garbage collection needs to happen.

Let’s run the program without EscapeAnalysis.

#### Run
javac App.java && \
java -server \
     -verbose:gc \
     -XX:-DoEscapeAnalysis App

#### Output
[GC (Allocation Failure)  65536K->472K(251392K), 0.0007449 secs]
[GC (Allocation Failure)  66008K->440K(251392K), 0.0008727 secs]
[GC (Allocation Failure)  65976K->424K(251392K), 0.0005484 secs]

As you can see, GC kicked in. 'Allocation Failure' means no more space was left in the young generation to allocate objects; it is the normal cause of a young-generation GC.

This time let’s run it with EscapeAnalysis.

#### Run
javac App.java && \
java -server \
    -verbose:gc \
    -XX:+DoEscapeAnalysis App

#### Output

No GC happened this time, which basically means that creating short-lived, narrowly scoped objects does not necessarily introduce garbage.

The DoEscapeAnalysis option is enabled by default. Note that only the Java HotSpot Server VM supports it.
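For contrast, here is a variation of the example (my own, not from the post) in which the object does escape: each Rectangle is published to a collection that outlives the loop, so the JIT can no longer prove the object is local and must allocate it on the heap, bringing GC pressure back.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeDemo {

  static class Rectangle {
    private final int height;
    private final int width;

    Rectangle(int height, int width) {
      this.height = height;
      this.width = width;
    }

    int getArea() {
      return height * width;
    }
  }

  // Every Rectangle stored here escapes the loop: scalar replacement is impossible.
  static final List<Rectangle> escaped = new ArrayList<>();

  public static void main(String[] args) {
    long sumOfArea = 0;
    for (int i = 0; i < 1_000_000; i++) {
      Rectangle rect = new Rectangle(i + 5, i + 10);
      escaped.add(rect);            // the object is now reachable outside the loop
      sumOfArea += rect.getArea();
    }
    System.out.println(escaped.size()); // prints 1000000
  }
}
```

Running this with -verbose:gc should show collections even with -XX:+DoEscapeAnalysis, because escaping objects cannot be scalar-replaced.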

As a consequence, we should all avoid premature optimization, focus on writing more readable and maintainable code, and let the JVM do its job.

Functional vs Imperative Programming. Fibonacci, Prime and Factorial in Java 8

There are multiple programming styles/paradigms, but two well-known ones are Imperative and Functional.

Imperative programming is the most dominant paradigm, as nearly all mainstream languages (C++, Java, C#) have promoted it. But in the last few years functional programming has started to gain attention. One of the main driving factors is that virtually all new computers ship with 4, 8, 16 or more cores, and it is very difficult to write a parallel program in the imperative style that utilizes all of them. The functional style moves this difficulty down to the runtime level and frees developers from hard, error-prone work.

Wait! So what's the difference between these two styles?

Imperative programming is a paradigm in which you tell the machine/runtime exactly how, and with exactly which statements, to achieve the desired result.

Functional programming is a form of declarative programming, in which you tell what you would like to achieve and the machine/runtime determines the best way to do it.

Functional style moves the how part to the runtime level and helps developers focus on the what part. By abstracting the how part we can write more maintainable and scalable software.

To handle the challenges introduced by multicore machines, and to remain attractive for developers, Java 8 introduced the functional paradigm alongside the imperative one.

Enough theory; let's implement a few programming challenges in the imperative and functional styles using Java and see the difference.

Fibonacci Sequence Imperative vs Functional (The Fibonacci Sequence is the series of numbers: 1, 1, 2, 3, 5, 8, 13, 21, 34, … The next number is found by adding up the two numbers before it.)

Fibonacci Sequence in iterative and imperative style

public static int fibonacci(int number) {
  int fib1 = 1;
  int fib2 = 1;
  int fibonacci = fib1;
  for (int i = 2; i < number; i++) {
    fibonacci = fib1 + fib2;
    fib1 = fib2;
    fib2 = fibonacci;
  }
  return fibonacci;
}

for (int i = 1; i <= 10; i++) {
  System.out.print(fibonacci(i) + " ");
}
// Output: 1 1 2 3 5 8 13 21 34 55

As you can see, here we are focusing a lot on how (iteration, state) rather than on what we want to achieve.

Fibonacci Sequence in iterative and functional style

IntStream fibonacciStream = Stream.iterate(
    new int[]{1, 1},
    fib -> new int[] {fib[1], fib[0] + fib[1]}
  ).mapToInt(fib -> fib[0]);

fibonacciStream.limit(10).forEach(fib ->  
    System.out.print(fib + " "));
// Output: 1 1 2 3 5 8 13 21 34 55 

In contrast, you can see here we are focusing on what we want to achieve.

Prime Numbers Imperative vs Functional (A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself.)

Prime Number in imperative style

public boolean isPrime(long number) {
  for (long i = 2; i <= Math.sqrt(number); i++) {
    if (number % i == 0) return false;
  }
  return number > 1;
}

isPrime(9220000000000000039L) // Output: true

Again here we are focusing a lot on how (iteration, state).

Prime Number in functional style

public boolean isPrime(long number) {
  return number > 1 &&
    LongStream
      .rangeClosed(2, (long) Math.sqrt(number))
      .noneMatch(index -> number % index == 0);
}

isPrime(9220000000000000039L) // Output: true

Here again we are focusing on what we want to achieve. The functional style helped us to abstract away the process of explicitly iterating over the range of numbers.

You might now think: hmmm, is this all we can have? Let's see how we can use all our cores (gain parallelism) in the functional style.

public boolean isPrime(long number) {
  return number > 1 &&
    LongStream
      .rangeClosed(2, (long) Math.sqrt(number))
      .parallel()
      .noneMatch(index -> number % index == 0);
}

isPrime(9220000000000000039L) // Output: true

That’s it! We just added .parallel() to the stream. You can see how the library/runtime handles complexity for us.
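To see the two variants together, here is a small runnable class (class and method names are mine) checking that the sequential and parallel versions agree on a few sample values:

```java
import java.util.stream.LongStream;

public class PrimeDemo {

  static boolean isPrimeSequential(long number) {
    return number > 1 &&
      LongStream.rangeClosed(2, (long) Math.sqrt(number))
                .noneMatch(i -> number % i == 0);
  }

  static boolean isPrimeParallel(long number) {
    return number > 1 &&
      LongStream.rangeClosed(2, (long) Math.sqrt(number))
                .parallel()
                .noneMatch(i -> number % i == 0);
  }

  public static void main(String[] args) {
    long[] samples = {2, 15, 7919, 7920}; // 7919 is prime, 7920 is not
    for (long n : samples) {
      System.out.println(n + " " + isPrimeSequential(n) + " " + isPrimeParallel(n));
    }
  }
}
```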

Factorial Imperative vs Functional ( The factorial of n is the product of all positive integers less than or equal to n.)

Factorial in iterative and imperative style

public long factorial(int n) {
  long product = 1;
  for (int i = 1; i <= n; i++) {
    product *= i;
  }
  return product;
}

factorial(5) // Output: 120

Factorial in iterative and functional style

public long factorial(int n) {
  return LongStream
    .rangeClosed(1, n)
    .reduce(1, (a, b) -> a * b);
}

factorial(5) // Output: 120

It’s worth repeating that by abstracting the how part we can write more maintainable and scalable software.
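One caveat worth adding: long overflows for factorials beyond 20!, so in practice the same reduction is often written over BigInteger. A sketch (my variation, not from the post):

```java
import java.math.BigInteger;
import java.util.stream.LongStream;

public class FactorialDemo {

  // Same functional shape as above, but reducing BigIntegers so large n don't overflow.
  static BigInteger factorial(int n) {
    return LongStream.rangeClosed(1, n)
        .mapToObj(BigInteger::valueOf)
        .reduce(BigInteger.ONE, BigInteger::multiply);
  }

  public static void main(String[] args) {
    System.out.println(factorial(5));  // prints 120
    System.out.println(factorial(25)); // well beyond long's range
  }
}
```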

To see all the functional goodies introduced by Java 8, check out the Lambda Expressions, Method References and Streams guide.

How is Java / JVM built? Adopt OpenJDK is your answer!

Introduction & history
As some of you may already know, starting with Java 7, OpenJDK is the Reference Implementation (RI) of Java. The timeline below gives you an idea of the history of OpenJDK:
OpenJDK history (2006 till date)
If you have ever wondered about the JDK or JRE binaries that you download from vendors like Oracle, Red Hat, etcetera, the clue is that these all stem from OpenJDK. Each vendor then adds some extra artefacts that are not open source, for security, proprietary or other reasons.

What is OpenJDK made of?
OpenJDK is made up of a number of repositories, namely corba, hotspot, jaxp, jaxws, jdk, langtools, and nashorn. Between OpenJDK8 and OpenJDK9 no new repositories have been introduced, but there have been lots of changes and restructuring, primarily due to Jigsaw – the modularisation of Java itself [2] [3] [4] [5].
repo composition, language breakdown (metrics are estimated)
Recent history
OpenJDK Build Benchmarks – build-infra (Nov 2011) by Fredrik Öhrström, ex-Oracle, OpenJDK hero!

Fredrik Öhrström visited the LJC [16] in November 2011, where he showed us how to build OpenJDK on the three major platforms, and distributed a four-page leaflet with benchmarks of the various components and how long they took to build. The new build system and the new makefiles are a result of the build system being re-written (build-infra).

Below are screen-shots of the leaflets, a good reference to compare our journey:

Build Benchmarks page 2 [26]

How have the Java language and platform been built over the years?

Java is built by bootstrapping an older (previous) version of Java – i.e. Java is built using Java itself as its building block, with older components put together to create a new component which in the next phase becomes the building block. A good example of bootstrapping can be found at Scheme from Scratch [6] or on Wikipedia [7].

OpenJDK8 [8] is compiled and built using JDK7; similarly, OpenJDK9 [9] is compiled and built using JDK8. In theory OpenJDK8 can be compiled using the images created from OpenJDK8, and likewise OpenJDK9 using OpenJDK9's own images. This uses a process called bootcycle images: a JDK image of OpenJDK is created, and then OpenJDK is compiled again using that same image, which can be accomplished with a make option:

$ make bootcycle-images       # Build images twice, second time with newly built JDK

make offers a number of options under OpenJDK8 and OpenJDK9; you can build individual components or modules by naming them, i.e.

$ make [component-name] | [module-name]

or even run multiple build processes in parallel, i.e.

$ make JOBS=<n>                 # Run <n> parallel make jobs

Finally install the built artefact using the install option, i.e.

$ make install

Some myths busted
OpenJDK, or HotSpot to be more specific, isn't completely written in C/C++; a good part of the code-base is good ol' Java (see the composition figure above). So you don't have to be a hard-core C/C++ developer to contribute to OpenJDK. Even the underlying C/C++ code-base isn't scary or daunting to look at. For example, here is an extract of a code snippet from vm/memory/universe.cpp in the HotSpot repo –

if (UseParallelGC) {
   #ifndef SERIALGC
   Universe::_collectedHeap = new ParallelScavengeHeap();
   #else // SERIALGC
   fatal("UseParallelGC not supported in this VM.");
   #endif // SERIALGC
} else if (UseG1GC) {
   #ifndef SERIALGC
   G1CollectorPolicy* g1p = new G1CollectorPolicy();
   G1CollectedHeap* g1h = new G1CollectedHeap(g1p);
   Universe::_collectedHeap = g1h;
   #else // SERIALGC
   fatal("UseG1GC not supported in java kernel vm.");
   #endif // SERIALGC
} else {
   GenCollectorPolicy* gc_policy;

   if (UseSerialGC) {
       gc_policy = new MarkSweepPolicy();
   } else if (UseConcMarkSweepGC) {
       #ifndef SERIALGC
       if (UseAdaptiveSizePolicy) {
           gc_policy = new ASConcurrentMarkSweepPolicy();
       } else {
           gc_policy = new ConcurrentMarkSweepPolicy();
       }
       #else // SERIALGC
       fatal("UseConcMarkSweepGC not supported in this VM.");
       #endif // SERIALGC
   } else { // default old generation
       gc_policy = new MarkSweepPolicy();
   }

   Universe::_collectedHeap = new GenCollectedHeap(gc_policy);
}
(please note that the above code snippet might have changed since published here)
What appears clear from the above code block is how pre-processor directives are used to create HotSpot code that supports a certain type of GC, i.e. Serial GC or Parallel GC, and how the GC policy is selected when one or more GC switches are toggled: for example UseAdaptiveSizePolicy, when enabled, selects the adaptive-size variant of the Concurrent Mark Sweep policy. If neither UseSerialGC nor UseConcMarkSweepGC is selected, the policy chosen is the Mark and Sweep policy. All of this and more is clearly readable: verbose, nicely formatted code that reads almost like English.

Further commentary can be found in the section called Deep dive Hotspot stuff in the Adopt OpenJDK Intermediate & Advance experiences [12] document.

Steps to build your own JDK or JRE
Earlier we mentioned JDK and JRE images – these are no longer available only to the big players in the Java world; you and I can build such images very easily. The steps for the process have been simplified, and for a quick start see the Adopt OpenJDK Getting Started Kit [11] and Adopt OpenJDK Intermediate & Advance experiences [12] documents. For a detailed version of the same steps, please see the Adopt OpenJDK home page [13]. Basically, building a JDK image from the OpenJDK code-base boils down to the commands below:

(setup steps have been made brief and some commands omitted, see links above for exact steps)

$ hg clone jdk8  (a)…OpenJDK8
$ hg clone jdk9  (a)…OpenJDK9

$ ./                                    (b)
$ bash configure                                      (c)
$ make clean images                                   (d)


To explain what is happening at each of the steps above:
(a) We clone the OpenJDK mercurial repo, just as we would with git clone.
(b) Once step (a) is complete, we change into the folder created and run the source-fetching command, which is equivalent to a git fetch or a git pull, since step (a) only brings down base files and not all of the files and folders.
(c) Here we run a script that checks for, and creates, the configuration needed for the compile and build process.
(d) Once step (c) succeeds, we perform a complete compile and build, and create JDK and JRE images from the built artefacts.

As you can see, these are dead-easy steps to follow to build an artefact or JDK/JRE images [step (a) needs to be run only once].

Getting involved brings many benefits:

– contribute to the evolution and improvement of the Java language & platform
– learn about the internals of the language and platform
– learn about the OS platform and other technologies whilst doing the above
– get involved in F/OSS projects
– stay on top of the latest changes in the Java / JVM sphere
– gain knowledge and experience that helps professionally, and that is not readily available from other sources (i.e. books, training, work experience, university courses, etcetera)
– advancement in career
– personal development (soft skills and networking)

Join the Adopt OpenJDK [13] and Betterrev [15] projects and contribute by giving us feedback about everything Java, including these projects. Join the Adoption Discuss mailing list [14] and other OpenJDK-related mailing lists to start with; these will keep you updated on the latest progress and changes to OpenJDK. Fork any of the projects you see and submit changes via pull requests.

Thanks and support
Adopt OpenJDK [13] and its umbrella projects have been supported and progressed with the help of the JCP [21], the OpenJDK team [22], and JUGs such as the London Java Community [16], SouJava [17] and other JUGs in Brazil, as well as a number of JUGs in Europe, i.e. BGJUG (Bulgarian JUG) [18], BeJUG (Belgium JUG) [19], Macedonian JUG [20], and a number of other smaller JUGs. We hope that in time more JUGs and individuals will get involved. If you or your JUG wish to participate, please get in touch.

Special thanks to +Martijn Verburg (who incepted Adopt OpenJDK), +Richard Warburton, +Oleg Shelajev, +Mite Mitreski, +Kaushik Chaubal and +Julius G for helping improve the content and quality of this post, and for sharing their OpenJDK experience with us.

How to get started?
Join the Adoption Discuss mailing list [14], go to the Adopt OpenJDK home page [13] to get started, followed by referring to the Adopt OpenJDK Getting Started Kit [11] and Adopt OpenJDK Intermediate & Advance experiences [12] documents.

Please share your comments here or tweet at @theNeomatrix369.

[8] OpenJDK8 
[17] SouJava 

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

CMS Pipelines … for NetRexx on the JVM

This year I want to tell you about a new and exciting addition to NetRexx (which, incidentally, just turned 19 years old the day before yesterday). NetRexx, as some of you know, is the first alternative language for the JVM; it stems from IBM, and has been free and open source since 2011. It is a happy marriage of the Rexx language (Michael Cowlishaw, IBM, 1979) and the JVM. NetRexx can run compiled ahead of time, as .class files, for maximum performance, or interpreted, for a quick development cycle or very dynamic production of code. After the addition of scripting in version 3.03 last year, the new release (3.04, due somewhere at the end of 2014) includes Pipes.

We know what pipes are, I hear you say, but what are Pipes? A Pipeline, also called a Hartmann Pipeline, is a concept that extends and improves pipes as they are known from Unix and other operating systems. The name pipe indicates an inter-process communication mechanism, as well as the programming paradigm it has introduced. Compared to Unix pipes, Hartmann Pipelines offer multiple input and output streams, more complex pipe topologies, and a lot more; too much for this short article, but worthy of your study.

Pipelines were first implemented on VM/CMS, one of IBM’s mainframe operating systems. This version was later ported to TSO to run under MVS and has been part of several product configurations. Pipelines are widely used by VM users, in a symbiotic relationship with REXX, the interpreted language that also has its origins on this platform. Pipes in the NetRexx version are compiled by a special Pipes compiler that has been integrated with NetRexx. The resulting code can run on every platform that has a JVM (Java Virtual Machine), including z/VM and z/OS for that matter. This portable version of Pipelines was started by Ed Tomlinson in 1997 under the name njpipes, when NetRexx was still very new, and was open sourced in 2011, soon after the NetRexx translator itself. It was integrated into the NetRexx translator in 2014 and will be released as part of the NetRexx distribution for the first time with version 3.04. It answers the eternal question posed to the development team by every z/VM programmer we ever met: “But … does it have Pipes?” It also marks the first time that a no-charge Pipelines product runs on z/OS. But of course most of you will be running Linux, Windows or OSX, where NetRexx and Pipes also run splendidly.

NetRexx users are very conscious of code size and performance – for example because applications also run on limited JVM specifications such as Java ME, in Lego robots, and on Androids and Raspberries – and are generally proud and protective of the NetRexx runtime, which weighs in at 37K (yes, 37 kilobytes; it even shrunk a few bytes over the years). For this reason, the Pipes compiler and the stages are packaged in NetRexxF.jar – F is for Full, and this jar also includes the Eclipse Java compiler, which makes NetRexx a standalone package that only needs a JRE for development. There is a NetRexxC.jar for those who have a working Java SDK and only want to compile NetRexx. So we have NetRexxR.jar at 37K, NetRexxC.jar at 322K, and the full NetRexx kaboodle at 2.8MB – still small compared to some other JVM languages.

The pipeline terminology is a metaphor derived from plumbing. Fitting two or more pipe segments together yields a pipeline. Water flows in one direction through the pipeline. There is a source, which could be a well or a water tower; water is pumped through the pipe into the first segment, then through the other segments until it reaches a tap, and most of it will end up in the sink. A pipeline can be increased in length with more segments of pipe, and this illustrates the modular concept of the pipeline. When we discuss pipelines in relation to computing we have the same basic structure, but instead of water, data is passed through a series of programs (stages) that act as filters. Data must come from some place and go to some place. Analogous to the well or the water tower, there are device drivers that act as a source of the data, while the tap or the sink represents the place the data is going to – for example some output device such as your terminal window, a file on disk, or a network destination. Just like water, data in a pipeline flows in one direction, by convention from left to right.

A program that runs in a pipeline is called a stage. A program can run in more than one place in a pipeline – these occurrences function independently of each other. The pipeline specification is processed by the pipeline compiler, and it must be contained in a character string; on the command line, it needs to be between quotes, while when contained in a file, it needs to be between the delimiters of a NetRexx string. An exclamation mark (!) is used as the stage separator, while the solid vertical bar | can be used as an option when specifying the local option for the pipe, after the pipe name. When looking at two adjacent segments in a pipeline, we call the left stage the producer and the stage on the right the consumer, with the stage separator as the connector.

A device driver reads from a device (for instance a file, the command prompt, a machine console or a network connection) or writes to a device; in some cases it can both read and write. Examples of device drivers are diskr for disk read and diskw for disk write; these read and write data from and to files. A pipeline can take data from one input device and write it to a different device. Within the pipeline, data can be modified in almost any way imaginable by the programmer. The simplest process for the pipeline is to read data from the input side and copy it unmodified to the output side. The pipeline compiler connects these programs; it uses one program for each device and connects them together. All pipeline segments run on their own thread and are scheduled by the pipeline scheduler. The inherent characteristic of the pipeline is that any program can be connected to any other program, because each obtains data and sends data through a device-independent standard interface. The pipeline usually processes one record (or line) at a time. The pipeline reads a record from the input, processes it and sends it to the output. It continues until the input source is drained.
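Using the diskr and diskw drivers described above, the simplest copy pipeline could be sketched like this (the pipe name and file names here are made up for illustration, following the syntax shown later in this post):

 pipe "(copy) diskr input.txt ! diskw output.txt"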

Until now everything was just theory, but now we are going to show how to compile and run a pipeline. The executable script pipe is included in the NetRexx distribution to specify a pipeline and to compile NetRexx source that contains pipelines. Pipelines can be specified on the command line or in a file, but will always be compiled to a .class file for execution in the JVM.

 pipe "(hello) literal hello world ! console"

This specifies a pipeline consisting of a source stage literal that puts a string (“hello world”) into the pipeline, and a console sink that puts the string on the screen. The pipe compiler will echo the source of the pipe to the screen – or issue messages when something was mistyped. The name of the class file is the name of the pipe, here specified between parentheses. Options also go there. We execute the pipe by typing:

java hello

Now we have shown the obligatory example, we can make it more interesting by adding a reverse stage in between:

pipe "(hello) literal hello world ! reverse ! console"

When this is executed, it dutifully types “dlrow olleh”. If we replace the string after literal with arg(), we can start the hello pipeline with an argument to reverse, and we run it with:

java hello a man a plan a canal panama

and it will respond:

amanap lanac a nalp a nam a

which goes to show that no palindrome is very convincing unless the spaces are ignored – which we can remedy with a change to the pipeline: use the change stage to take out the spaces:

pipe "(hello) literal arg() ! change / // ! console"

Now for the interesting parts. Whole pipeline topologies can be added, webservers can be built, relational databases (all with a jdbc driver) can be queried. For people that are familiar with the z/VM CMS Pipelines product, most of its reference manual is relevant for this implementation. We are working on the new documentation to go with NetRexx 3.04.

Pipes for NetRexx are the work of Ed Tomlinson and Jeff Hennick, with contributions by Chuck Moore, myself, and others. Pipes were the first occasion on which I laid eyes on NetRexx, and I am happy they have now found their place in the NetRexx open source distribution. To have a look at it, download the NetRexx source from the Kenai site ( ) and build a 3.04 version yourself. Alternatively, wait until the 3.04 package hits
This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Own your heap: Iterate class instances with JVMTI

Today I want to talk about a different Java that most of us don’t see and use every day – to be more exact, about lower level bindings, some native code and how to perform some small magic. Although we won’t get to the true source of magic on the JVM, performing some small miracles is within the reach of a single post.

I spend my days researching, writing and coding on the RebelLabs team at ZeroTurnaround, a company that creates tools for Java developers that mostly run as javaagents. It’s often the case that if you want to enhance the JVM without rewriting it, or get any decent power over it, you have to dive into the beautiful world of Java agents. These come in two flavors: Java javaagents and native ones. In this post we’ll concentrate on the latter.

Note, this GeeCON Prague presentation by Anton Arhipov, who is an XRebel product lead, is a good starting point to learn about javaagents written entirely in Java: Having fun with Javassist.

In this post we’ll create a small native JVM agent, explore the possibility of exposing native methods into the Java application and find out how to make use of the Java Virtual Machine Tool Interface.

If you’re looking for a practical takeaway from the post, we’ll be able to, spoiler alert, count how many instances of a given class are present on the heap.

Imagine that you are Santa’s trustworthy hacker elf and the big red has the following challenge for you:
Santa: My dear Hacker Elf, could you write a program that will point out how many Thread objects are currently hidden in the JVM’s heap?
Another elf that doesn’t like to challenge himself would answer: It’s easy and straightforward, right?

return Thread.getAllStackTraces().size();

But what if we want to over-engineer our solution to be able to answer this question about any given class? Say we want to implement the following interface:

public interface HeapInsight {
  int countInstances(Class klass);
}

Yeah, that’s impossible, right? What if you receive String.class as an argument? Have no fear, we’ll just have to go a bit deeper into the internals of the JVM. One thing that is available to JVM library authors is JVMTI, the Java Virtual Machine Tool Interface. It was added ages ago and many tools that seem magical make use of it. JVMTI offers two things:

  • a native API
  • an instrumentation API to monitor and transform the bytecode of classes loaded into the JVM.

For the purpose of our example, we’ll need access to the native API. What we want to use is the IterateThroughHeap function, which lets us provide a custom callback to execute for every object of a given class.

First of all, let’s make a native agent that will load and echo something to make sure that our infrastructure works.

A native agent is something written in C/C++ and compiled into a dynamic library, to be loaded before we even start thinking about Java. If you’re not proficient in C++, don’t worry, plenty of elves aren’t, and it won’t be hard. My approach to C++ includes 2 main tactics: programming by coincidence and avoiding segfaults. Since I managed to write and comment the example code for this post, collectively we can go through it. Note: the paragraph above should serve as a disclaimer – don’t put this code into any environment of value to you.

Here’s how you create your first native agent:


#include <iostream>
#include <jvmti.h>

using namespace std;

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved) {
  cout << "A message from my SuperAgent!" << endl;
  return JNI_OK;
}

The important part of this declaration is that we declare a function called Agent_OnLoad, which follows the documentation for the dynamically linked agents.

Save the file as, for example, native-agent.cpp and let’s see what we can do about turning it into a library.

I’m on OSX, so I use clang to compile it; to save you a bit of googling, here’s the full command:

clang -shared -undefined dynamic_lookup -o agent.so -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/ -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/darwin native-agent.cpp

This creates an agent.so file that is a library ready to serve us. To test it, let’s create a dummy hello world Java class.

package org.shelajev;

public class Main {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}
When you run it with a correct -agentpath option pointing to the library, you should see the following output:

java -agentpath:agent.so org.shelajev.Main
A message from my SuperAgent!
Hello World!

Great job! We now have everything in place to make it actually useful. First of all, we need an instance of jvmtiEnv, which is available through the JavaVM *jvm pointer while we are in Agent_OnLoad, but is not available later. So we have to store it somewhere globally accessible. We do that by declaring a global struct to store it.


#include <iostream>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <jvmti.h>

using namespace std;

typedef struct {
  jvmtiEnv *jvmti;
} GlobalAgentData;

static GlobalAgentData *gdata;

JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved) {
  jvmtiEnv *jvmti = NULL;
  jvmtiCapabilities capa;
  jvmtiError error;

  // put a jvmtiEnv instance at jvmti.
  jint result = jvm->GetEnv((void **) &jvmti, JVMTI_VERSION_1_1);
  if (result != JNI_OK) {
    printf("ERROR: Unable to access JVMTI!\n");
  }

  // add a capability to tag objects
  (void)memset(&capa, 0, sizeof(jvmtiCapabilities));
  capa.can_tag_objects = 1;
  error = (jvmti)->AddCapabilities(&capa);

  // store jvmti in a global data structure
  gdata = (GlobalAgentData*) malloc(sizeof(GlobalAgentData));
  gdata->jvmti = jvmti;
  return JNI_OK;
}

We also updated the code to add a capability to tag objects, which we’ll need for iterating through the heap. The preparations are done: we have the JVMTI instance initialized and available to us. Let’s offer it to our Java code via JNI.

JNI stands for Java Native Interface, a standard way to include native code calls in a Java application. The Java part will be pretty straightforward; add the following countInstances method definition to the Main class:

package org.shelajev;

public class Main {
  public static void main(String[] args) {
    System.out.println("Hello World!");
    int a = countInstances(Thread.class);
    System.out.println("There are " + a + " instances of " + Thread.class);
  }

  private static native int countInstances(Class klass);
}

To accommodate the native method, we must change our native agent code. I’ll explain it in a minute, but for now add the following function definitions there:

extern "C"
JNICALL jint objectCountingCallback(jlong class_tag, jlong size, jlong* tag_ptr, jint length, void* user_data) {
  int* count = (int*) user_data;
  *count += 1;
  return JVMTI_VISIT_OBJECTS;
}

extern "C"
JNIEXPORT jint JNICALL Java_org_shelajev_Main_countInstances(JNIEnv *env, jclass thisClass, jclass klass) {
  int count = 0;
  jvmtiHeapCallbacks callbacks;
  (void)memset(&callbacks, 0, sizeof(callbacks));
  callbacks.heap_iteration_callback = &objectCountingCallback;
  jvmtiError error = gdata->jvmti->IterateThroughHeap(0, klass, &callbacks, &count);
  return count;
}

Java_org_shelajev_Main_countInstances is more interesting here: its name follows the convention, starting with Java_, then the underscore-separated fully qualified class name, then the method name from the Java code. Also, don’t forget the JNIEXPORT declaration, which says that the function is exported into the Java world.

Inside the Java_org_shelajev_Main_countInstances we specify the objectCountingCallback function as a callback and call IterateThroughHeap with the parameters that came from the Java application.

Note that our native method is static, so the arguments in the C counterpart are:

JNIEnv *env, jclass thisClass, jclass klass

for an instance method they would be a bit different:

JNIEnv *env, jobject thisInstance, jclass klass

where thisInstance points to the this object of the Java method call.

Now, the definition of the objectCountingCallback comes directly from the documentation, and the body does nothing more than increment an int.

Boom! All done! Thank you for your patience. If you’re still reading this, you’re ready to test all the code above.

Compile the native agent again and run the Main class. This is what I see:

java -agentpath:agent.so org.shelajev.Main
Hello World!
There are 7 instances of class java.lang.Thread

If I add a Thread t = new Thread(); line to the main method, I see 8 instances on the heap. Sounds like it actually works. Your thread count will almost certainly be different – don’t worry, it’s normal, because this count includes the JVM bookkeeping threads that do compilation, GC, etc.

Now, if I want to count the number of String instances on the heap, it’s just a matter of changing the argument class. A truly generic solution, Santa would be happy I hope.

Oh, if you’re interested, it finds 2423 instances of String for me. A pretty high number for such a small application. Also,

return Thread.getAllStackTraces().size();

gives me 5, not 8, because it excludes the bookkeeping threads! Talk about trivial solutions, eh?

Now that you’re armed with this knowledge and this tutorial, I’m not saying you’re ready to write your own JVM monitoring or enhancing tools, but it is definitely a start.

In this post we went from zero to writing a native Java agent that compiles, loads and runs successfully. It uses JVMTI to obtain insight into the JVM that is not accessible otherwise. The corresponding Java code calls the native library and interprets the result.
This is often the approach the most miraculous JVM tools take and I hope that some of the magic has been demystified for you.

What do you think, does it clarify agents for you? Let me know! Find me and chat with me on twitter: @shelajev.


(Part 3 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

This is a continuation of the previous post titled (Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

In our first review, The Atlassian guide to GC tuning is an extensive post covering the methodology and things to keep in mind when tuning GC; practical examples are given and references to important resources are also made in the process. In the next one, How NOT to measure latency, Gil Tene discusses some common pitfalls encountered in measuring and characterizing latency, demonstrating and discussing some false assumptions and measurement techniques that lead to dramatically incorrect reporting of results, and covers simple ways to sanity check and correct these situations. Finally, Kirk Pepperdine in his post Poorly chosen Java HotSpot Garbage Collection Flags and how to fix them! throws light on some JVM flags – he starts with some 700 flags and boils it down to merely 7 flags. He also cautions you not to draw conclusions or take action on a whim, but to consult and examine – i.e. measure, don’t guess!


By default, JVM tunings attempt to provide acceptable performance in the majority of cases, but they may not always give the desired results depending on application behaviour. Benchmark your application before applying any GC or JVM related settings. There are a number of environment, OS and hardware related factors to be taken into consideration, and many a time any tuning of the JVM may be linked to these factors; if they change, the tuning may need to be revisited. Beware and accept the fact that any tuning has its upper limit in a given environment; if your goals are still not met, then you need to improve your environment. Always monitor your application to see if your tuning goals are still in scope.

Choosing Performance Goals – GC tuning for the Oracle JVM is driven by three goals, i.e. latency (mean & maximum), throughput, and footprint. To reach the tuning goals, there are principles that guide you in tuning the GC, i.e. Minor GC reclamation, GC maximise memory, and two of three (pick 2 of the 3 goals, and sacrifice the other).

Preparing your environment for GC – 

Some of the items of interest are: loading the application with work (apply load to the application as it would be in production), turning on GC logging, setting data sampling windows, determining the memory footprint, rules of thumb for generation sizes, and determining systemic requirements.

Iteratively change the JVM parameters, keeping the environment and its settings intact. An example of turning on GC logging is:

java -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file> …
java -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file> …

Sampling windows are important to help specifically diagnose an application’s runtime behaviour rather than the load-time, or the time it or the JVM takes to stabilise before it can execute anything. Set the heap sizes either by guessing (give it the maximum and then pull back) or by using results accumulated in the GC logs. Also check for OOMEs, which can be an indicator that the heap size or standard memory needs increasing. There are a number of rules of thumb for generation sizes (a number of them depend on the sizes of other pools).

Understanding the Throughput collector and Behaviour based Tuning
Hotspot uses behaviour based tuning with parallel collectors, including the MarkAndSweep collector; they are designed to keep three goals in mind, i.e. maximum pause time, throughput and footprint. The -XX:+PrintAdaptiveSizePolicy JVM flag prints details of this behaviour for analysis. Even though there are options to set the maximum pause time for any GC event, there is no guarantee it will be met. It also uses the best 2 of the 3 goals – i.e. only 2 goals are in focus at a time, there is interaction between goals, and sometimes there is oscillation between them.

Time to tune
There is a cycle to follow when doing this, i.e. 
  1. Determine desired behaviour.
  2. Measure behaviour before change.
  3. Determine indicated change.
  4. Make change.
  5. Measure behaviour after change.
  6. If behaviour not met, try again.
Repeat the above till the results are met (or nearly there).

Use Java VisualVM with the Visual GC plugin for live telemetry (note there’s an observer effect when using VisualVM, which can have an impact on your readings), and GCViewer by tagtraum industries for post-hoc review of GC performance based on the GC log.

Conclusions: a lot of factors need to be taken into consideration when tuning GC. A meticulous plan needs to be followed, keeping an eye on your goal throughout the process; once it is achieved, you continue to monitor your application to see if your performance goals are still intact. Adjust your requirements after studying the results of your actions such that they both meet at some point.

— The authors cover the methodology in good detail and give practical advice and steps to follow in order to tune your application – the hands-on and practical aspects of the blog should definitely be read. —

The author starts by saying “some of the mistakes others, including the author, have made” – plenty of wrong ways to do things and what to avoid doing.

WHY and HOW you measure latency – HOW only makes sense if you understand the WHY. There is public domain code and tooling available that can be used to measure latency.

Don’t use statistics to measure latency!

The classic way of looking at latency: response time (latency) is a function of load! It’s important to know what this response time is – is it the average, worst case, 99th percentile, etc. – based on different types of behaviours?

Hiccups – some sort of accumulated work that gets performed in bursts (spikes). It is important to look at these behaviours when looking at response time and average response time. They are not evenly spread around, but may look like periodic freezes, or shifts from one mode/behaviour to another (with no sequence).

Common fallacies:
– computers run application code continuously
– response time can be measured as work units over time
– response time exhibits a normal distribution
– “glitches” or “semi-random omissions” in measurement don’t have a big effect

Hiccups are hidden, knowingly or unknowingly – i.e. averages and standard deviations can help hide them!
Many compute the 99th percentile or another figure from averages and distributions. A better way to deal with it is to measure it using the data – e.g. plot a histogram. You need to have an SLA for the entire spread and plot the distribution across the 0 to 100th percentile.
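To make the point concrete, here is a minimal sketch (my own illustration, not code from the talk; the sample values are made up) of reading a percentile directly off the recorded data instead of deriving it from an average and a standard deviation – note how the single outlier dominates the tail while barely moving the median:

```java
import java.util.Arrays;

public class PercentileExample {
    // Return the value at percentile p (0 < p <= 100) from raw samples.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // index of the sample at or above the requested rank
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // made-up latency samples in ms; one hiccup of 400 ms in the tail
        long[] latencies = {1, 2, 2, 3, 3, 3, 5, 8, 13, 400};
        System.out.println("p50  = " + percentile(latencies, 50));   // 3
        System.out.println("p99  = " + percentile(latencies, 99));   // 400
        System.out.println("p100 = " + percentile(latencies, 100));  // 400
    }
}
```

An average of these samples (44 ms) describes none of them well, which is exactly the talk’s argument for plotting the whole distribution.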

Latency tells us how long something should take! True real-time is being slow and steady!
Author gives examples of types of applications with different latency needs i.e. The Olympics, Pacemaker, Low latency trading, Interactive applications.

A very important way of establishing requirements is an extensive interview process, to help the client come up with realistic figures for the SLA requirements. We need to find out what is the worst the client is willing to accept, and for how long or how often that is acceptable, to help construct a percentile table.

The question is how fast the system can run, smoothly, without hitting obstacles or crashes! The author further makes a comparison of performance between the HotSpot JVM and C4.

Coordinated omission problem – what is it? Data that is omitted not in a random way but in a coordinated way skews the actual results. Delays in request times do not get measured, due to the manner in which response time is recorded from within the system. The maximum value (time) in the data is usually the key value to start with, to compute and compare the other percentiles and check whether they match those computed with coordinated omission in them. Percentiles are useful – compute them.

Graphs with sudden bumps and otherwise smooth lines tend to have coordinated omission in them.
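As a back-of-the-envelope sketch (my own illustration, not from the talk), this is why the omission matters: a load generator that waits for each response before issuing the next request silently skips every sample that should have been taken during a stall, and those are exactly the bad ones:

```java
public class OmissionExample {
    // If the generator intends to issue a request every intervalMs but one
    // response stalls for responseMs, the requests that should have gone out
    // during the stall are never issued, so their latencies go unrecorded.
    static long omittedSamples(long intervalMs, long responseMs) {
        return Math.max(0, responseMs / intervalMs - 1);
    }

    public static void main(String[] args) {
        // a 2-second hiccup against a 10 ms request interval hides ~199 samples
        System.out.println(omittedSamples(10, 2000));
    }
}
```

Back-filling those missing samples (as HdrHistogram’s compensation does) is what restores the true shape of the percentile curve.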

HdrHistogram helps plot histograms – a configurable precision system; you tell it the range and precision to cover. It has built-in compensation for coordinated omission – it’s open source and available on GitHub under the CC0 license.

jHiccup also plots histograms to show how your computer system has been behaving (run time and freeze time), and records various percentiles, maximum and minimum values, distribution, good modes, bad modes, etc. It is also open source, under the CC0 license.

Measuring throughput without a latency requirement yields meaningless information. Mistakes in measurement/analysis can lead to orders-of-magnitude errors and bad business decisions.

— The author has shown some cool graphs and illustrations of how we measure things wrong and what to look out for when assessing data or even graphs. Gives a lot of tips, on how NOT to measure latency. —

Poorly chosen Java HotSpot Garbage Collection Flags and how to fix them! by Kirk Pepperdine

There are more than 700 product flags defined in the HotSpot JVM, and not many have a good idea of what effect any one of them might have at runtime, let alone a combination of two or more.

Whenever you think you know a flag, it often pays to check out the source code underlying that flag to get a more accurate idea.

Identifying redundant flags
The author enlists a huge list of flags for us to determine if we need them or not. To see the list of flags available with your version of Java, try this command, which prints a long list of options:

$ java -XX:+PrintFlagsFinal -version

You can narrow down the list by eliminating deprecated flags and ones with a default value, leaving us with a much shorter list.

DisableExplicitGC, ExplicitGCInvokesConcurrent, and RMI Properties
Calling System.gc() isn’t a great idea, and one way to disable that functionality in code that uses it is to pass the DisableExplicitGC flag as one of your VM options. Alternatively, use the ExplicitGCInvokesConcurrent flag to invoke a Concurrent Mark and Sweep cycle instead of the Full GC cycle (invoked by System.gc()). You can also set the interval period between complete GC cycles using the sun.rmi.dgc.client.gcInterval and sun.rmi.dgc.server.gcInterval properties.
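For instance, a command line combining these might look like the following (the one-hour interval of 3600000 ms is an illustrative value of mine, not a recommendation from the article):

java -XX:+ExplicitGCInvokesConcurrent -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 …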

Xmx, Xms, NewRatio, SurvivorRatio and TargetSurvivorRatio
The -Xmx flag is used to set the maximum heap size, and all JVMs are meant to support this flag. Setting the minimum and maximum heap sizes so that the JVM can’t resize the heap at runtime disables the adaptive sizing properties.

Use GC logs to help make memory pool size decisions. There is a ratio to maintain between Young Gen and Old Gen, and within Young Gen between Eden and the Survivor spaces. The SurvivorRatio flag is one such flag; it helps decide how much space will be used by the survivor spaces in Young Gen, with the rest left for Eden. These ratios are key in determining the frequency and efficiency of the collections in Young Gen. TargetSurvivorRatio is another ratio to examine and cross-check before using.
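As a made-up illustration (these values are mine and are not recommendations from the article), fixing the heap size and setting the survivor ratio could look like:

java -Xms2g -Xmx2g -XX:SurvivorRatio=8 …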

PermSize, MaxPermSize

These options were removed starting with JDK 8 in favour of Metaspace, but in previous versions they helped in setting PermGen’s resizing abilities. Restricting its adaptive nature might lead to longer pauses of old generational spaces.

UseConcMarkSweepGC, CMSParallelRemarkEnabled, UseCMSInitiatingOccupancyOnly CMSInitiatingOccupancyFraction

UseConcMarkSweepGC makes the JVM select the CMS collector; CMS works with ParNew instead of the PSYoung and PSOld collectors for the respective generational spaces.

CMSInitiatingOccupancyFraction (IOF) is a flag that helps the JVM determine how to maintain the occupancy of data (by examining the current rate of promotion) in the Old Gen space, sometimes leading to STW Full GCs or CMF (concurrent mode failure).

UseCMSInitiatingOccupancyOnly tells the JVM to use the IOF without taking the rate of promotion into account.

CMSParallelRemarkEnabled helps parallelise the fourth phase of the CMS (single threaded by default), but use this option only after benchmarking your current production systems.


CMSClassUnloadingEnabled ensures that classes loaded in the PermGen space are regularly unloaded, reducing out-of-PermGen-space situations – although this could lead to longer concurrent and remark (pause) collection times.


AggressiveOpts, as the name suggests, helps improve performance by switching certain flags on and off, but this strategy hasn’t changed across various builds of Java, and one must examine benchmarks of their system before using this flag in production.

Next Steps
Of the number of flags examined, a few stand out as the more interesting ones to look at.
Again, examine the GC logs to see whether any of the above should be used, and if used, what values are appropriate to set. Quite often the JVM gets it right, right out of the box, so getting it wrong can be detrimental to your application’s performance. It’s easier to mess up than to get it right when you have so many flags with dependent and overlapping functionalities.

Conclusion: refer to benchmarks and GC logs when making decisions on enabling and disabling flags and on the values to set when they are enabled. There are lots of flags, and it is easy to get them wrong; only use them if they are absolutely needed – remember the Hotspot JVM has built-in machinery to adapt (adaptive policy) – and consult an expert if you still want to.

— The author has done justice by giving us a gist of the large number of JVM flags, and I recommend reading the article to learn about the specifics of certain flags. —

As it is not practical to review all such videos and articles, a number of them have been provided in the links below for further study. In many cases I have paraphrased or directly quoted what the authors have to say to preserve the message and meaning they wished to convey. 

A few other authors have written articles related to these subjects, for the Java Advent Calendar, see below,

Using Intel Performance Counters To Tune Garbage Collection
How NOT to measure latency

Feel free to post your comments below or tweet at @theNeomatrix369!

    Useful resources


      (Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

      This is a continuation of the previous post titled (Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

      Without any further ado, let’s get started with our next set of blogs and videos, chop…chop…! This time it’s Martin Thompson’s blog posts and talks. Martin’s first post, Java Garbage Collection Distilled, basically distils the GC process and the underlying components, including throwing light on a number of interesting GC flags (-XX:…). In his next talk he does his myth-busting shaabang about mechanical sympathy: what people correctly believe in, and also some of the misconceptions they have bought into. In the talk on performance testing, Martin takes it further and fuses Java, the OS and the hardware to show how understanding aspects of all of these can help write better programs.

      Java Garbage Collection Distilled by Martin Thompson

      There are too many flags to allow tuning the GC to achieve the throughput and latency your application requires. There’s plenty of documentation on the specifics of the bells and whistles around them but none to guide you through them.

      – The Tradeoffs – throughput (-XX:GCTimeRatio=99), latency (-XX:MaxGCPauseMillis=<n>) and memory (-Xmx<n>) are the key variables that the collectors depend upon. It is important to note that Hotspot often cannot achieve the above targets. If a low-latency application goes unresponsive for more than a few seconds it can spell disaster. The tradeoffs play out as follows:
      * provide more memory to the GC algorithms
      * GC pauses can be reduced by containing the live set and keeping the heap size small
      * the frequency of pauses can be reduced by managing heap and generation sizes & controlling the application’s object allocation rate
      * the frequency of large pauses can be reduced by running GC concurrently

      – Object Lifetimes
      GC algorithms are often optimised with the expectation that most objects live for a very short period of time, while relatively few live for very long. Experimentation has shown that generational garbage collectors support much better throughput than non-generational ones – hence used in server JVMs.

      – Stop-The-World Events
For GC to occur it is necessary that all the threads of a running application pause – garbage collectors do this by signalling the threads to stop when they come to a safe-point. Time to safe-point is an important consideration in low-latency applications and can be found using the ‑XX:+PrintGCApplicationStoppedTime flag in addition to the other GC flags. When a STW event occurs the system undergoes significant scheduling pressure as the threads resume on release from their safe-points, hence fewer STW events make an application more efficient.

      – Heap Organisation in Hotspot
The Java heap is divided into various regions: an object is created in Eden, moved into the survivor spaces, and eventually into tenured space. PermGen was used to store runtime objects such as classes and static strings. Collectors make use of virtual spaces to meet throughput & latency targets, adjusting the region sizes to reach those targets.

      – Object Allocation
TLABs (Thread Local Allocation Buffers) are used to allocate objects in Java, which is cheaper than using malloc (taking around 10 instructions on most platforms). The rate of minor collections is directly proportional to the rate of object allocation. Large objects (-XX:PretenureSizeThreshold=<n>) may have to be allocated in the Old gen, but if the threshold is set below the TLAB size then they will not be created in the Old gen – (note) this does not apply to the G1 collector.

      – Minor Collections
A minor collection occurs when Eden becomes full; objects are promoted from Eden to the tenured space once they get old, i.e. cross the threshold (-XX:MaxTenuringThreshold). In a minor collection, live objects reachable from the known GC roots are copied to the survivor space. Hotspot maintains cross-generational references using a card table, hence the size of the old generation is also a factor in the cost of minor collections. Collection efficiency can be achieved by matching the size of Eden to the number of objects to be promoted. Minor collections are stop-the-world events, which makes them increasingly problematic as heaps grow.

      – Major Collections
Major collections collect the old generation so that objects from the young gen can be promoted. Collectors track a fill threshold for the old generation and begin collection when that threshold is passed. To avoid promotion failure you will need to tune the padding that the old generation allows to accommodate promotions (-XX:PromotedPadding=<n>). Heap resizing can be avoided using the -Xms and -Xmx flags. Compaction of the old gen causes one of the largest STW pauses an application can experience, and the pause is directly proportional to the number of live objects in the old gen. The tenured space can be filled more slowly by adjusting the survivor space sizes and the tenuring threshold, but this in turn can cause longer minor collection pause times as a result of increased copying costs between the survivor spaces.

      – Serial Collector
      It is the simplest collector with the smallest footprint (-XX:+UseSerialGC) and uses a single thread for both minor and major collections.

      – Parallel Collector
Comes in two forms, (-XX:+UseParallelGC) and (-XX:+UseParallelOldGC), and uses multiple threads for minor collections and a single thread for major collections – since Java 7u4 it uses multiple threads for both types of collection. Parallel Old performs very well on a multi-processor system and is suitable for batch applications. This collector can be helped by providing more memory, giving larger but fewer collection pauses. Weigh up the Parallel Old and Concurrent collectors depending on how much pause time your application can withstand (expect pauses of 1 to 5 seconds per GB of live data on modern hardware while the old gen is compacted).

      – Concurrent Mark Sweep (CMS) Collector
The CMS (-XX:+UseConcMarkSweepGC) collector runs in the Old generation, collecting tenured objects that are no longer reachable during a major collection. CMS is not a compacting collector, which causes fragmentation in the Old gen over time. A promotion failure, when a large object cannot fit in the Old gen, will trigger a FullGC. CMS runs alongside your application, taking CPU time. CMS can suffer “concurrent mode failures” when it fails to collect at a sufficient rate to keep up with promotion.

      – Garbage First (G1) Collector
G1 (-XX:+UseG1GC) is a newer collector, introduced in Java 6 and officially supported from Java 7. It is a generational collector with a partially concurrent collecting algorithm that compacts the Old gen with smaller incremental STW pauses. It divides the heap into fixed-size regions of variable purpose. G1 is target driven on latency (-XX:MaxGCPauseMillis=<n>, default value = 200ms). Collection of the humongous regions can be very costly. It uses “Remembered Sets” to keep track of references to objects from other regions, and there is a lot of cost involved in the bookkeeping and maintenance of these “Remembered Sets”. Similar to CMS, G1 can suffer from an evacuation failure (to-space overflow).

      – Alternative Concurrent Collectors
Oracle JRockit Real Time, IBM Websphere Real Time, and Azul Zing are alternative concurrent collectors. Zing, according to the author, is the only Java collector that strikes a balance between collection and compaction while maintaining a high throughput rate for all generations. Zing is concurrent for all phases, including minor collections, irrespective of heap size. For all the concurrent collectors targeting latency you have to give up throughput and gain footprint. Budget for a heap size of at least 2 to 3 times the live set for efficient operation.

      – Garbage Collection Monitoring & Tuning
      Important flags to always have enabled to collect optimum GC details:
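The standard HotSpot GC-logging flags of this era (Java 7) are as follows (a reconstruction; these are real flags, though not necessarily the author’s exact list):

```
-verbose:gc
-Xloggc:<filename>
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
```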


Use applications like GCViewer and JVisualVM (with the Visual GC plugin) to study the behaviour of your application under GC. Run representative load tests that can be executed repeatedly and, as you gain knowledge of the various collectors, keep experimenting with different configurations until you reach your throughput and latency targets. jHiccup helps track pauses within the JVM. As we know, it is a difficult challenge to strike a balance between latency requirements and combined high object allocation and promotion rates, and sometimes choosing a commercial solution to achieve this might be the more sensible idea.

Conclusion: GC is a big subject in itself with a number of components, some of which are constantly being replaced, and it’s important to know what each one stands for. The GC flags are as important as the components they relate to, and it’s important to know them and how to use them. Enabling some standard GC flags to record GC logs does not have any significant impact on the performance of the JVM. Using third-party freeware or commercial tools helps, as long as you follow the author’s methodology.

— Highly recommend reading the article multiple times, as Martin covers a lot of detail about GC and the collectors which requires close inspection and good understanding. —

      Martin Thompson’s:  Mythbusting modern hardware to gain “Mechanical Sympathy” Video * Slides

He classifies each myth into one of three categories – possible, plausible or busted! In order to get the best out of the hardware you own, you need TO KNOW your hardware. Make tradeoffs as you go along, changing knobs ;), it’s not as scary as you may think.

Good questions: do normal developers understand the hardware they program on? Do they understand what’s going on underneath? And do we have the discipline to make the effort to understand the platform we work with?

Myth 1 – CPUs are not getting faster – clock speed isn’t everything; the Sandy Bridge architecture is the faster breed, with 6 ports to support parallelism (6 ops/cycle), and Haswell has 8 ports! Code doing division operations performs slower than any other arithmetic operation. CPUs have both front-end and back-end cycles. They are getting faster because we are feeding them faster! – PLAUSIBLE

Myth 2 – Memory provides us random access – CPU registers and buffers, internal caches (L1, L2, L3) and main memory – listed in decreasing order of access speed. Manufacturers have been doing things to CPUs to bring down their operational temperature by performing direct access operations. Writes are less hassle than reads – buffer misses are costly. L1 is organised into cache-lines containing the data the processor will work on – efficiency is achieved by avoiding cache-line misses. Pre-fetchers help reduce latency when reading streaming and predictable data. TLB misses can also cost efficiency (one TLB entry covers a 4K memory page). In short, memory is nowhere near as fast to read randomly as it is to read SEQUENTIALLY, due to the way the underlying hardware works. Writing highly branched code can slow down the execution of a program – keeping related things together, i.e. cohesion, is the key to efficiency. – BUSTED
      Note: TLAB and TLB are two different concepts, google to find out the difference!
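A small illustration of the sequential-beats-strided point (hypothetical code, not from the talk): both methods below compute the same sum over a 2D array, but the row-major walk touches memory sequentially while the column-major walk strides across rows; timed on a large enough matrix, the first is typically several times faster.

```java
public class Traversal {
    // Sequential: consecutive elements of each row share cache-lines,
    // and the hardware pre-fetcher can predict the access pattern.
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++)
                sum += m[i][j];
        return sum;
    }

    // Strided: each access lands in a different row's backing array,
    // defeating cache-lines and pre-fetching.
    static long sumColMajor(int[][] m) {
        long sum = 0;
        for (int j = 0; j < m[0].length; j++)
            for (int i = 0; i < m.length; i++)
                sum += m[i][j];
        return sum;
    }

    public static void main(String[] args) {
        int n = 1024;
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                m[i][j] = i + j;
        long a = sumRowMajor(m);
        long b = sumColMajor(m);
        if (a != b) throw new AssertionError("sums differ");
        System.out.println(a); // same answer either way; only the speed differs
    }
}
```

Wrap each method in a timing loop (after warm-up) to see the gap for yourself.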

Myth 3 – HDDs provide random access – a spinning disc with an arm that moves about reading data. More sectors are placed on the outer tracks than the inner tracks (zone bit recording). Spinning the discs faster isn’t the way to increase HDD performance. 4K is the minimum you can read or write at a time. Seek time on the best discs is 3-6 ms; laptop drives are slower (15 ms). Data transfers run at 100-220 MB/sec. Adding a cache can improve writing data to the disc, but not so much reading data from it. – BUSTED

Myth 4 – SSDs provide random access – reads and writes are great and very fast (4K at a time). A delete is not a real delete: data is marked deleted rather than actually deleted, because you can’t erase at high resolution; a whole block needs to be erased at a time (hence “marked deleted”). All this can cause fragmentation, so GC and compaction are required. Reads are smooth; writes are hindered by fragmentation, GC, compaction, etc… – also beware of write amplification. A few disadvantages when using SSDs, but overall quite performant. – PLAUSIBLE

      Can we understand all of this and write better code?

      Conclusion: do not take everything in the space for granted just because it’s on the tin, examine, inspect and investigate for yourself the internals where possible before considering it to be possible, or plausible – in order to write good code and take advantage of these features.

      — Great talk and good coverage of the mechanical sympathy topic with some good humour, watch the video for performance statistics gathered on each of the above hardware components  —

      “Performance Testing Java Applications” by Martin Thompson

      “How to use a profiler?” or “How to use a debugger?” 
      What is Performance? Could mean two things like throughput or bandwidth (how much can you get through) and latency (how quickly the system responds).

      Response time changes as we apply more load on the system. Before we design any system, we need to gather performance requirements i.e. what is the throughput of the system or how fast you want the system to respond (latency)? Does your system scale economically with your business?

Developer time is expensive; hardware is cheap! For anything, you need a transaction budget (with a good breakdown of the different components and processes the system goes through).

How can we do performance testing? Apply load to the system and see whether the throughput goes up or down, and what the response time of the system is as we apply that load. Stress testing (see Stress testing) is different from load testing (see Load testing): stress testing finds the point where things break (the collapse of the system), whereas an ideal system would continue with a flat line. It is also important to perform load testing not just from one point but from multiple points, and concurrently. Most importantly, long-duration testing is vital – it brings a lot of anomalies to the surface, e.g. memory leaks.
Building up a test suite made of smaller parts is a good idea. We need to know the basic building blocks of the system we use and what we can get out of them. Do we know the different threshold points of our systems and how much their components can handle? It is very important to know the algorithms we use, to know how to measure them, and to use them accordingly.
      When should we test performance? 
      “Premature optimisation is the root of all evil” – Donald Knuth / Tony Hoare

      What does optimisation mean? Knowing and choosing your data and working around it for performance. New development practices: we need to test early and often!

      From a Performance perspective, “test first” practice is very important, and then design the system gradually as changes can cost a lot in the future.

Red-Green-Debug-Profile-Refactor: a new “test first” performance methodology, as opposed to the Red-Green-Refactor methodology only! An earlier and shorter feedback cycle is better than finding something out in the future.

Use “like live” pairing stations. A Mac is a bad example to work on if you are working in the performance space – a Linux machine is a better option.

Performance tests can fail the build – and they should fail the build in your CI system! What should a micro-benchmark look like (e.g. Caliper)? Division operations in your code can be very expensive; where the divisor is a power of two, use a mask operation instead!
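A minimal sketch of that trick (hypothetical class, not from the talk): for a power-of-two divisor, a bitwise AND gives the same result as % on non-negative operands, without the cost of an integer division.

```java
public class MaskVsModulo {
    // Modulus: compiles down to an integer division, one of the
    // slowest arithmetic operations on modern CPUs.
    static int slotMod(int index, int capacity) {
        return index % capacity;
    }

    // Mask: a single AND instruction; only valid when capacity is a
    // power of two and index is non-negative.
    static int slotMask(int index, int capacity) {
        return index & (capacity - 1);
    }

    public static void main(String[] args) {
        int capacity = 8; // power of two
        for (int i = 0; i < 1000; i++) {
            if (slotMod(i, capacity) != slotMask(i, capacity)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        System.out.println("ok");
    }
}
```

This is the same trick power-of-two-sized ring buffers use to map a sequence number onto a slot.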

      What about concurrency testing? Is it just about performance? Invariants? Contention? 

What about system performance tests? We should be able to test big and small clients across different ranges. It’s fun to know the deep-dive details of the system you work on. The business problem is the core and most important thing to solve, NOT discussions about which frameworks to use to build it. Please do not use Java serialisation, as it was not designed as an on-the-wire protocol! Measure the performance of a system using an external observer rather than only measuring it from inside the system.
Performance Testing Lessons – lots of technical stuff and lots of cultural stuff. Technical lessons – learn how to measure, and check out histograms! Do not merely sample a system; we miss things when the system goes strange – outliers, etc… – histograms help! Knowing what is going on around the areas where the system takes a long time is important! Capturing time correctly from the OS is very important as well.
With time you get accuracy, precision and resolution – and most people mix all of these up. On machines with dual sockets, time might not be synchronised between them. The quality of the time information is very dependent on the OS you are using. Time might also be an issue on virtualised systems, or between two machines. The latter can be worked around: take round-trip times between the two systems (note the start and stop clock times) and halve them to get a more accurate figure. Time can go backwards on you on certain OSes (possibly due to NTP) – use monotonic time instead.
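In Java the monotonic-time advice looks like this (an illustrative sketch): System.nanoTime() is specified to be monotonic within a JVM, whereas System.currentTimeMillis() follows the wall clock and can jump backwards under NTP correction, so only the former is safe for elapsed-time measurement.

```java
public class Elapsed {
    public static void main(String[] args) throws InterruptedException {
        // Monotonic clock: differences are always meaningful within one JVM.
        long start = System.nanoTime();
        Thread.sleep(10);
        long elapsedNanos = System.nanoTime() - start;

        // An equivalent delta taken with System.currentTimeMillis() could
        // come out negative if NTP stepped the wall clock between reads;
        // nanoTime cannot.
        if (elapsedNanos < 0) {
            throw new AssertionError("monotonic clock went backwards");
        }
        System.out.println("ok");
    }
}
```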
Know your system and its underlying components – get the metrics and study them! Use a Linux tool like perf stat, which will give you lots of performance- and statistics-related information about your CPU and OS – branch predictions and cache misses!

RDTSC is not an ordered instruction: x86 executes instructions out of order, so a timestamp read is not guaranteed to occur at the exact point in the instruction stream where you placed it.

Theory of constraints! Always start with the number 1 problem on the list, the one that takes the most time – the bottleneck of your system; the remaining issues may be dependent on the number 1 issue rather than being separate issues!

Trying to create a performance team is an anti-pattern – instead have the experts bring those skills out to the rest of the team, and stretch their abilities!

Beware of using YAGNI as a reason not to do performance tests – it smells of excuse!
Commit builds taking > 3.40 mins are worrying; the same goes for acceptance test builds > 15 mins – both lower team confidence.
The test environment should be equal to the production environment – it is easy to get exactly similar hardware these days!
Conclusion: Start with a “test first” performance testing approach when writing latency-sensitive applications. Know your targets and work towards them. Know your underlying systems all the way from the hardware to the development environment. It is not just technical things that matter; cultural things matter just as much when it comes to performance testing. Share and spread the knowledge across the team rather than isolating it in one or two people, i.e. the so-called experts in the team. It is everyone’s responsibility, not just that of a few seniors in the team. Learn more about time across various hardware and operating systems, and between systems.

      As it is not practical to review all such videos and articles, a number of them have been provided in the links below for further study. In many cases I have paraphrased or directly quoted what the authors have to say to preserve the message and meaning they wished to convey. A follow-on to this blog post will appear in the same space under the title (Part 3 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al on 23rd Dec 2013.

      Feel free to post your comments below or tweet at @theNeomatrix369!

      Useful resources

      This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

      Anatomy of a Java Decompiler

      Mike Strobel & Lee Benfield

      A decompiler, simply put, attempts to reverse the transformation of source code to object code. But there are many interesting complexities—Java source code is structured; bytecode certainly isn’t. Moreover, the transformation isn’t one-to-one: two different Java programs may yield identical bytecode. We need to apply heuristics in order to get a reasonable approximation of the original source.

      (A Tiny) Bytecode Refresher

      In order to understand how a decompiler works, it’s necessary to understand the basics of byte code. If you’re already familiar with byte code, feel free to skip ahead to the next section.

      The JVM is a stack-based machine (as opposed to a register-based machine), meaning instructions operate on an evaluation stack. Operands may be popped off the stack, various operations performed, and the results pushed back onto the stack for further evaluation. Consider the following method:

public static int plus(int a, int b) {
    int c = a + b;
    return c;
}
      Note: All byte code shown in this article is output from javap, e.g., javap -c -p MyClass.

public static int plus(int, int);
    stack=2, locals=3, args_size=2
    0: iload_0   // load ‘a’ from slot 0, push onto stack
    1: iload_1   // load ‘b’ from slot 1, push onto stack
    2: iadd      // pop 2 integers, add them together, and push the result
    3: istore_2  // pop the result, store as ‘c’ in slot 2
    4: iload_2   // load ‘c’ from slot 2, push onto stack
    5: ireturn   // return the integer at the top of the stack

      (Comments added for clarity.)

A method’s local variables (including arguments to the method) are stored in what the JVM refers to as the local variable array. We’ll refer to a value (or reference) stored in location #x in the local variable array as ‘slot #x’ for brevity (see JVM Specification §3.6.1).

For instance methods, the value in slot #0 is always the this pointer. Then come the method arguments, from left to right, followed by any local variables declared within the method. In the example above, the method is static, so there is no this pointer; slot #0 holds parameter a, and slot #1 holds parameter b. The local variable c resides in slot #2.

      It’s interesting to note that each method has a max stack size and max local variable storage, both of which are determined at compile time.

      One thing that’s immediately obvious from here, which you might not initially expect, is that the compiler made no attempt to optimise the code. In fact, javac almost never emits optimized bytecode. This has multiple benefits, including the ability to set breakpoints at most locations: if we were to eliminate the redundant load/store operations, we’d lose that capability. Thus, most of the heavy lifting is performed at runtime by a just-in-time (JIT) compiler.


      So, how can you take unstructured, stack-based byte code and translate it back into structured Java code? One of the first steps is usually to get rid of the operand stack, which we do by mapping stack values to variables and inserting the appropriate load and store operations.

      A ‘stack variable’ is only assigned once, and consumed once. You might note that this will lead to a lot of redundant variables—more on this later! The decompiler may also reduce the byte code into an even simpler instruction set, but we won’t consider that here.

      We’ll use the notation s0 (etc.) to represent stack variables, and v0 to represent true local variables referenced in the original byte code (and stored in slots).

Bytecode:

0: iload_0
1: iload_1
2: iadd
3: istore_2
4: iload_2
5: ireturn

Stack Variables:

s0 = v0
s1 = v1
s2 = s0 + s1
v2 = s2
s3 = v2
return s3

After Copy Propagation:

v2 = v0 + v1
return v2

      We get from bytecode to variables by assigning identifiers to every value pushed or popped, e.g., iadd pops two operands to add and pushes the result.

      We then apply a technique called copy propagation to eliminate some of the redundant variables. Copy propagation is a form of inlining in which references to variables are simply replaced with the assigned value, provided the transformation is valid.

      What do we mean by “valid”? Well, there are some important restrictions. Consider the following:

0: s0 = v0
1: v0 = s4
2: v2 = s0 <-- s0 cannot be replaced with v0

      Here, if we were to replace s0 with v0, the behavior would change, as the value of v0 changes after s0 is assigned, but before it is consumed. To avoid these complications, we only use copy propagation to inline variables that are assigned exactly once.

One way to enforce this is to track all stores to non-stack variables: here, v0 holds its original value on entry and is written again at #1, so there is more than one assignment to v0 and we cannot perform copy propagation.

      Our original example, however, has no such complications, and we end up with a nice, concise result:

      v2 = v0 + v1
      return v2
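The propagation steps above can be sketched as a toy pass over this pseudo-IR (a hypothetical illustration, not the decompiler’s actual code): since stack variables (s*) are assigned exactly once, their definitions can simply be substituted into later instructions and then dropped.

```java
import java.util.*;

public class CopyProp {
    // Inline single-assignment stack variables (s*) into their uses,
    // keeping only writes to real locals (v*) and the return.
    static List<String> propagate(List<String> code) {
        Map<String, String> defs = new HashMap<>(); // s-var -> rewritten RHS
        List<String> out = new ArrayList<>();
        for (String ins : code) {
            String[] parts = ins.split(" = ", 2);
            String target = parts[0];
            if (target.startsWith("return")) {
                out.add("return " + subst(target.substring(7), defs));
                continue;
            }
            String newRhs = subst(parts[1], defs);
            if (target.startsWith("s")) {
                defs.put(target, newRhs);      // remember, emit nothing
            } else {
                out.add(target + " = " + newRhs);
            }
        }
        return out;
    }

    // Replace each token with its recorded definition, if any.
    static String subst(String expr, Map<String, String> defs) {
        StringBuilder sb = new StringBuilder();
        for (String tok : expr.split(" "))
            sb.append(defs.getOrDefault(tok, tok)).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        List<String> ir = List.of(
            "s0 = v0", "s1 = v1", "s2 = s0 + s1",
            "v2 = s2", "s3 = v2", "return s3");
        System.out.println(propagate(ir));
    }
}
```

Running it reproduces the concise result above. A real decompiler must also verify that no intervening write invalidates the substitution, as the next section explains.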

      Aside: Restoring Variable Names

If variables are reduced to slot references in the byte code, how then do we restore the original variable names? It’s possible we can’t. To improve the debugging experience, the byte code for each method may include a special section called the local variable table. For each variable in the original source, there exists an entry that specifies the name, the slot number, and the bytecode range for which the name applies. This table (and other useful metadata) can be included in the javap disassembly by passing the -v option. For our plus() method above, the table looks like this:

Start  Length  Slot  Name  Signature
    0       6     0     a          I
    0       6     1     b          I
    4       2     2     c          I

      Here, we see that v2 refers to an integer variable originally named ‘c‘ at bytecode offsets #4-5.

      For classes that have been compiled without local variable tables (or which had them stripped out by an obfuscator), we have to generate our own names. There are many strategies for doing this; a clever implementation might look at how a variable is used for hints on an appropriate name.

      Stack Analysis

      In the previous example, we could guarantee which value was on top of the stack at any given point, and so we could name s0, s1, and so on.

      So far, dealing with variables has been pretty straightforward because we’ve only explored methods with a single code path. In a real world application, most methods aren’t going to be so accommodating. Each time you add a loop or condition to a method, you increase the number of possible paths a caller might take. Consider this modified version of our earlier example:

public static int plus(boolean t, int a, int b) {
    int c = t ? a : b;
    return c;
}
Now we have control flow to complicate things; if we try to perform the same assignments as before, we run into a problem.

Bytecode:

0: iload_0
1: ifeq 8
4: iload_1
5: goto 9
8: iload_2
9: istore_3
10: iload_3
11: ireturn

Stack Variables:

s0 = v0
if (s0 == 0) goto #8
s1 = v1
goto #9
s2 = v2
v3 = {s1,s2}
s4 = v3
return s4

      We need to be a little smarter about how we assign our stack identifiers. It’s no longer sufficient to consider each instruction on its own; we need to keep track of how the stack looks at any given location, because there are multiple paths we might take to get there.

      When we examine #9, we see that istore_3 pops a single value, but that value has two sources: it might have originated at either #5 or #8. The value at the top of the stack at #9 might be either s1 or s2, depending on whether we came from #5 or #8, respectively. Therefore, we need to consider these to be the same variable—we merge them, and all references to s1 or s2 become references to the unambiguous variable s{1,2}. After this ‘relabeling’, we can safely perform copy propagation.

After Relabeling:

s0 = v0
if (s0 == 0) goto #8
s{1,2} = v1
goto #9
s{1,2} = v2
v3 = s{1,2}
s4 = v3
return s4

After Copy Propagation:

if (v0 == 0) goto #8
s{1,2} = v1
goto #9
s{1,2} = v2
v3 = s{1,2}
return v3

      Notice the conditional branch at #1: if the value of s0 is zero, we jump to the else block; otherwise, we continue along the current path. It’s interesting to note that the test condition is negated when compared to the original source.

      We’ve now covered enough to dive into…

      Conditional Expressions

      At this point, we can determine that our code could be modeled with a ternary operator (?:): we have a conditional, with each branch having a single assignment to the same stack variable s{1,2}, after which the two paths converge.

      Once we’ve identified this pattern, we can roll the ternary up straight away.

After Copy Propagation:

if (v0 == 0) goto #8
s{1,2} = v1
goto #9
s{1,2} = v2
v3 = s{1,2}
return v3

After Collapsing the Ternary:

v3 = v0 != 0 ? v1 : v2
return v3

Note that, as part of the transformation, we negated the condition at #1. It turns out that javac is fairly predictable in how it negates conditions, so we can more closely match the original source if we flip those conditions back.

      Aside – But what are the types?

      When dealing with stack values, the JVM uses a more simplistic type system than Java source. Specifically, boolean, char, and short values are manipulated using the same instructions as int values. Thus, the comparison v0 != 0 could be interpreted as either:

      v0 != false ? v1 : v2
      v0 != 0 ? v1 : v2
          …or even:
      v0 != false ? v1 == true : v2 == true
          …and so on!

      In this case, however, we are fortunate to know the exact type of v0, as it is contained within the method descriptor:

          descriptor: (ZII)I

      This tells us the method signature is of the form:

          public static int plus(boolean, int, int)

      We can also infer that v3 should be an int (as opposed to a boolean) because it is used as the return value, and the descriptor tells us the return type. We are then left with:

      v3 =  v0 ? v1 : v2
      return v3

      As an aside, if v0 were a local variable (and not a formal parameter) then we might not know it represents a boolean value and not an int. Remember that local variable table we mentioned earlier, which tells us the original variable names? It also includes information about the variable types, so if your compiler is configured to emit debug information, we can look to that table for type hints. There is another, similar table called LocalVariableTypeTable that contains similar information; the key difference is that a LocalVariableTypeTable may include detailed information about generic types, whereas a LocalVariableTable cannot. It’s worth noting that these tables are unverified metadata, so they are not necessarily trustworthy. A particularly devious obfuscator might choose to fill these tables with lies, and the resulting bytecode would still be valid! Use them at your own discretion.

      Short-Circuit Operators ('&&' and '||')

public static boolean fn(boolean a, boolean b, boolean c) {
    return a || b && c;
}
      How could anything be simpler? The bytecode is, unfortunately, a bit of a pain…

Bytecode:

0: iload_0
1: ifne 12
4: iload_1
5: ifeq 16
8: iload_2
9: ifeq 16
12: iconst_1
13: goto 17
16: iconst_0
17: ireturn

Stack Variables:

s0 = v0
if (s0 != 0) goto #12
s1 = v1
if (s1 == 0) goto #16
s2 = v2
if (s2 == 0) goto #16
s3 = 1
goto #17
s4 = 0
return s{3,4}

After Copy Propagation:

if (v0 != 0) goto #12
if (v1 == 0) goto #16
if (v2 == 0) goto #16
s{3,4} = 1
goto #17
s{3,4} = 0
return s{3,4}

      The ireturn instruction at #17 could return either s3 or s4, depending on what path is taken. We alias them as described above, and then we perform copy propagation to eliminate s0, s1, and s2.

We are left with three consecutive conditionals at #1, #5 and #9. As we mentioned earlier, conditional branches either jump or fall through to the next instruction.

      The bytecode above contains sequences of conditional branches that follow specific and very useful patterns:

Conditional Conjunction (&&):

if (c1) goto L1
if (c2) goto L2

    becomes:

if (!c1 && c2) goto L2

Conditional Disjunction (||):

if (c1) goto L2
if (c2) goto L2

    becomes:

if (c1 || c2) goto L2

      If we consider neighbouring pairs of conditionals above, #1...#5 do not conform to either of these patterns, but #5...#9 is a conditional disjunction (||), so we apply the appropriate transformation:

      1: if (v0 != 0) goto #12
      5: if (v1 == 0 || v2 == 0) goto #16
      12: s{3,4} = 1
      13: goto #17
      16: s{3,4} = 0
      17: return s{3,4}

      Note that every transform we apply might create opportunities to perform additional transforms. In this case, applying the || transform restructured our conditions, and now #1...#5 conform to the && pattern! Thus, we can further simplify the method by combining these lines into a single conditional branch:

      1: if (v0 == 0 && (v1 == 0 || v2 == 0)) goto #16
      12: s{3,4} = 1
      13: goto #17
      16: s{3,4} = 0
      17: return s{3,4}

      Does this look familiar? It should: this bytecode now conforms to the ternary (?:) operator pattern we covered earlier. We can reduce #1...#16 to a single expression, then use copy propagation to inline s{3,4} into the return statement at #17:

      return (v0 == 0 && (v1 == 0 || v2 == 0)) ? 0 : 1;

      Using the method descriptor and local variable type tables described earlier, we can infer all the types necessary to reduce this expression to:

      return (v0 == false && (v1 == false || v2 == false)) ? false : true;

Well, this is certainly more concise than our original decompilation, but it’s still pretty jarring. Let’s see what we can do about that. We can start by collapsing comparisons like x == true and x == false to x and !x, respectively. We can also eliminate the ternary operator by reducing x ? false : true to the simple expression !x.

      return !(!v0 && (!v1 || !v2));

Better, but it’s still a bit of a handful. If you remember your high school discrete maths, you’ll see that De Morgan’s theorem can be applied here:

      !(a || b) --> (!a) && (!b)
      !(a && b) --> (!a) || (!b)

      And thus:

      return ! ( !v0 && ( !v1 || !v2 ) )

      …becomes:

      return ( v0 || !(!v1 || !v2 ) )

      …and eventually:

      return ( v0 || (v1 && v2) )
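If you want to convince yourself that these rewrites are safe, a brute-force truth table works nicely. Here is a minimal sketch (the class name is ours, not from the original article) that exhaustively checks the decompiled expression against the fully simplified one:

```java
public class DeMorganCheck {
    public static void main(String[] args) {
        // Enumerate all 8 combinations of the three booleans.
        for (int bits = 0; bits < 8; bits++) {
            boolean v0 = (bits & 1) != 0;
            boolean v1 = (bits & 2) != 0;
            boolean v2 = (bits & 4) != 0;

            boolean original   = !(!v0 && (!v1 || !v2)); // before De Morgan
            boolean simplified = v0 || (v1 && v2);       // after applying De Morgan twice

            if (original != simplified) {
                throw new AssertionError("mismatch at " + bits);
            }
        }
        System.out.println("all 8 cases agree");
    }
}
```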


      Dealing with Method Calls

      We’ve already seen what it looks like for a method to be called: arguments ‘arrive’ in the locals array. To call a method, arguments must be pushed onto the stack (preceded by a this reference in the case of instance methods). Calling a method in bytecode is as you’d expect:

      push arg_0
      push arg_1
      invokevirtual METHODREF

      We’ve specified invokevirtual above, which is the instruction used to invoke most instance methods. The JVM actually has a handful of instructions used for method calls, each with different semantics:

      1. invokeinterface invokes interface methods.

      2. invokevirtual invokes instance methods using virtual semantics, i.e., the call is dispatched to the proper override based on the runtime type of the target.

      3. invokespecial invokes a specific instance method (without virtual semantics); it is most commonly used for invoking constructors, but is also used for calls like super.method().

      4. invokestatic invokes static methods.

      5. invokedynamic is the least common (in Java), and it uses a ‘bootstrap’ method to invoke a custom call site binder. It was created to improve support for dynamic languages, and has been used in Java 8 to implement lambdas.
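To make the distinction concrete, here is a small program of our own (not from the original article) annotated with the opcode javac typically emits for each call site; compile it and run javap -c on it to verify:

```java
import java.util.ArrayList;
import java.util.List;

public class InvokeDemo {
    static String shout(String s) {             // call sites bind with invokestatic
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(); // new + dup + invokespecial <init>
        names.add("duke");                      // invokeinterface: static type is the interface List
        StringBuilder sb = new StringBuilder(); // invokespecial <init>
        sb.append(shout(names.get(0)));         // invokestatic, then invokevirtual on StringBuilder
        Runnable bang = () -> sb.append("!");   // invokedynamic bootstraps the lambda
        bang.run();                             // invokeinterface
        System.out.println(sb);                 // invokevirtual on PrintStream
    }
}
```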

      The important detail for a decompiler writer is that the class’s constant pool contains details for any method called, including the number and type of its arguments, as well as its return type. Recording this information in the caller class allows the runtime to verify that the intended method exists and that it conforms to the expected signature. If the target method is in third-party code and its signature changes, then any code that attempts to call the old version will fail with a linkage error such as NoSuchMethodError (as opposed to producing undefined behavior).

      Going back to our example above, the presence of the invokevirtual opcode tells us that the target method is an instance method, and therefore requires a this pointer as an implicit first argument. The METHODREF in the constant pool tells us that the method has one formal parameter, so we know we need to pop an argument off of the stack in addition to the target instance pointer. We can then rewrite the code as:


      Of course, bytecode isn’t always so friendly; there’s no requirement that arguments be pushed neatly onto the stack, one after the other. If, for example, one of the arguments is a ternary expression, then there will be intermediate load, store, and branch instructions that need to be transformed independently. Obfuscators might rewrite methods into a particularly convoluted sequence of instructions. A good decompiler needs to be flexible enough to handle many interesting edge cases that are beyond the scope of this article.

      There has to be more to it than this…

      So far, we’ve limited ourselves to analyzing a single sequence of code, beginning with a list of simple instructions and applying transformations that produce more familiar, higher-level constructs. If you are thinking that this seems a little too simplistic, then you are correct. Java is a highly structured language with concepts like scopes and blocks, as well as more complicated control flow mechanisms. In order to deal with constructs like if/else blocks and loops, we need to perform a more rigorous analysis of the code, paying special attention to the various paths that might be taken. This is called control flow analysis.

      We begin by breaking down our code into contiguous blocks that are guaranteed to execute from beginning to end. These are called basic blocks, and we construct them by splitting up our instruction list along places where we might jump to another block, as well as any instructions which might be jump targets themselves.
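As a rough sketch (our own simplification, not production decompiler code), finding basic block boundaries amounts to marking ‘leaders’: the method entry, every branch target, and every instruction that follows a branch. Here we model a hypothetical instruction stream with one conditional branch and one back edge:

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class Leaders {
    public static void main(String[] args) {
        // Parallel arrays describing a toy instruction stream:
        // a conditional branch at offset 4 (target 20) and a goto at offset 17 (target 2).
        int[] offsets      = { 0,  1,  2,  3,  4,  7, 10, 11, 14, 17, 20};
        int[] branchTo     = {-1, -1, -1, -1, 20, -1, -1, -1, -1,  2, -1}; // -1 = no branch
        boolean[] isBranch = {false, false, false, false, true,
                              false, false, false, false, true, false};

        SortedSet<Integer> leaders = new TreeSet<>();
        leaders.add(offsets[0]);                          // rule 1: method entry
        for (int i = 0; i < offsets.length; i++) {
            if (branchTo[i] >= 0) {
                leaders.add(branchTo[i]);                 // rule 2: branch targets
            }
            if (isBranch[i] && i + 1 < offsets.length) {
                leaders.add(offsets[i + 1]);              // rule 3: fall-through after a branch
            }
        }
        System.out.println(leaders);
    }
}
```

Each basic block then runs from one leader up to (but not including) the next.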

      We then build up a control flow graph (CFG) by creating edges between the blocks to represent all possible branches. Note that these edges may not be explicit branches; blocks containing instructions that might throw exceptions will be connected to their respective exception handlers. We won’t go into detail about constructing CFGs, but some high level knowledge is required to understand how we use these graphs to detect code constructs like loops.

      [Control Flow Graph Examples]
      Examples of control flow graphs.

      The aspects of control flow graphs that we are most interested in are domination relationships:

      • A node D is said to dominate another node N if all paths to N pass through D. All nodes dominate themselves; if D and N are different nodes, then D is said to strictly dominate N.

      • If D strictly dominates N and does not strictly dominate any other node that strictly dominates N, then D can be said to immediately dominate N.

      • The dominator tree is a tree of nodes in which each node’s children are the nodes it immediately dominates.

      • The dominance frontier of D is the set of all nodes N such that D dominates an immediate predecessor of N but does not strictly dominate N. In other words, it is the set of nodes where D‘s dominance ends.
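Dominator sets can be computed with a simple iterative data-flow algorithm: each node’s dominators are itself plus the intersection of its predecessors’ dominators, iterated to a fixed point. The sketch below (our own illustration, using plain adjacency lists) runs this on a small diamond-shaped CFG:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class Dominators {
    public static void main(String[] args) {
        // CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3 (a diamond); node 0 is the entry.
        int n = 4;
        List<List<Integer>> preds = new ArrayList<>();
        for (int i = 0; i < n; i++) preds.add(new ArrayList<>());
        preds.get(1).add(0);
        preds.get(2).add(0);
        preds.get(3).add(1);
        preds.get(3).add(2);

        // dom[v] starts as "all nodes", except the entry, which dominates only itself.
        List<Set<Integer>> dom = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            Set<Integer> init = new HashSet<>();
            if (v == 0) init.add(0);
            else for (int i = 0; i < n; i++) init.add(i);
            dom.add(init);
        }

        // Iterate to a fixed point: dom(v) = {v} ∪ (intersection of dom(p) for p in preds(v)).
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int v = 1; v < n; v++) {
                Set<Integer> next = null;
                for (int p : preds.get(v)) {
                    if (next == null) next = new HashSet<>(dom.get(p));
                    else next.retainAll(dom.get(p));
                }
                if (next == null) next = new HashSet<>();
                next.add(v);
                if (!next.equals(dom.get(v))) { dom.set(v, next); changed = true; }
            }
        }
        // The merge point 3 is dominated only by the entry and itself.
        System.out.println("dom(3) = " + new TreeSet<>(dom.get(3)));
    }
}
```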

      Basic Loops and Control Flow

      Consider the following Java method:

      public static void fn(int n) {
          for (int i = 0; i < n; ++i) {
              System.out.println(i);
          }
      }

      …and its disassembly:

      0: iconst_0
      1: istore_1
      2: iload_1
      3: iload_0
      4: if_icmpge 20
      7: getstatic #2 // System.out:PrintStream
      10: iload_1
      11: invokevirtual #3 // PrintStream.println:(I)V
      14: iinc 1, 1
      17: goto 2
      20: return

      Let’s apply what we discussed above to convert this into a more readable form, first by introducing stack variables, then performing copy propagation.

      With stack variables introduced:

       0: s0 = 0
       1: v1 = s0
       2: s2 = v1
       3: s3 = v0
       4: if (s2 >= s3) goto 20
       7: s4 = System.out
      10: s5 = v1
      11: s4.println(s5)
      14: v1 = v1 + 1
      17: goto 2
      20: return

      After copy propagation:

       0: v1 = 0
       4: if (v1 >= v0) goto 20
       7: System.out.println(v1)
      14: v1 = v1 + 1
      17: goto 4
      20: return

      Notice how the conditional branch at #4 and the goto at #17 create a logical loop. We can see this more easily by looking at a control flow graph:

      From the graph, it’s obvious that we have a neat cycle, with an edge from the goto back to a conditional branch. In this case, the conditional branch is referred to as a loop header, which can be defined as a dominator with a loop-forming backward edge. A loop header dominates all nodes within the loop’s body.

      We can determine whether a condition is a loop header by looking for a loop-forming back edge, but how do we do that? A simple solution is to test whether the condition node is in its own dominance frontier. Once we know we have a loop header, we have to figure out which nodes to pull into the loop body; we can do that by finding all nodes dominated by the header. In pseudocode, our algorithm would look something like this:

      q := new Queue()
      r := new Set()

      q.enqueue(header)

      while (not q.empty())
          n := q.dequeue()

          if (header.dominates(n) and not r.contains(n))
              r.add(n)

              for (s in n.successors())
                  q.enqueue(s)

      return r

      Once we’ve figured out the loop body, we can transform our code into a loop. Keep in mind that our loop header may be a conditional jump out of the loop, in which case we need to negate the condition.

      v1 = 0
      while (v1 < v0) {
          System.out.println(v1)
          v1 = v1 + 1
      }

      And voila, we have a simple pre-condition loop! Most loops, including while, for and for-each, compile down to the same basic pattern, which we detect as a simple while loop. There’s no way to know for sure what kind of loop the programmer originally wrote, but for and for-each follow pretty specific patterns that we can look for. We won’t go into the details, but if you look at the while loop above, you can see how the original for loop’s initializer (v1 = 0) precedes the loop, and its iterator (v1 = v1 + 1) has been inserted at the end of the loop body. We’ll leave it as an exercise to think about when and how one might transform while loops into for or for-each. It’s also interesting to think about how we might have to adjust our logic to detect post-conditional (do/while) loops.

      We can apply a similar technique to decompile if/else statements. The bytecode pattern is pretty straightforward:

      iftrue(!condition) goto #else
      // `if` block begins here
      goto #end

      #else:
      // `else` block begins here

      #end:
      // end of `if/else`

      Here, we use iftrue as a pseudo-instruction to represent any conditional branch: test a condition, and if it passes, then branch; otherwise, continue. We know that the if block starts at the instruction following the condition, and the else block starts at the condition’s jump target. Finding the contents of those blocks is as simple as finding the nodes dominated by those starting nodes, which we can do using the same algorithm we described above.

      We’ve now covered basic control flow mechanisms, and while there are others (e.g., exception handlers and subroutines), those are beyond the scope of this introductory article.


      Writing a decompiler is no simple task, and the experience easily translates into a book’s worth of material—perhaps even a series of books! Obviously, we could not cover everything in a single blog post, and you probably wouldn’t want to read it if we’d tried. We hope that by touching on the most common constructs—logical operators, conditionals, and basic control flow—we have given you an interesting glimpse into the world of decompiler development.

      Lee Benfield is the author of the CFR Java decompiler.
      Mike Strobel is the author of Procyon, a Java decompiler and metaprogramming framework.

      Now go write your own! 🙂

      This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

      Mastering Java Bytecode

      Hey! Happy Advent 😀 I’m Simon Maple (@sjmaple), the Technical Evangelist for ZeroTurnaround. You know, the JRebel guys! Well, as a result of writing a product like JRebel, which interacts with bytecode more often than you care to imagine, there are many things we’ve learned about it which we’d love to share.

      Let’s start at the start… Java was a language designed to run on a virtual machine so that it only needed to be compiled once to run everywhere (yes, yes, write once, test everywhere). As a result, the JVM you install onto your system is native, while the code that runs on it can be platform agnostic. Java bytecode is the intermediate representation of the Java code you write as source, and is the result of compiling your code. So your class files are the bytecode.

      To be more succinct, Java bytecode is the instruction set of the Java Virtual Machine, which is interpreted and JIT-compiled into native code at runtime.

      Have you ever played about with assembler or machine code? Bytecode is kind of similar, in a way, but many people in the industry don’t really play with it that much, more out of a lack of necessity. However, it is important to understand what’s going on, and useful if you want to out-geek someone in the pub.

      Firstly, let’s take a look at some bytecode basics. We’ll take the expression ‘1+2’ first and see how this gets executed as Java bytecode. 1+2 can be written in reverse Polish notation as 1 2 +. Why? Well when we put it on a stack it all becomes clear…

      OK, in bytecode we’d actually see opcodes (iconst_1 and iconst_2) and an instruction (iadd) rather than push and add, but the flow is the same. The actual instructions are one byte in length, hence bytecode. There are 256 possible opcodes as a result, but only 200 or so are used. Opcodes are prefixed with a type followed by the operation name. So what we saw previously with iconst and iadd, are constants of integer type and an add instruction for integer types.
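A stack machine evaluates 1 2 + by pushing the operands and letting the operator consume them. We can mimic iconst_1, iconst_2 and iadd with an explicit stack (a toy model, of course; the real JVM operand stack isn’t a java.util collection):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackAdd {
    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);                          // iconst_1
        stack.push(2);                          // iconst_2
        int b = stack.pop(), a = stack.pop();   // iadd pops its two operands...
        stack.push(a + b);                      // ...and pushes their sum
        System.out.println(stack.pop());        // prints 3
    }
}
```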

      This is all very well and good, but how about reading class files? Typically, all you normally see when you open a class file in your editor of choice is a bunch of smiley faces, squares, dots and other weird characters, right? The answer is javap, a disassembler utility you actually get with your JDK. Let’s look at a code example to see javap in action.

      public class Main {

          public static void main(String[] args) {
              MovingAverage app = new MovingAverage();
          }
      }

      Once this class is compiled into a Main.class file, we can use the following command to extract the bytecode: javap -c Main

      Compiled from "Main.java"
      public class algo.Main {
        public algo.Main();
             0: aload_0
             1: invokespecial #1    // Method java/lang/Object."<init>":()V
             4: return

        public static void main(java.lang.String[]);
             0: new           #2
             3: dup
             4: invokespecial #3
             7: astore_1
             8: return
      }
      We can see we have our default constructor and main method in the bytecode straight away. By the way, this is how Java gives you a default constructor for constructor-less classes! The bytecode in the constructor is simply a call to super(), while our main method creates a new instance of the MovingAverage class and returns. The #n tokens actually refer to constants, which we can view using the -verbose argument as follows: javap -c -verbose Main. The interesting part of what is returned is shown below:

      public class algo.Main
        SourceFile: "Main.java"
        minor version: 0
        major version: 51
        flags: ACC_PUBLIC, ACC_SUPER
      Constant pool:
         #1 = Methodref    #5.#21         //  java/lang/Object."<init>":()V
         #2 = Class        #22            //  algo/MovingAverage
         #3 = Methodref    #2.#21         //  algo/MovingAverage."<init>":()V
         #4 = Class        #23            //  algo/Main
         #5 = Class        #24            //  java/lang/Object

      Now we can match our instructions to our constants, and we can piece together what’s actually happening much more easily. Is anything still bugging you about the example above? No? What about the numbers in front of each instruction…

             0: new           #2
             3: dup
             4: invokespecial #3
             7: astore_1
             8: return

      Now it’s really bugging you, right? 🙂 Those numbers are byte offsets into the method body. Every instruction has a hex representation, and its operands (such as two-byte constant pool indexes) occupy the following slots, so if we visualise this method body as an array of bytes we get:

      BB 00 02 59 B7 00 03 4C B1
      (new #2, dup, invokespecial #3, astore_1, return)

      We can see exactly these bytes if we open the class file in a hex editor.

      We could actually change the bytecode here in our HEX editor, but let’s be honest, it’s not something you’d really want to do, particularly on a Friday afternoon after the obligatory pub trip. Better ways to do this would be using ASM or javassist.
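If you do open a class file in a hex editor, the first thing you’ll see is the famous magic number. The sketch below (our own example) reads its own class file from the classpath and prints the first four bytes, assuming the class file is available as a resource, which it normally is:

```java
import java.io.DataInputStream;
import java.io.InputStream;

public class Magic {
    public static void main(String[] args) throws Exception {
        // Locate this class's own .class file on the classpath.
        try (InputStream in = Magic.class.getResourceAsStream("Magic.class")) {
            int magic = new DataInputStream(in).readInt();   // first 4 bytes, big-endian
            System.out.printf("magic = %08x%n", magic);      // every class file starts with cafebabe
        }
    }
}
```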

      Let’s move on from our basic example and add some local variables that store state and interact directly with our stack. Check out the following code:

      public static void main(String[] args) {

          MovingAverage ma = new MovingAverage();

          int num1 = 1;
          int num2 = 2;

          ma.submit(num1);
          ma.submit(num2);

          double avg = ma.getAvg();
      }


      Let’s see what we get this time in our bytecode:

       0: new           #2  // class algo/MovingAverage
       3: dup
       4: invokespecial #3  // Method algo/MovingAverage."<init>":()V
       7: astore_1
       8: iconst_1
       9: istore_2
      10: iconst_2
      11: istore_3
      12: aload_1
      13: iload_2
      14: i2d
      15: invokevirtual #4  // Method algo/MovingAverage.submit:(D)V
      18: aload_1
      19: iload_3
      20: i2d
      21: invokevirtual #4  // Method algo/MovingAverage.submit:(D)V
      24: aload_1
      25: invokevirtual #5  // Method algo/MovingAverage.getAvg:()D
      28: dstore        4
      30: return

      LocalVariableTable:
      Start  Length  Slot  Name  Signature
          0      31     0  args  [Ljava/lang/String;
          8      23     1  ma    Lalgo/MovingAverage;
         10      21     2  num1  I
         12      19     3  num2  I
         30       1     4  avg   D

      This looks a lot more interesting… We can see that we create an object of type MovingAverage, which is stored in the local variable ma by the astore_1 instruction (1 is the slot number in the LocalVariableTable). The iconst_1 and iconst_2 instructions load the constants 1 and 2 onto the stack, and the istore_2 and istore_3 instructions store them in LocalVariableTable slots 2 and 3 respectively. A load instruction pushes a local variable onto the stack, while a store instruction pops the next item from the stack and stores it in the LocalVariableTable. It’s important to realise that when a store instruction is used, the item is taken off the stack, and if you want to use it again, you’ll need to load it.
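You can model this interplay between the operand stack and the LocalVariableTable directly. This toy interpreter (our own illustration) executes the iconst/istore/iload sequence above for the two constants:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class LocalsDemo {
    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        int[] locals = new int[4];   // slot 0 would hold args, slot 1 the 'ma' reference

        stack.push(1);               // iconst_1: push constant 1
        locals[2] = stack.pop();     // istore_2: pop into slot 2 (num1)
        stack.push(2);               // iconst_2: push constant 2
        locals[3] = stack.pop();     // istore_3: pop into slot 3 (num2)

        stack.push(locals[2]);       // iload_2: copy slot 2 back onto the stack...
        System.out.println(stack.pop() + locals[3]);  // ...the slot itself still holds 1
    }
}
```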

      How about the flow of execution? All we’ve seen is a simple progression from one line to the next. I want to see some BASIC style GOTO 10 in the mix! Let’s take another example:

      MovingAverage ma = new MovingAverage();

      for (int number : numbers) {
          ma.submit(number);
      }
      In this case the flow of execution will jump around many times as we traverse the for loop. The bytecode, assuming that the numbers variable is a static field in the same class, is shown as follows:

       0: new #2 // class algo/MovingAverage
       3: dup
       4: invokespecial #3 // Method algo/MovingAverage."<init>":()V
       7: astore_1
       8: getstatic #4 // Field numbers:[I
      11: astore_2
      12: aload_2
      13: arraylength
      14: istore_3
      15: iconst_0
      16: istore 4
      18: iload 4
      20: iload_3
      21: if_icmpge 43
      24: aload_2
      25: iload 4
      27: iaload
      28: istore 5
      30: aload_1
      31: iload 5
      33: i2d
      34: invokevirtual #5 // Method algo/MovingAverage.submit:(D)V
      37: iinc 4, 1
      40: goto 18
      43: return


      Start  Length  Slot  Name   Signature

      30       7         5    number I

      12       31        2    arr$     [I

      15       28        3    len     $I

      18       25         4     i$      I

      0       49         0     args  [Ljava/lang/String;

      8       41         1    ma     Lalgo/MovingAverage;

      48      1         2    avg    D

      The instructions from position 8 through 17 set up the loop. There are three variables in the LocalVariableTable that aren’t really mentioned in the source: arr$, len$ and i$. These are the loop variables. arr$ stores the reference value of the numbers field, from which the loop length, len$, is derived. i$ is the loop counter, which is incremented by the iinc instruction.
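In source form, the compiler’s lowering of the for-each loop looks roughly like this (the variable names are our stand-ins for the synthetic arr$, len$ and i$ slots, and the sum stands in for the ma.submit call):

```java
public class ForEachDesugared {
    static int[] numbers = {1, 2, 3};

    public static void main(String[] args) {
        int sum = 0;
        // What 'for (int number : numbers)' compiles down to, roughly:
        int[] arr = numbers;              // arr$: getstatic + astore
        int len = arr.length;             // len$: arraylength + istore
        for (int i = 0; i < len; i++) {   // i$: tested by if_icmpge, bumped by iinc
            int number = arr[i];          // iaload
            sum += number;                // stand-in for ma.submit(number)
        }
        System.out.println(sum);
    }
}
```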

      First we need to test our loop expression, which is performed by a comparison instruction:

      18: iload 4

      20: iload_3

      21: if_icmpge 43

      We’re loading the values in slots 4 and 3 onto the stack: the loop counter and the loop length. We’re checking if i$ is greater than or equal to len$. If it is, we jump to statement 43; otherwise, we proceed. We can then perform our logic in the loop and, at the end, we increment our counter and jump back to the code that checks the loop condition at statement 18.

      37: iinc       4, 1       // increment i$

      40: goto       18         // jump back to the beginning of the loop

      There are a bunch of arithmetic opcode and type combinations that can be used in bytecode, including iadd, ladd, fadd and dadd for addition, with similarly prefixed families for subtraction, multiplication and division.

      There are also a number of type conversion opcodes, which are important when assigning, say, an integer to a variable of type long.
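Widening conversions like int to double are inserted implicitly by the compiler, while narrowing ones require an explicit cast and can lose information. A quick example of our own:

```java
public class Conversions {
    public static void main(String[] args) {
        int i = 1;
        double d = i;                 // compiler inserts i2d; no cast needed
        long big = 3_000_000_000L;    // doesn't fit in an int
        int truncated = (int) big;    // explicit cast compiles to l2i; high bits are dropped
        System.out.println(d + " " + truncated);
    }
}
```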

      In our previous example we pass an integer to a submit method which takes a double. The Java syntax does this for us, but in bytecode you’ll see the i2d opcode being used:

      31: iload 5
      33: i2d
      34: invokevirtual #5 // Method algo/MovingAverage.submit:(D)V

      So, you’ve made it this far. Well done, you’ve earned a coffee! Is any of this actually useful to know, or is it just geek fodder? Well, it’s both! Firstly, you can now tell your friends that, like a JVM, you can process bytecode, and secondly you can better understand what you’re doing when writing bytecode yourself. For example, when using ObjectWeb ASM, one of the most widely used bytecode manipulation tools, you’ll find yourself constructing instructions directly, and this knowledge will prove invaluable!

      If you found this interesting and want to know more, then check out our free Mastering Java Bytecode report from Anton Arhipov, the JRebel Product Lead at ZeroTurnaround. (JRebel uses javassist, and we have had lots of fun learning and interacting with Java bytecode!) This report goes into more depth and touches on how to use ASM.

      Thanks for reading! Let me know what you thought! (@sjmaple)

      Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!