Adopt OpenJDK & Java community: how can you help Java !

Introduction

I want to take the opportunity to show what we have been doing in last year and also what we have done so far as members of the community. Unlike other years I have decided to keep this post less technical compare to the past years and compared to the other posts on Java Advent this year.

InTheBeginning

This year marks the fourth year since the first OpenJDK hackday was held in London (supported by LJC and its members) and also when the Adopt OpenJDK program was started. Four years is a small number on the face of 20 years of Java, same goes to the size of the Adopt OpenJDK community which forms a small part of the Java community (9+ million users). Although the post is non-technical in nature, the message herein is fairly important for the future growth and progress of our community and the next generation developers.

Creations of the community

Creations from the community

Over the many months a number of members of our community contributed and passed on their good work to us. In no specific order I have enlisted these picking them from memory. I know there are more to name and you can help us by sharing those with us (we will enlist them here).  So here are some of those that we can talk about and be proud of, and thank those who were involved:

  • Getting Started page – created to enabled two way communication with the members of the community, these include a mailing list, an IRC channel, a weekly newsletter, a twitter handle, among other social media channels and collaboration tools.
  • Adopt OpenJDK project: jitwatch – a great tool created by Chris Newland, its one of its kind, ever growing with features and helping developers fine-tune the performance of your Java/JVM applications running on the JVM.
  • Adopt OpenJDK: GSK – a community effort gathering knowledge and experience from hackday attendees and OpenJDK developers on how to go about with OpenJDK from building it to creating your own version of the JDK. Many JUG members have been involved in the process, and this is now a e-book available in many languages (5 languages + 2 to 3 more languages in progress).
  • Adopt OpenJDK vagrant scripts – a collection of vagrant scripts initially created by John Patrick from the LJC, later improved by the community members by adding more scripts and refactoring existing ones. Theses scripts help build OpenJDK projects in a virtualised container i.e. VirtualBox, making building, and testing OpenJDK and also running and testing Java/JVM applications much easier, reliable and in an isolated environment.
  • Adopt OpenJDK docker scripts – a collection of docker scripts created with the help of the community, this is now also receiving contributions from a number of members like Richard Kolb (SA JUG). Just like the vagrant scripts mentioned above, the docker scripts have similar goals, and need your DevOps foo!
  • Adopt OpenJDK project: mjprof – mjprof is a Monadic jstack analysis tool set. It is a fancy way to say it analyzes jstack output using a series of simple composable building blocks (monads). Many thanks to Haim Yadid for donating it to the community.
  • Adopt OpenJDK project: jcountdown – built by the community that mimics the spirit of ie6countdown.net. That is, to encourage users to move to the latest and greatest Java! Many thanks to all those involved, you can already see from the commit history.
  • Adopt OpenJDK CloudBees Build Farm – thanks to the folks at CloudBees for helping us host our build farm on their CI/CD servers. This one was initially started by Martijn Verburg and later with the help of a number of JUG members have come to the point that major Java projects are built against different versions of the JDK. These projects include building the JDKs themselves (versions 1.7, 1.8, 1.9, Jigsaw and Shenandoah). This project has also helped support the Testing Java Early project and Quality  Outreach program.

These are just a handful of such creations and contributions from the members of the community, some of these projects would certainly need help from you. As a community one more thing we could do well is celebrate our victories and successes, and especially credit those that have been involved whether as individuals or a community. So that our next generation contributors feel inspired and encourage to do more good work and share it with us.

Contributions from the community

We want to contribute

In a recent tweet and posts to various Java / JVM and developer mailing lists, I requested the community to come forward and share their contribution stories or those from others with our community. The purpose was two-fold, one to share it with the community and the other to write this post (which in turn is shared with the community). I was happy to see a handful of messages sent to me and the mailing lists by a number of community members. I’ll share some of these with you (in the order I have received them).

Sebastian Daschner:

I don’t know if that counts as contribution but I’ve hacked on the
OpenJDK compiler for fun several times. For example I added a new
thought up ‘maybe’ keyword which produces randomly executed code:
https://blog.sebastian-daschner.com/entries/maybe_keyword_in_java

Thomas Modeneis:

Thanks for writing, I like your initiative, its really good to show how people are doing and what they have been focusing on. Great idea.
From my part, I can tell about the DevoxxMA last month, I did a talk on the Hacker Space about the Adopt the OpenJDK and it was really great. We had about 30 or more attendees, it was in a open space so everyone that was going to any talk was passing and being grabbed to have a look about the topic, it was really challenging because I had no mic. but I managed to speak out loud and be listen, and I got great feedback after the session. I’m going to work over the weekend to upload the presentation and the recorded video and I will be posting here as soon as I have it done! 🙂

Martijn Verburg:

Good initiative.  So the major items I participated in were Date and Time and Lambdas Hackdays (reporting several bugs), submitted some warnings cleanups for OpenJDK.  Gave ~10 pages of feedback for jshell and generally tried to encourage people more capable than me to contribute :-).

Andrii Rodionov:

Olena Syrota and Oleg Tsal-Tsalko from Ukraine JUG: Contributing to JSR 367 test code-base (https://github.com/olegts/jsonb-spec), promoting ‘Adopt a JSR’ and JSON-B spec at JUG UA meetings (http://jug.ua/2015/04/json-binding/) and also at JavaDay Lviv conference (http://www.slideshare.net/olegtsaltsalko9/jsonb-spec).

Contributors

Contributors gathering together

As you have seen that from out of a community of 9+ million users, only a handful of them came forward to share their stories. While I can point you out to another list of contributors who have been paramount with their contributions to the Adopt OpenJDK GitBook, for example, take a look at the list of contributors and also the committers on the git-repo. They have not just contributed to the book but to Java and the OpenJDK community, especially those who have helped translate the book into multiple languages. And then there are a number of them who haven’t come forward to add their names to the list, even though they have made valuable contributions.
Super heroes together

From this I can say contributors can be like unsung heroes, either due their shy or low-profile nature or they just don’t get noticed by us. So it would only be fair to encourage them to come forward or share with the community about their contributions, however simple or small those may be. In addition to the above list I would like to also add a number of them (again apologies if I have missed out your name or not mentioned about you or all your contributions). These names are in no particular order but as they come to my mind as their contributions have been invaluable:

  • Dalibor Topic (OpenJDK Project Lead) & the OpenJDK team
  • Mario Torre & the RedHat OpenJDK team
  • Tori Wieldt (Java Community manager) and her team
  • Heather Vancura & the JCP team
  • NightHacking, vJUG and RebelLabs (and the great people behind them)
  • Nicolaas & the team at Cloudbees
  • Chris Newland (JitWatch developer)
  • Lucy Carey, Ellie & Mark Hazell (Devoxx UK & Voxxed)
  • Richard Kolb (JUG South Africa)
  • Daniel Bryant, Richard Warburton, Ben Evans, and a number of others from LJC
  • Members of SouJava (Otavio, Thomas, Bruno, and others)
  • Members of Bulgarian JUG (Ivan, Martin, Mitri) and neighbours
  • Oti, Ludovic & Patrick Reinhart
  • and a number of other contributors who for some reason I can’t remember…

I have named them for their contributions to the community by helping organise Hackdays during the week and weekends, workshops and hands-on sessions at conferences, giving lightening talks, speaking at conferences, allowing us to host our CI and build farm servers, travelling to different parts of the world holding the Java community flag, writing books, giving Java and advance-level training, giving feedback on new technologies and features, and innumerable other activities that support and push forward the Java / JVM platform.

How you can make a difference ? And why ?

Make a difference

You can make a difference by doing something as simple as clicking the like button (on Twitter, LinkedIn, Facebook, etc…) or responding to a message on a mailing list by expressing your opinion about something you see or read about –as to why you think about it that way or how it could be different.

The answer to the question “And why ?” is simple, because you are part of a community and ‘you care’ and want to share your knowledge and experience with others — just like the others above who have spared free moments of their valuable time for us.

Is it hard to do it ? Where to start ? What needs most attention ?

important-checklist The answer is its not hard to do it, if so many have done it, you can do it as well. Where to start and what can you do ? I have written a page on this topic. And its worth reading it before going any further.

There is a dynamic list of topics that is worth considering when thinking of contributing to OpenJDK and Java. But recently I have filtered this list down to a few topics (in order of precedence):

We need you!

With that I would like to close by saying:

i_need_you_duke3

Not just “I”, but we as a community need you.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Applying ForkJoin – from optimal to fast

JDK 7 is well into the hands of developers by now and most people have heard of ForkJoin, yet not so many have the time or chance in daily work to try it.

It caused, and probably still causes a bit of confusion on how is it any different than a normal thread pool.[1]

My goal in this article is to present a more elaborate, yet still simple example of ForkJoin usage through a code example.

I time and measure the performance of a Serial vs a Thread pool vs a ForkJoin aproach.

Here is the github upfront : https://github.com/fbunau/javaadvent-forkjoin

Practical problem

Imagine we have some sort of component in our system that keeps the last price of a stock for every millisecond of time.

This could be held in memory as an array of integers. (if we count in bps)

The clients of this component make queries like : what is the moment of time between time1 and time2 when the price was the lowest?

This could either be an automated algorithm or just someone in a GUI making rectangle selections.

7 queries in the example image

Let us also imagine that we get many such queries from a client batched up in a Task.

They may be batched up for reducing network traffic and round trip time.
We have different sizes of tasks that the component might get, up to 10 queries (someone with a GUI), up to 100, .. up to 1 000 0000 (some automated algorithm). We have many such clients for the component each producing tasks of different sizes. See Task.TaskType

Core problem and solution

The problem at it’s core we have to solve is the RMQ problem. Here is Wikipedia on it [2]:

“Given an array of objects taken from a well-ordered set (such as numbers), a Range Minimum Query (or RMQ) from i to j asks for the position of a minimum element in the sub-array A[i, j].”

“For example, A = [0, 5, 2, 5, 4, 3, 1, 6, 3] when , then the answer to the range minimum query for the A[3, 8] = [2, 5, 4, 3, 1, 6] is 7, as A[7] = 1 .”

There exists an efficient datastructure for solving this problem called “Segment Tree”.

I won’t go into detail on this as it is excellently covered in this classic Topcoder article [3]. This in itself is not important for this ForkJoin example, I have chosen this because it’s more interesting than a simple sum and it’s essence is kind of in the spirit of fork-join. It divides the task to be computed and then it join the results!

The data structure has O(n) initialization time and O(log N) query time, where N is the number of elements in our price per time unit value array.
So a task T contains M such queries to be made.

In an academic Computer Science approach you would just say that we’ll process each task with this efficient data structure and the complexity will be :

You can’t get more efficient than that !? Yes, in a theoretical von Neumann machine, but you can in practice.

An easy confusion to make is that because O(n/4) == O(n), then when writing a program constant factors don’t count, but they do!
Stop and think, is it the same to wait 10 or 40 minutes / hours / years ?

Going parallel

So thinking on the problem to be solved, how can we make it faster? Since every computing device now has more cores for computations, let’s put them to good use and do more things at once.
We can easily do that using the Fork Join framework.

I was first tempted to tweek the RMQ data structure and execute it’s operations in parallel. I attacked something that was already log N. But it was a big failure, it’s too much overhead for the scheduler to micromanage such short running logic.

The answer was in the end attack the M_i constant factor.

Thread pool

Before presenting how a ForkJoin solution might be applied, let’s imagine how we might apply a thread pool. See : TaskProcessorPool.java

We can have a pool of 4 workers, when we have a Task to do, we add it to the queue. As soon as a worker is available, it will retrieve from the head of the queue a pending task, and execute it.

While this is fine for tasks having the same size, and the size is relatively medium and predictable, it runs into problems when the tasks to be executed are of different sizes. One worker might be choked up with a long running task, and the others sit doing nothing.

In this image the threadpool will do only 9 out of 16 possible units of work in the 4 units of time (56% efficiency), if no more tasks will be added to the queue

Fork Join

Fork join is useful when you are in a problem domain where you can split the task to be solved into smaller ones.

What is special about a fork-join pool is that it is a work-stealing thread pool.

Each worker thread maintains a local dequeue of tasks. When taking a new task for execution it can either :

  • split the task into smaller ones
  • execute the task if it’s small enough

When a thread has no local threads in it’s dequeue, it ‘steals’ , pops tasks from the back of the queue of another random thread, and puts it in his own. There is a high chance that this task is not yet split. So he’ll have quite some work on his hands.

Comparing to the thread pool, instead of the other threads waiting for some new work, they could split the existing task into smaller ones and help the other thread with that large task.

Here is the original paper by Doug Lea for a more detailed explanation : http://gee.cs.oswego.edu/dl/papers/fj.pdf

Coming back to our example a large batch of operations could be split into multiple batches of smaller number of operations. See : TaskProcessorFJ.java

Most problems have linear series of operations like this one, it doesn’t have to be a special parallel problem for which we need to apply a specialized parallel algorithm to leverage the cores we have on the processor.

How much do you split? You split a task until you reach a threshold where generally splitting makes no sense anymore. Ex : ( splitting + a thread getting a job + context switching is more than actually executing the task as it is )

For a big XXL, task we have to do 1000000 query operations. We could split this into 2 500000 operation tasks, and do that in parallel. Is 500000 still large? Yes, we can split it more. I have chosen a group of 10000 operations to be the threshold under which there is no use in splitting and we can just execute them on the current thread.

Fork join does not split all the tasks upfront, but rather as it works through it.

Performance results

I ran 4 iterations for each implementation of processor on my i5-2500 CPU @ 3.30GHz that has 4 cores / 4 threads, after a clean reboot
Here are the results :

Doing 4 runs for each of the 3 processors. Pls wait ...
TaskProcessorSimple: 7963
TaskProcessorSimple: 7757
TaskProcessorSimple: 7748
TaskProcessorSimple: 7744
TaskProcessorPool: 3933
TaskProcessorPool: 2906
TaskProcessorPool: 4477
TaskProcessorPool: 4160
TaskProcessorFJ: 2498
TaskProcessorFJ: 2498
TaskProcessorFJ: 2524
TaskProcessorFJ: 2511
Test completed.

Conclusions

Even if you have chosen the right optimal data structure, it’s not fast until you use all the resources you have. i.e. exploiting all cores

ForkJoin is definetly an improvement over the thread pool in certain problem domains and it’s worth exploring where it can be applied, and we’ll get to see more and more parallel code.
This is the kind of processor you can buy today 12 cores / 24 threads. Now we just have to write the software to exploit the cool hardware that we have and will get in the future.

The code is here : https://github.com/fbunau/javaadvent-forkjoin if you want to play with it

Thanks for your time, drop some comments if you see any errors or have things to add.

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!