
Saturday, November 15, 2014

It's that time of the year again...

Hello everyone!

We're gearing up for another great Java Advent season. Feel free to browse the archives for 2013 and 2012. Also, make sure to subscribe in one of the following ways so that you don't miss a single post:

Are you considering contributing? Check out our Java Advent Coordination Google group!

Finally, to help pass the time until December 1st, check out VirtualJUG:

It's just like your local Java User Group, but you don't have to dress up to participate :-)

Friday, April 4, 2014

Devoxx UK


Devoxx UK returns to London in June and has an awesome lineup of speakers, sessions and activities to choose from.

The initial program was recently announced and has a real international feel, with speakers attending from all over the world. They’ll be joined by attendees coming from far and wide for 2 days of really cool content. A mix of established Java Rockstars and Champions are joined by up-and-coming talent to oversee a program of 80+ sessions, labs & keynote presentations.

The speaker lineup and sessions are online now.

If you want to attend you can get a 10% discount on tickets, meaning a 2 day pass costs just £315 (+VAT). Use the promo code JADV14A when booking tickets here.

Sunday, March 30, 2014

Java 8 GA (General Availability)

We interrupt our scheduled lack of posts :p, with this important announcement:

Java 8 has been released!

Some of the highlights of the release in no particular order:

  • Lambda expressions and a new API that makes use of them (the streams API)
  • New date-time API inspired by Joda Time
  • Project Nashorn (a fast JavaScript engine on the JVM)
  • Interfaces with default methods
  • Compact profiles
  • Removing the permanent generation
  • Java Mission Control and Java Flight Recorder
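As a quick taste of the first two items, here is a minimal sketch (assuming nothing beyond the standard java.util.stream and java.time APIs):

import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

public class Java8Taste {
    public static void main(String[] args) {
        // Lambda expressions + the streams API
        List<String> names = Arrays.asList("Duke", "Juggy", "Tux");
        names.stream()
             .filter(name -> name.length() > 3)
             .map(String::toUpperCase)
             .forEach(System.out::println);

        // New date-time API (JSR 310, inspired by Joda Time)
        LocalDate release = LocalDate.of(2014, 3, 18);
        System.out.println("Java 8 went GA on " + release);
    }
}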

Read / find out more:

Friday, December 27, 2013

Java Advent 2013 wrap-up

The full line-up, day by day:

  1. (Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al
  2. Using Matchers in Tests
  3. Quick prototype: Using git in NetRexx
  4. What Open Source is (and isn't) and why you should use it?
  5. Using Intel Performance Counters To Tune Garbage Collection
  6. Run, JUnit! Run!!!
  7. A conversational guide for JDK8's lambdas - a glossary of terms
  8. Mastering Java Bytecode
  9. Applying ForkJoin - from optimal to fast
  10. Anatomy of a Java Decompiler
  11. Performance tuning - measure don't guess
  12. Project Avatar - The next Java EE Feature
  13. 013 Things that Will go Horribly Wrong on This Friday, Dec 13, 2013
  14. Spring's @Primary annotation in action
  15. WebSocket and Java
  16. ArrayList vs. LinkedList
  17. Introduction to JUnit Theories
  18. Type-safing the Observer with mutually recursive bounds
  19. (Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al
  20. Java, the Steam controller and me
  21. Big data the 'reactive' way
  22. Reactive cassandra...
  23. Synopsis of articles and videos part 3 of 3
  24. 2013 Java/JVM Year in Review and a Look to the Future! and How (NOT TO) measure latency
It was my distinct honor and pleasure to have worked with quite a wonderful group of people over the last 24 days (including this one :D). They (in order of the posting date -> Mani, I will mention you only once, I hope it's OK :P) made Java Advent Calendar version 2013 possible. Thank you all... and see you next year!

Tuesday, December 24, 2013

How (NOT TO) measure latency

Latency is defined as

time interval between the stimulation and response
and is a value of importance in many computer systems (financial systems, games, websites, etc.). Hence we - as computer engineers - want to specify upper bounds / worst-case scenarios for the systems we build. How can we do this? The days of counting cycles for assembly instructions are long gone (unless you work on embedded systems) - there are just too many additional factors to consider (the operating system, mainly the task scheduler; other running processes; the JIT; the GC; etc.). The remaining alternative is empirical (hands-on) testing.

Use percentiles

So we whip out JMeter, configure a load test, take the mean (average) value ± 3 standard deviations and proudly declare that 99.73% of the users will experience latency within this interval. We are especially proud because (a) we considered a realistic set of calls (URLs if we are testing a website) and (b) we allowed for JIT warmup.

But we are still very wrong! (which can be sad if our company writes SLAs based on our numbers - we can bankrupt the company single-handedly!)

Let's see where the problem is and how we can fix it before we cause damage. Consider the dataset depicted below (you can get the actual values here to do your own calculations).

For simplicity, exactly 100 values are used in this example. Let's say they represent the latency of fetching a particular URL. You can immediately tell that the values fall into three distinct categories: very small (perhaps the data was already in the cache?), medium (this is what most users will see) and poor (probably some corner cases). This is typical for medium-to-large complexity (i.e. "real life") systems composed of many moving parts, and is called a multimodal distribution. More on this shortly.

If we quickly drop these values into LibreOffice Calc and do the number crunching, we'll come to the conclusion that the average (mean) of the values is 40 and, according to the three sigma rule, 99.73% of the users should experience latencies below 137. If you look at the chart carefully you'll see that the average (marked in red) is slightly left of the middle. You can also do a simple calculation (because there are exactly 100 values represented) and see that the value at the 99th percentile is 148, not 137. Now this might not seem like a big difference, but it can be the difference between profit and bankruptcy (if you've written an SLA based on this value, for example).

Where did we go wrong? Let's look again carefully at the three sigma rule (emphasis added):

nearly all values lie within three standard deviations of the mean in a normal distribution.
Our problem is that we don't have a normal distribution. We probably have a multimodal distribution (as mentioned earlier), but to be safe we should interpret the results in ways that are independent of the nature of the distribution.

From this example we can derive a couple of recommendations:

  1. Make sure that your test framework / load generator / benchmark isn't the bottleneck - run it against a "null endpoint" (one which doesn't do anything) and ensure that you get numbers at least an order of magnitude better
  2. Take into account things like JITing (warmup periods) and GC if you're testing a JVM-based system (or other systems based on the same principles - .NET, LuaJIT, etc.)
  3. Use percentiles. Saying things like "the median (50th percentile) response time of our system is...", "the 99.99th percentile latency is...", "the maximum (100th percentile) latency is..." is OK
  4. Don't calculate the average (mean). Don't use the standard deviation. In fact, if you see those values in a test report you can assume that the people who put together the report (a) don't know what they're talking about or (b) are intentionally trying to mislead you (I would bet on the first, but that's just my optimism speaking).
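To make recommendation 3 concrete, here is a minimal, distribution-independent percentile calculation (a sketch using the nearest-rank method; the sample values are illustrative):

import java.util.Arrays;

public class Percentiles {
    // Nearest-rank percentile: makes no assumption about the distribution
    static long percentile(long[] latencies, double p) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] latencies = {12, 15, 35, 37, 38, 39, 40, 41, 148, 250};
        System.out.println("median = " + percentile(latencies, 50.0)); // 38
        System.out.println("p99    = " + percentile(latencies, 99.0)); // 250
    }
}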

Look out for coordinated omission

Coordinated omission (a phrase coined by Gil Tene of Azul fame) is a problem which can occur if the test loop looks something like:

start:
  t = time()
  do_request()
  record_time(time() - t)
  wait_until_next_second()
  jump start

That is, we're trying to do one request every second (perhaps every 100ms would be more realistic, but the point stands). Many test systems (including JMeter and YCSB) have inner loops like this.
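In Java, the same flawed loop might look like this (a sketch; doRequest is a hypothetical stand-in for the system under test):

import java.util.ArrayList;
import java.util.List;

public class FlawedLoadGenerator {
    public static void main(String[] args) throws InterruptedException {
        List<Long> latenciesNanos = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            long start = System.nanoTime();
            doRequest(); // hypothetical call to the system under test
            latenciesNanos.add(System.nanoTime() - start);
            // Waiting for the next interval *after* a slow request silently
            // drops all the requests we should have issued in the meantime
            Thread.sleep(1000);
        }
        System.out.println("samples recorded: " + latenciesNanos.size());
    }

    static void doRequest() {
        // stand-in for the real work (an HTTP call, a DB query, ...)
    }
}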

We run the test and (learning from the previous discussion) report: 85% of the requests will be served in under 0.5 seconds at a rate of one request per second. And we can still be wrong! Let's look at the diagram below to see why:

On the first line we have our test run (the horizontal axis being time). Let's say that between seconds 3 and 6 the system (and hence all requests to it) is blocked (maybe we have a long GC pause). If you calculate the 85th percentile, you'll get 0.5 (hence the claim in the previous paragraph). However, you can see 10 independent clients below, each issuing its request in a different second (so our criterion of one request per second is fulfilled). But if we crunch those numbers, we'll see that the actual 85th percentile is 1.5 (three times worse than the original calculation).

Where did we go wrong? The problem is that the test loop and the system under test worked together ("coordinated" - hence the name) to hide (omit) the additional requests which happen during the time the server is blocked. This leads to underestimating the delays (as shown in the example).

  1. Make sure every request takes less than the sampling interval, or use a better benchmarking tool (I don't know of any which corrects for this), or post-process the data with Gil's HdrHistogram library, which contains built-in facilities to account for coordinated omission
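A minimal sketch of the post-processing approach with HdrHistogram (recordValueWithExpectedInterval is the library's built-in coordinated omission compensation; the values below are illustrative):

import org.HdrHistogram.Histogram;

public class LatencyRecording {
    public static void main(String[] args) {
        // Track latencies up to 1 hour, with 3 significant digits of precision
        Histogram histogram = new Histogram(3600000000000L, 3);

        long expectedIntervalNanos = 1000000000L; // one request per second

        // A 1.5 s latency measured while aiming for one request per second:
        // the library back-fills the requests that would have been issued
        // during the stall, instead of silently omitting them
        histogram.recordValueWithExpectedInterval(1500000000L, expectedIntervalNanos);

        System.out.println("85th percentile: "
                + histogram.getValueAtPercentile(85.0) + " ns");
    }
}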

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

2013 - The Java/JVM Year in Review and a Look to the Future!

Hi all,

It's been an amazing year for Java/JVM developers!

For me it's been a year of fine-tuning the Adopt OpenJDK and Adopt a JSR programmes with the LJC and working hard at jClarity - the Java/JVM performance analysis company where I've now been demoted to CEO ;-).

In terms of the overall ecosystem there's been some great steps forward:
  1. The language and platform continues to evolve (e.g. Java 8 is looking in great shape!).
  2. The standards body continues to become more inclusive and open (e.g. More Java User Groups have joined the Executive Committee of the JCP).
  3. The community continues to thrive (record numbers of conferences, Java User Groups and GitHub commits).

As is traditional, I'll try and take you through some of the trends of the past year that was and a small peek into the future as well.

2013 Trends

There were some major trends in the Java/JVM and software development arena in 2013. Mobile apps have become the norm, virtualised infrastructure is everywhere and of course Java is still growing faster (in absolute terms) than any other language out there except for JavaScript. Here are some other trends worth noting.

Functional Programming

Mainstream Java developers have adopted Scala, Groovy, Clojure and other JVM languages to write safer, more concise code for callback handlers, filters, reduces and a host of other operations on collections and event handlers. A massive wave of developers will of course embrace Java 8 and lambdas when it comes out in early 2014.
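For a feel of the difference, compare a Java 7 anonymous inner class with its Java 8 lambda equivalent (a simple illustrative example):

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class Callbacks {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("lambda", "JVM", "advent");

        // Java 7: a callback needs an anonymous inner class
        Collections.sort(words, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });

        // Java 8: the same callback as a lambda
        words.sort((a, b) -> Integer.compare(a.length(), b.length()));

        System.out.println(words); // [JVM, advent, lambda]
    }
}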

Java moving beyond traditional web apps

In 2013 Java firmly entrenched itself in areas of software development outside the traditional 3-tier web application. It is used in the realm of NoSQL (e.g. Hadoop), high-performance messaging systems (e.g. LMAX Disruptor), polyglot actor-like application frameworks (e.g. Vert.x) and much, much more.

HTML 5

2013 saw the start of the slow decline of Java/JVM server-side templating systems, as the rise of JavaScript libraries and micro frameworks (e.g. AngularJS, Twitter Bootstrap) means that developers finally have a client-side templating framework that doesn't suck and has real data binding to JSON/XML messages going to and from the browser. WebSockets and asynchronous messaging are also nicely taken care of by the likes of SockJS.

2014+ - The Future

Java 8 - Lambdas

More than enough has been written about these, so I'll offer up a practical tutorial and the pragmatic Java 8 Lambdas Book for 2014 that you should be buying!

Java 9/10 - VM improvements

Java 9 and 10 are unlikely to bring many changes to the language itself, apart from applying Java 7 and 8 features to the internal APIs, making the overall Java API a much more pleasant experience to use.

So what is coming? Well, the first likely major feature will be some sort of packed object or value type representation. This is where you will be able to define a Java 'object' as being purely a data type. What do I mean by this? It means that the 'object' can be laid out in memory very efficiently, without the overhead of hashCode, equals or the other regalia that comes with being a Java Object. Think of it a little like a C/C++ struct. Here's an example of what it might look like in source code:

@Packed
final class PackedPoint extends PackedObject {
   int x, y;
}
And the native representation:
// Native representation
struct Point {
   int x, y;
};
There are loads of performance advantages to this, especially for immutable, well-defined data structures such as points on a graph or within a 3D model. The JVM will be able to use far more efficient housekeeping mechanisms with respect to memory layout, lookups, JIT and garbage collection.

Some sort of reified generics will also come into the picture, but it's more likely to be an extension of the work done on the 'packed object' idea, removing the unnecessary overhead of maintaining and manipulating collections that use the Object representation of primitives (e.g. a List<Integer> full of boxed ints).

Java 9/10 - ME/SE convergence

It was very clear at JavaOne this year that Oracle is going to try to make Java a player in the IoT space. The opportunities are endless, but the JVM has some way to go to find the right balance between working on really small devices (which is where Java ME and CLDC sit today) and providing rich language features and a full stack for Java developers (which is where Java SE and EE sit today).

The merger between ME and SE begins in earnest with Java 8 and will hopefully be completed by Java 9. I encourage all of you to try out early versions of this on your Raspberry Pi!

But wait there's more!

The way Java itself is built and maintained is also changing, with a move towards more open development at OpenJDK. Features such as tail call recursion and co-routines are being discussed, and there is a port on the way to have Java work natively on graphics cards.

There's plenty to challenge yourself with and also have a lot of fun in 2014! Me? I'm going to try and get Java running on my Pi and maybe build that AngularJS front end for controlling hardware in my house.....

Happy Holidays everyone!

Martijn (@karianna) Java Champion, Speaker, Author, Cat Herder etc

Meta: this post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Monday, December 23, 2013

(Part 3 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

This is a continuation of the previous post titled (Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

In our first review, The Atlassian guide to GC tuning is an extensive post covering the methodology and the things to keep in mind when tuning GC; practical examples are given and references to important resources are made along the way. Next, in How NOT to measure latency, Gil Tene discusses some common pitfalls encountered in measuring and characterizing latency, demonstrating false assumptions and measurement techniques that lead to dramatically incorrect results, and covers simple ways to sanity-check and correct these situations. Finally, Kirk Pepperdine, in his post Poorly chosen Java HotSpot Garbage Collection Flags and how to fix them!, throws light on JVM flags - he starts with some 700 flags and boils the list down to merely 7. He also cautions you not to draw conclusions or act on a whim, but to consult and examine - i.e. measure, don't guess!


Background

By default, JVM tunings attempt to provide acceptable performance in the majority of cases, but they may not always give the desired results depending on application behaviour. Benchmark your application before applying any GC or JVM related settings. There are a number of environment, OS and hardware related factors to take into consideration, and tuning is often tied to them; if they change, the tuning may need to be revisited. Be aware and accept the fact that any tuning has its upper limit in a given environment; if your goals are still not met, you need to improve the environment. Always monitor your application to see whether your tuning goals are still being met.

Choosing performance goals - GC tuning for the Oracle JVM can be reduced to targeting three goals, i.e. latency (mean & maximum), throughput and footprint. To reach your tuning goals, there are principles that guide the tuning, i.e. the minor GC reclamation principle, the GC maximise memory principle, and the 'two of three' principle (pick 2 of these 3 goals and sacrifice the other).

Preparing your environment for GC tuning -

Some of the items of interest are: loading the application with work (apply load to the application as it would be in production), turning on GC logging, setting data sampling windows, determining the memory footprint, applying rules of thumb for generation sizes, and determining systemic requirements.

Iteratively change the JVM parameters, keeping the environment and its settings intact. An example of turning on GC logging:

java -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:<file> …
or
java -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file> …

Sampling windows are important to help diagnose an application's steady-state runtime behaviour rather than its load time or the time the JVM takes to stabilise before it can execute anything. Set the heap sizes either by guessing (give it the maximum and then pull back) or by using results accumulated in the GC logs. Also check for OOMEs (OutOfMemoryErrors), which can indicate that the heap size or standard memory needs increasing. There are a number of rules of thumb for generation sizes (a number of them depend on the sizes of other pools).
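For illustration only (derive real values from your own GC logs), a run with explicit heap and young generation sizes plus GC logging might look like this, where MyApp is a placeholder for your main class:

java -Xms1g -Xmx4g -Xmn512m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log MyApp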

Understanding the Throughput collector and Behaviour based Tuning
HotSpot uses behaviour-based tuning with the parallel collectors, including the parallel MarkAndSweep collector. These are designed with three goals in mind, i.e. maximum pause time, throughput and footprint. The -XX:+PrintAdaptiveSizePolicy JVM flag prints details of this behaviour for analysis. Even though there are options to set the maximum pause time for any GC event, there is no guarantee it will be met. The JVM also pursues the best 2 of the 3 goals - i.e. only 2 goals are in focus at a time; there is interaction between the goals, and sometimes oscillation between them.
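For example, the throughput collector's pause-time and throughput goals can be stated explicitly (illustrative values; GCTimeRatio=99 asks that no more than 1/(1+99) = 1% of total time be spent in GC):

java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=99 -XX:+PrintAdaptiveSizePolicy MyApp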

Time to tune
There is a cycle to follow when doing this, i.e. 
  1. Determine desired behaviour.
  2. Measure behaviour before change.
  3. Determine indicated change.
  4. Make change.
  5. Measure behaviour after change.
  6. If behaviour not met, try again.
Repeat the above till the results are met (or nearly there).

Tools
Java VisualVM with the Visual GC plugin for live telemetry (note there's an observer effect when using VisualVM, which can have an impact on your readings), and GCViewer by tagtraum industries, useful for post-hoc review of GC performance based on the GC log.

Conclusions: a lot of factors need to be taken into consideration when tuning GC. A meticulous plan needs to be followed, keeping an eye on your goal throughout the process; once it is achieved, continue to monitor your application to see whether your performance goals are still intact. Adjust your requirements after studying the results of your actions, so that the two eventually meet.

--- The authors cover the methodology in good detail and give practical advice and steps to follow in order to tune your application - the hands-on and practical aspects of the post should definitely be read. ---


How NOT to measure latency by Gil Tene

The author starts by saying these are "some of the mistakes others, including the author, have made" - plenty of wrong ways to do things and what to avoid doing.

WHY and HOW you measure latency - the HOW only makes sense if you understand the WHY. There is public domain code and there are tools available that can be used to measure latency.

Don't use statistics to measure latency!

The classic way of looking at latency: response time (latency) is a function of load! It's important to know what this response time refers to - is it the average, the worst case, the 99th percentile, etc., and based on which types of behaviour?

Hiccups are some sort of accumulated work that gets performed in bursts (spikes). It's important to look at these behaviours when looking at response time and average response time. They are not evenly spread out, but may look like periodic freezes, or shifts from one mode/behaviour to another (with no sequence).

Common fallacies:
- computers run application code continuously
- response time can be measured as work units over time
- response time exhibits a normal distribution
- "glitches" or "semi-random omissions" in measurement don't have a big effect

Hiccups are hidden, knowingly or unknowingly - averages and standard deviations "help" with that! Many compute the 99th percentile or other figures from averages and assumed distributions. A better way is to measure it using the data - e.g. plot a histogram. You need an SLA for the entire spread, and to plot the distribution across the 0 to 100th percentiles.

Latency tells us how long something should take! True real-time is being slow and steady!
The author gives examples of types of applications with different latency needs, i.e. the Olympics, pacemakers, low-latency trading and interactive applications.

A very important way of establishing requirements: an extensive interview process, to help the client come up with realistic figures for the SLA requirements. We need to find out the worst the client is willing to accept, and for how long or how often that is acceptable, to help construct a percentile table.

The question is how fast the system can run, smoothly, without hitting obstacles or crashes! The author further compares the performance of the HotSpot JVM with Azul's C4.

The coordinated omission problem - what is it? Data that is omitted not in a random way but in a coordinated way, which skews the actual results. Delayed requests do not get measured due to the manner in which response time is recorded from within the system, so the omissions are not random and uncoordinated. The maximum value (time) in the data is usually the key value to start with: use it to compute and compare the other percentiles, and check whether they match the ones computed with coordinated omission accounted for. Percentiles are useful - compute them.

Graphs with sudden bumps amid otherwise smooth lines tend to indicate coordinated omission in the data.

HdrHistogram helps plot histograms - a configurable-precision system; you tell it the range and the precision to cover. It has built-in compensation for coordinated omission - it's open source and available on GitHub under the CC0 license.

jHiccup also plots histograms showing how your computer system has been behaving (run time and freeze time), and records various percentiles, maximum and minimum values, distributions, good modes, bad modes, etc. It is also open source under the CC0 license.

Measuring throughput without a latency requirement makes the information meaningless. Mistakes in measurement/analysis can lead to orders-of-magnitude errors and to bad business decisions.


--- The author shows some cool graphs and illustrations of how we measure things wrongly and what to look out for when assessing data or even graphs, and gives a lot of tips on how NOT to measure latency. ---

Poorly chosen Java HotSpot Garbage Collection Flags and how to fix them! by Kirk Pepperdine

There are more than 700 product flags defined in the HotSpot JVM, and not many people have a good idea of the effect any one of them might have at runtime, let alone the combination of two or more together.

Whenever you think you know a flag, it often pays to check the source code underlying that flag to get a more accurate idea.

Identifying redundant flags
The author lists a huge number of flags for us to determine whether we need them or not. To see the list of flags available in your version of java, try this command, which prints a long list of options:

$ java -XX:+PrintFlagsFinal -version

You can narrow down the list by eliminating deprecated flags and ones set to their default value, leaving a much shorter list.
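For example, on HotSpot builds of this era, flags whose values differ from the built-in defaults are marked with ':=' in the PrintFlagsFinal output, so a quick filter (assuming a Unix-like shell) narrows the list further:

$ java -XX:+PrintFlagsFinal -version | grep ':='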

DisableExplicitGC, ExplicitGCInvokesConcurrent, and RMI Properties
Calling System.gc() isn't a great idea, and one way to disable that functionality in code that uses it is to pass the DisableExplicitGC flag in your VM options. Alternatively, use the ExplicitGCInvokesConcurrent flag so that System.gc() triggers a Concurrent Mark and Sweep cycle instead of a full GC cycle. You can also set the interval between the complete GC cycles that RMI triggers using the sun.rmi.dgc.client.gcInterval and sun.rmi.dgc.server.gcInterval properties.
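Putting those pieces together, a sketch of such a command line (the one-hour RMI intervals are illustrative values, in milliseconds; MyApp is a placeholder):

java -XX:+ExplicitGCInvokesConcurrent -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000 MyApp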

Xmx, Xms, NewRatio, SurvivorRatio and TargetSurvivorRatio
The -Xmx flag is used to set the maximum heap size, and all JVMs are meant to support this flag. Setting the minimum (-Xms) and maximum heap sizes to the same value prevents the JVM from resizing the heap at runtime, which disables the adaptive sizing machinery.

Use GC logs to help make memory pool sizing decisions. There is a ratio to maintain between the young and old generations, and within the young generation between Eden and the survivor spaces. The SurvivorRatio flag decides how much space is used by the survivor spaces in the young generation, with the rest left for Eden. These ratios are key in determining the frequency and efficiency of young generation collections. TargetSurvivorRatio is another ratio to examine and cross-check before using; see the example below.
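For instance (illustrative values; with -XX:SurvivorRatio=6 each survivor space gets one eighth of the young generation, since Eden takes 6 of the 8 parts, and -XX:TargetSurvivorRatio=50 asks the JVM to aim for survivor spaces that are half full after a collection):

java -Xmn512m -XX:SurvivorRatio=6 -XX:TargetSurvivorRatio=50 MyApp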

PermSize, MaxPermSize
These options are removed starting with JDK 8 in favour of Metaspace, but in previous versions they helped set PermGen's size and resizing behaviour. Restricting its adaptive nature might lead to longer pauses for the old generational spaces.

UseConcMarkSweepGC, CMSParallelRemarkEnabled, UseCMSInitiatingOccupancyOnly, CMSInitiatingOccupancyFraction
UseConcMarkSweepGC makes the JVM select the CMS collector; CMS works with ParNew instead of the PSYoungGen and PSOldGen collectors for the respective generational spaces.

CMSInitiatingOccupancyFraction (IOF) is a flag that helps the JVM determine when to start a CMS cycle, based on the occupancy of the old generation space (examined alongside the current rate of promotion); poor settings can lead to stop-the-world full GCs or concurrent mode failures (CMF).

UseCMSInitiatingOccupancyOnly tells the JVM to use the IOF without taking the rate of promotion into account.

CMSParallelRemarkEnabled helps parallelise the fourth phase of CMS (the remark phase, single-threaded by default), but use this option only after benchmarking your current production systems.

CMSClassUnloadingEnabled
CMSClassUnloadingEnabled ensures that classes loaded in the PermGen space are regularly unloaded, reducing out-of-PermGen-space situations, although this could lead to longer concurrent and remark (pause) collection times.

AggressiveOpts
AggressiveOpts, as the name suggests, helps improve performance by switching certain flags on and off, but the set of flags it affects hasn't been consistent across the various builds of Java, so one must examine benchmarks of their system before using this flag in production.

Next Steps
Of the flags examined, the ones below are the more interesting to look at:
-ea 
-mx4G 
-XX:MaxPermSize=300M 
-XX:+UseConcMarkSweepGC 
-XX:+CMSClassUnloadingEnabled 
-XX:SurvivorRatio=4 
-XX:+DisableExplicitGC
Again, examine the GC logs to decide whether any of the above should be used and, if so, what values are appropriate to set. Quite often the JVM gets it right, right out of the box, hence getting it wrong can be detrimental to your application's performance. It's easier to mess things up than to get them right when you have so many flags with dependent and overlapping functionality.




Conclusion: refer to benchmarks and GC logs when deciding which flags to enable or disable and what values to set. There are lots of flags and it's easy to get them wrong, so only use them if they are absolutely needed - remember the HotSpot JVM has built-in machinery to adapt (adaptive policy) - and consult an expert if you still want to proceed.



--- The author has done justice by giving us a gist of the large number of JVM flags; we recommend reading the article to learn the specifics of certain flags. ---

As it is not practical to review all such videos and articles, a number of them have been provided in the links below for further study. In many cases I have paraphrased or directly quoted what the authors have to say to preserve the message and meaning they wished to convey. 

A few other authors have written articles related to these subjects for the Java Advent Calendar; see below:
Using Intel Performance Counters To Tune Garbage Collection
How NOT to measure latency

Feel free to post your comments below or tweet at @theNeomatrix369!

    Useful resources

      This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!