Our authors – Markus Eisele

Markus Eisele is a Developer Advocate at Red Hat and focuses on JBoss Middleware. He has been working with Java EE servers from different vendors for more than 14 years and speaks about his favorite Java EE topics at conferences all over the world. He has been a principal consultant and has worked with different customers on all kinds of Java EE related applications and solutions. Besides that, he has always been a prolific blogger, writer and tech editor for various Java EE related books. He is an active member of the German DOAG e.V. and its representative on the iJUG e.V. As a Java Champion and former ACE Director, he is well known in the community.

A persistent KeyValue Server in 40 lines and a sad fact

Advent time again .. picking up Peter’s well-written overview on the uses of Unsafe, I’ll take a short fly-by on how low-level techniques in Java can save development effort by enabling a higher level of abstraction, or allow for Java performance levels probably unknown to many.

My major point is to show that the conversion of objects to bytes and vice versa is an important fundamental, affecting virtually any modern Java application.

Hardware prefers to process streams of bytes rather than object graphs connected by pointers, as “All memory is tape” (M. Thompson, if I remember correctly ..).

Many basic technologies are therefore hard to use with vanilla Java heap objects:

  • Memory mapped files – a great and simple technology to persist application data safely, quickly & easily.
  • Network communication is based on sending packets of bytes.
  • Interprocess communication (shared memory).
  • The large main memory of today’s servers (64 GB to 256 GB) causes GC issues when managed as plain heap objects.
  • CPU caches work best on data stored as a continuous stream of bytes in memory.

So use of the Unsafe class in most cases boils down to helping transform a Java object graph into a continuous memory region and vice versa, either using

  • [performance enhanced] object serialization or
  • wrapper classes to ease access to data stored in a continuous memory region.

(source of examples used in this post can be found here, messaging latency test here)


    Serialization based Off-Heap

    Consider a retail web application with millions of registered users. We are not actually interested in representing the data in a relational database, as all that is needed is a quick retrieval of user-related data once a user logs in. Additionally, one would like to traverse the social graph quickly.

    Let’s take a simple user class holding some attributes and a list of ‘friends’ making up a social graph.
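
    A minimal sketch of such a class might look like the following (the field names are illustrative and not taken from the original example):

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    public class User implements Serializable {
        String id;
        String name;
        int age;
        List<User> friends = new ArrayList<>();   // the 'social graph'
        // getters/setters omitted for brevity
    }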

    The easiest way to store this on the heap is a simple, huge HashMap.
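
    For example (a plain on-heap sketch; the choice of key is illustrative):

    // the whole data set lives on the Java heap, keyed by user id
    Map<String, User> users = new HashMap<>();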

    Alternatively, one can use off-heap maps to store large amounts of data. An off-heap map stores its keys and values in the native heap, so garbage collection does not need to track this memory. In addition, the native heap can be told to automagically synchronize itself to disk (memory-mapped files). This even works if your application crashes, as the OS manages the write-back of changed memory regions.
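
    As a rough illustration of the memory-mapped-file mechanism itself (this is plain NIO, not the off-heap map implementation used below; the file name is made up):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedFileSketch {
        public static void main(String[] args) throws Exception {
            try (FileChannel ch = new RandomAccessFile("users.mmf", "rw").getChannel()) {
                // map 1 MB of the file; the OS writes changed pages back to disk,
                // even if the JVM crashes afterwards
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1024 * 1024);
                buf.put(0, (byte) 42);   // write directly into the mapped region
                buf.force();             // optionally force an explicit sync to disk
            }
        }
    }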

    There are several open source off-heap map implementations out there with various feature sets (e.g. ChronicleMap); for this example I’ll use a plain and simple implementation featuring fast iteration (optional full-scan search) and ease of use.

    Serialization is used to store objects; deserialization is used to pull them back onto the Java heap. Pleasantly, I have written the (afaik) fastest fully JDK-compliant object serialization on the planet, so I’ll make use of that.
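
    As a minimal sketch, fast-serialization can be used roughly like this (assuming the FSTConfiguration entry point of the fst library; the off-heap map does this internally):

    import org.nustaq.serialization.FSTConfiguration;

    FSTConfiguration fst = FSTConfiguration.createDefaultConfiguration();

    // given some User user as defined above
    byte[] bytes = fst.asByteArray(user);       // object graph -> continuous byte region
    User copy   = (User) fst.asObject(bytes);   // bytes -> objects back on the Java heap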

     Done:

    • Persistence by memory-mapping a file (the map will reload upon creation).
    • The Java heap is still empty to serve real application processing, with Full GC < 100 ms.
    • Significantly less overall memory consumption. A serialized user record is ~60 bytes, so in theory 300 million records fit into 180GB of server memory. No need to raise the big data flag and run 4096 hadoop nodes on AWS ;).
    Comparing a regular in-memory Java HashMap and a fast-serialization-based persistent off-heap map holding 15 million user records shows the following results (on an older 3 GHz XEON 2×6):

                                          consumed Java Heap (MB)   Full GC (s)   Native Heap (MB)   get/put ops per s   required VM size (MB)
    HashMap                               6.865,00                  26,039        0                  3.800.000,00        12.000,00
    OffheapMap (Serialization based)      63,00                     0,026         3.050              750.000,00          500,00

    [test source / blog project] Note: You’ll need at least 16GB of RAM to execute them.

    As one can see, even with fast serialization there is a heavy penalty (~factor 5) in access performance. Anyway, compared to other persistence alternatives it’s still superior (1-3 microseconds per “get” operation, “put()” very similar).

    Use of JDK serialization would perform at least 5 to 10 times slower (direct comparison below) and therefore render this approach useless.

    Trading performance gains against higher level of abstraction: “Serverize me”

    A single server won’t be able to serve (hundreds of) thousands of users, so we somehow need to share data amongst processes, even better: across machines.

    Using a fast implementation, it’s possible to use (fast-)serialization generously for over-the-network messaging. Again: if this ran 5 to 10 times slower, it just wouldn’t be viable. Alternative approaches require an order of magnitude more work to achieve similar results.

    By wrapping the persistent off-heap hash map with an Actor implementation (async ftw!), a few lines of code make up a persistent KeyValue server with a TCP-based and an HTTP interface (it uses kontraktor actors). Of course, the Actor can still be used in-process if one decides so later on.
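
    The kontraktor API itself is not reproduced here; the following is only a rough, hypothetical sketch of the actor idea – all calls funnel through a single mailbox thread, so the map needs no extra locking (class and method names are made up, not the actual KVServer code):

    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class KeyValueActorSketch {
        private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
        private final Map<String, User> store;   // the persistent off-heap map behind the actor

        public KeyValueActorSketch(Map<String, User> store) {
            this.store = store;
        }

        public void put(String key, User value) {             // fire-and-forget message
            mailbox.execute(() -> store.put(key, value));
        }

        public CompletableFuture<User> get(String key) {       // asynchronous lookup
            return CompletableFuture.supplyAsync(() -> store.get(key), mailbox);
        }
    }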

    Now that’s a microservice. Given it lacks any attempt at optimization and is single-threaded, it’s reasonably fast [same XEON machine as above]:

    • 280_000 successful remote lookups per second 
    • 800_000 per second in case of failed lookups (key not found)
    • serialization based TCP interface (1 liner)
    • a stringy webservice for the REST-of-us (1 liner).

    [source: KVServer, KVClient] Note: You’ll need at least 16GB of RAM to execute the test.

    A real-world implementation might want to double performance by directly putting the received serialized object byte[] into the map instead of encoding it twice (encode/decode once for transmission over the wire, then decode/encode again for the off-heap map).

    “RestActorServer.Publish(..);” is a one-liner to also expose the KVActor as a webservice in addition to raw TCP.

    C like performance using flyweight wrappers / structs

    With serialization, regular Java objects are transformed into a byte sequence. One can also do the opposite: create wrapper classes which read data from fixed or computed positions of an underlying byte array or native memory address (e.g. see this blog post).

    By moving the base pointer, it’s possible to access different records by just changing the wrapper’s offset. Copying such a “packed object” boils down to a memory copy. In addition, it’s pretty easy to write allocation-free code this way. One downside is that reading/writing single fields carries a performance penalty compared to regular Java objects. This can be made up for by using the Unsafe class.
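
    A minimal hand-written flyweight sketch over a plain byte array (the record layout and field offsets are made up for illustration):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // reads a fixed-layout record: [int id][long balance] = 12 bytes per record
    public class RecordFlyweight {
        private static final int RECORD_SIZE = 12;
        private final ByteBuffer buffer;
        private int offset;                        // base offset of the current record

        public RecordFlyweight(ByteBuffer buffer) {
            this.buffer = buffer.order(ByteOrder.nativeOrder());
        }

        // re-point the wrapper to another record; no allocation happens here
        public RecordFlyweight moveTo(int recordIndex) {
            this.offset = recordIndex * RECORD_SIZE;
            return this;
        }

        public int id()       { return buffer.getInt(offset); }
        public long balance() { return buffer.getLong(offset + 4); }
    }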

    “Flyweight” wrapper classes can be implemented manually as shown in the blog post cited; however, as code grows this starts getting unmaintainable.
    Fast-serialization provides “struct emulation” as a byproduct, supporting the creation of flyweight wrapper classes from regular Java classes at runtime. Low-level byte fiddling in application code can be avoided for the most part this way.

    How a regular Java class can be mapped to flat memory (fst-structs):

    Of course there are simpler tools out there to help reduce the manual programming of encoding (e.g. Slab), which might be more appropriate for many cases and use less “magic”.

    What kind of performance can be expected using the different approaches (sad fact incoming)?

    Let’s take the following struct class, consisting of a price update and an embedded struct denoting a tradable instrument (e.g. a stock), and encode it using various methods:

    a ‘struct’ in code
    Pure encoding performance (encode operations per second):

                     Structs          fast-Ser (no shared refs)   fast-Ser         JDK Ser (no shared)   JDK Ser
                     26.315.000,00    7.757.000,00                5.102.000,00     649.000,00            644.000,00




    Real world test with messaging throughput:

    In order to get a basic estimate of the differences in a real application, I ran an experiment on how the different encodings perform when used to send and receive messages at a high rate via reliable UDP messaging:

    The Test:
    A sender encodes messages as fast as possible and publishes them using reliable multicast; a subscriber receives and decodes them.

    Throughput (messages per second):

                     Structs          fast-Ser (no shared refs)   fast-Ser         JDK Ser (no shared)   JDK Ser
                     6.644.107,00     4.385.118,00                3.615.584,00     81.582,00             79.073,00

    (Tests done on I7/Win8, XEON/Linux scores slightly higher, msg size ~70 bytes for structs, ~60 bytes serialization).

    Slowest compared to fastest: a factor of 82. The test highlights an issue not covered by micro-benchmarking: encoding and decoding should perform similarly, as effective throughput is determined by min(encoding performance, decoding performance). For unknown reasons, JDK serialization manages to encode the message tested around 500_000 times per second, but decoding performance is only 80_000 per second, so in the test the receiver gets dropped quickly:



    ***** Stats for receive rate:   80351   per second *********
    ***** Stats for receive rate:   78769   per second *********
    SUB-ud4q has been dropped by PUB-9afs on service 1
    fatal, could not keep up. exiting

    (Creating backpressure here probably isn’t the right way to address the issue 😉  )

    Conclusion:

    • A fast serialization implementation allows for a level of abstraction in distributed applications that is impossible if the serialization is either
      – too slow,
      – incomplete, e.g. cannot handle arbitrary serializable object graphs, or
      – requires manual coding/adaptations (this would put many restrictions on actor message types, Futures, Spores – a maintenance nightmare).
    • Low-level utilities like Unsafe enable different representations of data, resulting in extraordinary throughput or guaranteed latency bounds (allocation-free main path) for particular workloads. These are out of reach by a large margin with the JDK’s public tool set.
    • In distributed systems, communication performance is of fundamental importance. Removing Unsafe is not the biggest fish to fry looking at the numbers above .. JSON or XML won’t fix this ;-).
    • While the HotSpot VM has reached an extraordinary level of performance and reliability, CPU is wasted in some parts of the JDK like there’s no tomorrow. Given we are living in the age of distributed applications and data, moving stuff over the wire should be easy to achieve (not manually coded) and as fast as possible. 
    Addendum: bounded latency

    A quick ping-pong RTT latency benchmark shows that Java can compete with C solutions easily, as long as the main path is allocation-free and techniques like those described above are employed:

    [credits: charts+measurement done with HdrHistogram]

    This is an “experiment” rather than a benchmark (so do not read it as ‘Proven: Java faster than C’); it shows that low-level Java can compete with C in at least this low-level domain.
    Of course it’s not exactly idiomatic Java code; however, it’s still easier to handle, port and maintain compared to a JNI or pure C(++) solution. Low-latency C(++) code won’t be that idiomatic either 😉

    About me: I am a solution architect freelancing at an exchange company in the area of realtime GUIs, middleware, and low latency CEP (Complex Event Processing).
    I am blogging at http://java-is-the-new-c.blogspot.de/,
    hacking at https://github.com/RuedigerMoeller.

    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    How is Java / JVM built? Adopt OpenJDK is your answer!

    Introduction & history
    As some of you may already know, starting with Java 7, OpenJDK is the Reference Implementation (RI) of Java. The timeline below gives you an idea of the history of OpenJDK:
    OpenJDK history (2006 till date)
    If you have ever wondered about the JDK or JRE binaries that you download from vendors like Oracle, Red Hat, etcetera, the answer is that these all stem from OpenJDK. Each vendor then adds some extra artefacts that are not yet open source due to security, proprietary or other reasons.


    What is OpenJDK made of?
    OpenJDK is made up of a number of repositories, namely corba, hotspot, jaxp, jaxws, jdk, langtools, and nashorn. Between OpenJDK8 and OpenJDK9 no new repositories have been introduced, but there have been lots of changes and restructuring, primarily due to Jigsaw – the modularisation of Java itself [2] [3] [4] [5].
    repo composition, language breakdown (metrics are estimated)
    Recent history
    OpenJDK Build Benchmarks – build-infra (Nov 2011) by Fredrik Öhrström, ex-Oracle, OpenJDK hero!

    Fredrik Öhrström visited the LJC [16] in November 2011, where he showed us how to build OpenJDK on the three major platforms and also distributed a four-page leaflet with benchmarks of the various components and how long they took to build. The new build system and the new makefiles are the result of the build system being rewritten (build-infra).


    Below are screenshots of the leaflets, a good reference against which to compare our journey:

    Build Benchmarks page 2 [26]

    How have the Java language and platform been built over the years?

    Java is built by bootstrapping from an older (previous) version of Java – i.e. Java is built using Java itself as its building block, where older components are put together to create a new component which in the next phase becomes the building block itself. A good example of bootstrapping can be found in Scheme from Scratch [6] or even on Wikipedia [7].


    OpenJDK8 [8] is compiled and built using JDK7; similarly, OpenJDK9 [9] is compiled and built using JDK8. In theory, OpenJDK8 can also be compiled using the images created from OpenJDK8 itself, and likewise OpenJDK9 using OpenJDK9. This process is called bootcycle images – a JDK image of OpenJDK is created and then, using the same image, OpenJDK is compiled again, which can be accomplished using a make target:


    $ make bootcycle-images       # Build images twice, second time with newly built JDK


    make offers a number of targets under OpenJDK8 and OpenJDK9; you can build individual components or modules by naming them, i.e.


    $ make [component-name] | [module-name]


    or even run multiple build processes in parallel, i.e.


    $ make JOBS=<n>                 # Run <n> parallel make jobs


    Finally install the built artefact using the install option, i.e.


    $ make install


    Some myths busted
    OpenJDK, or HotSpot to be more specific, isn’t completely written in C/C++; a good part of the code-base is good ’ole Java (see the composition figure above). So you don’t have to be a hard-core C/C++ developer to contribute to OpenJDK. Even the underlying C/C++ code-base isn’t scary or daunting to look at. For example, here is an extract of a code snippet from vm/memory/universe.cpp in the HotSpot repo –
    .
    .
    .
    Universe::initialize_heap()

    if (UseParallelGC) {
       #ifndef SERIALGC
       Universe::_collectedHeap = new ParallelScavengeHeap();
       #else // SERIALGC
           fatal("UseParallelGC not supported in this VM.");
       #endif // SERIALGC

    } else if (UseG1GC) {
       #ifndef SERIALGC
       G1CollectorPolicy* g1p = new G1CollectorPolicy();
       G1CollectedHeap* g1h = new G1CollectedHeap(g1p);
       Universe::_collectedHeap = g1h;
       #else // SERIALGC
           fatal("UseG1GC not supported in java kernel vm.");
       #endif // SERIALGC

    } else {
       GenCollectorPolicy* gc_policy;

       if (UseSerialGC) {
           gc_policy = new MarkSweepPolicy();
       } else if (UseConcMarkSweepGC) {
           #ifndef SERIALGC
           if (UseAdaptiveSizePolicy) {
               gc_policy = new ASConcurrentMarkSweepPolicy();
           } else {
               gc_policy = new ConcurrentMarkSweepPolicy();
           }
           #else // SERIALGC
                fatal("UseConcMarkSweepGC not supported in this VM.");
           #endif // SERIALGC
       } else { // default old generation
           gc_policy = new MarkSweepPolicy();
       }

       Universe::_collectedHeap = new GenCollectedHeap(gc_policy);
    }
    .
    .
    .
    (please note that the above code snippet might have changed since published here)
    The things that appear clear from the above code block are: we are looking at how pre-compiler directives are used to build HotSpot code that supports a certain type of GC, i.e. Serial GC or Parallel GC. We also see how the GC policy is selected when one or more GC switches are toggled, e.g. UseAdaptiveSizePolicy, when enabled, selects the adaptive-size variant of the Concurrent Mark Sweep policy. If neither UseSerialGC nor UseConcMarkSweepGC is selected, the GC policy chosen is the Mark and Sweep policy. All of this and more is clearly readable and verbose – not just nicely formatted, but code that reads like English.


    Further commentary can be found in the section called Deep dive Hotspot stuff in the Adopt OpenJDK Intermediate & Advance experiences [12] document.


    Steps to build your own JDK or JRE
    Earlier we mentioned JDK and JRE images – these are no longer available only to the big players in the Java world; you and I can build such images very easily. The steps of the process have been simplified, and for a quick start see the Adopt OpenJDK Getting Started Kit [11] and Adopt OpenJDK Intermediate & Advance experiences [12] documents. For a detailed version of the same steps, please see the Adopt OpenJDK home page [13]. Basically, building a JDK image from the OpenJDK code-base boils down to the following commands:


    (setup steps have been made brief and some commands omitted, see links above for exact steps)

    $ hg clone http://hg.openjdk.java.net/jdk8/jdk8 jdk8  (a)…OpenJDK8
    or
    $ hg clone http://hg.openjdk.java.net/jdk9/jdk9 jdk9  (a)…OpenJDK9

    $ ./get_source.sh                                    (b)
    $ bash configure                                      (c)
    $ make clean images                                   (d)


    To explain what is happening at each of the steps above:
    (a) We clone the OpenJDK mercurial repo, just like we would using git clone ….
    (b) Once step (a) is complete, we change into the folder created and run the get_source.sh command, which is equivalent to a git fetch or a git pull, since step (a) only brings down the base files and not all of the files and folders.
    (c) Here we run a script that checks for and creates the configuration needed for the compile and build process.
    (d) Once step (c) is successful, we perform a complete compile and build, and create JDK and JRE images from the built artefacts.


    As you can see these are dead-easy steps to follow to build an artefact or JDK/JRE images [step (a) needs to be run only once].


    Benefits
    – contribute to the evolution and improvement of the Java language & platform
    – learn about the internals of the language and platform
    – learn about the OS platform and other technologies whilst doing the above
    – get involved in F/OSS projects
    – stay on top of the latest changes in the Java / JVM sphere
    – gain knowledge and experience that helps professionally and that is not readily available from other sources (i.e. books, training, work experience, university courses, etcetera)
    – advancement in career
    – personal development (soft skills and networking)


    Contribute
    Join the Adopt OpenJDK [13] and Betterrev [15] projects and contribute by giving us feedback about everything Java, including these projects. Join the Adoption Discuss mailing list [14] and other OpenJDK related mailing lists to start with; these will keep you updated with the latest progress and changes to OpenJDK. Fork any of the projects you see and submit changes via pull requests.


    Thanks and support
    Adopt OpenJDK [13] and its umbrella projects have been supported and progressed with the help of the JCP [21], the OpenJDK team [22], JUGs like the London Java Community [16], SouJava [17] and other JUGs in Brazil, and a number of JUGs in Europe, i.e. BGJUG (Bulgarian JUG) [18], BeJUG (Belgian JUG) [19], the Macedonian JUG [20], and a number of other smaller JUGs. We hope that in the coming time more JUGs and individuals will get involved. If you or your JUG wish to participate, please get in touch.

    Credits
    Special thanks to +Martijn Verburg (who incepted Adopt OpenJDK), +Richard Warburton, +Oleg Shelajev, +Mite Mitreski, +Kaushik Chaubal and +Julius G for helping improve the content and quality of this post, and for sharing their OpenJDK experience with us.


    How to get started?
    Join the Adoption Discuss mailing list [14], go to the Adopt OpenJDK home page [13] to get started, followed by referring to the Adopt OpenJDK Getting Started Kit [11] and Adopt OpenJDK Intermediate & Advance experiences [12] documents.


    Please share your comments here or tweet at @theNeomatrix369.


    Resources
    [8] OpenJDK8 
    [17] SouJava 


    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    Managing Package Dependencies with Degraph

    A large part of the art of software development is keeping the complexity of a system as low as possible. But what is complexity anyway? While the exact semantics vary quite a bit, depending on who you ask, probably most agree that it has a lot to do with the number of parts in a system and their interactions.

    Consider a marble in space, i.e. a planet, moon or star. Without any interaction this is as boring as a system can get. Nothing happens. If the marble moves, it keeps moving in exactly the same way. To be honest, there isn’t even a way to determine whether it is moving. Boooring.

    Add a second marble to the system and let them attract each other, like earth and moon. Now the system is more interesting. The two objects circle each other if they aren’t too fast. Somewhat interesting.

    Now add a third object. In the general case things get so interesting that we can’t even predict what is going to happen. The whole system didn’t just become complex, it became chaotic. You now have a three-body problem, which in the general case cannot be solved, i.e. we cannot predict what will happen to the system. But there are some special cases, especially the case where two of the objects are very close to each other, like earth and moon, and the third one is so far away that the first two behave just like one. In this case you can approximate the system with two two-body systems.

    But what has this to do with Java? This sounds more like physics.

    I think software development is similar in some aspects. A complete application is way too complicated to be understood as a whole. To fight this complexity we divide the system into parts (classes) that can be understood on their own, and that hide their inner complexity so that when we look at the larger picture we don’t have to worry about every single code line in a class, but only about the class as one entity. This is actually very similar to what physicists do with systems.

    But let’s look at the scale of things. The basic building block of software is the code line. And to keep the complexity in check we bundle code lines that work together in methods. How many code lines go into a single method varies, but it is in the order of 10 lines of code.
    Next you gather methods into classes. How many methods go into a single class? Typically in the order of 10 methods!

    And then? We bundle 100-10000 classes in a single jar! I hope I’m not the only one who thinks something is amiss.

    I’m not sure what will come out of project Jigsaw, but currently Java only offers packages as a way to bundle classes. Packages aren’t a powerful abstraction, yet they are the only one we have, so we had better use them.

    Most teams do use packages, but not in a well-structured way – rather in an ad hoc way. The result is similar to trying to consider moon and sun as one part of the system and the earth as the other part. The result might work, but it is probably about as intuitive as Ptolemy’s planetary model. Instead, decide on criteria for how you want to differentiate your packages. I personally call them slicings, inspired by an article by Oliver Gierke. Possible slicings, in order of importance, are:

    • the deployable jar file the class should end up in
    • the use case / feature / part of the business model the class belongs to
    • the technical layer the class belongs to

    The packages this results in will look like this: <domain>.<deployable>.<domain part>.<layer>
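
    For example, hypothetical package names following this scheme could look like:

    de.schauderhaft.shop.customer.persistence
    de.schauderhaft.shop.customer.service
    de.schauderhaft.shop.billing.persistence

    Here de.schauderhaft is the domain, shop the deployable, customer and billing the domain parts, and persistence and service the layers.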

    It should be easy to decide where a class goes. And it should also keep the packages at a reasonable size, even when you don’t use the separation by technical layer.

    But what do you gain from this? It is easier to find classes, but that’s about it. You need one more rule to make this really worthwhile: there must be no cyclic dependencies!

    This means that if a class in package A references a class in package B, no class in B may reference a class in A. This also applies if the reference is indirect, via multiple other packages. But that is still not enough. Slices should be cycle-free as well, so if a domain part X references a different domain part Y, the reverse dependency must not exist!

    This will indeed put some rather strict rules on your package and dependency structure. The benefit is that the structure becomes very flexible.

    Without such a structure, splitting your project into multiple parts will probably be rather difficult. Ever tried to reuse part of an application in a different one, just to realize that you basically have to include most of the application in order to get it to compile? Ever tried to deploy different parts of an application to different servers, just to realize you can’t? It certainly happened to me before I used the approach mentioned above. But with this stricter structure, the parts you may want to reuse will almost on their own end up at the end of the dependency chain, so you can take them and bundle them in their own jar, or just copy the code into a different project and have it compile in a very short time.

    Also, while trying to keep your packages and slices cycle-free, you’ll be forced to think hard about what each package involved is really about – something that has improved my code base considerably in many cases.

    So there is one problem left: Dependencies are hard to see. Without a tool, it is very difficult to keep a code base cycle free. Of course there are plenty of tools that check for cycles, but cleaning up these cycles is tough and the way most tools present these cycles doesn’t help very much. I think what one needs are two things:

    1. a simple test, that can run with all your other tests and fails when you create a dependency circle.
    2. a tool that visualizes all the dependencies between classes, while at the same time showing in which slice each class belongs.

    Surprise! I can recommend such a great tool: Degraph! (I’m the author, so I might be biased)

    You can write tests in JUnit like this:



    assertThat(
        classpath().including("de.schauderhaft.**")
            .printTo("degraphTestResult.graphml")
            .withSlicing("module", "de.schauderhaft.(*).*.**")
            .withSlicing("layer", "de.schauderhaft.*.(*).**"),
        is(violationFree())
    );

    The test will analyze everything in the classpath that starts with de.schauderhaft. It will slice the classes in two ways: by taking the third part of the package name and by taking the fourth part of the package name. So a class named de.schauderhaft.customer.persistence.HibernateCustomerRepository ends up in the module customer and in the layer persistence. And it will make sure that modules, layers and packages are cycle-free.

    And if it finds a dependency circle, it will create a graphml file, which you can open using the free graph editor yEd. With a little layout work you get results like the following, where the dependencies that result in cycles are marked in red.

    Again for more details on how to achieve good usable layouts I have to refer to the documentation of Degraph.

    Also note that the graphs are colored mainly green with a little red, which nicely fits the season!

    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    CMS Pipelines … for NetRexx on the JVM

    This year I want to tell you about a new and exciting addition to NetRexx (which, incidentally, just turned 19 years old the day before yesterday). NetRexx, as some of you know, is the first alternative language for the JVM, stems from IBM, and has been free and open source since 2011 (http://www.netrexx.org). It is a happy marriage of the Rexx language (Michael Cowlishaw, IBM, 1979) and the JVM. NetRexx can run compiled ahead of time, as .class files for maximum performance, or interpreted, for a quick development cycle or very dynamic production of code. After the addition of scripting in version 3.03 last year, the new release (3.04, due somewhere at the end of 2014) includes Pipes.

    We know what pipes are, I hear you say, but what are Pipes? A Pipeline, also called a Hartmann Pipeline, is a concept that extends and improves pipes as they are known from Unix and other operating systems. The name pipe indicates an inter-process communication mechanism, as well as the programming paradigm it has introduced. Compared to Unix pipes, Hartmann Pipelines offer multiple input and output streams, more complex pipe topologies, and a lot more – too much for this short article, but worthy of your study.

    Pipelines were first implemented on VM/CMS, one of IBM’s mainframe operating systems. This version was later ported to TSO to run under MVS and has been part of several product configurations. Pipelines are widely used by VM users, in a symbiotic relationship with REXX, the interpreted language that also has its origins on this platform. Pipes in the NetRexx version are compiled by a special Pipes Compiler that has been integrated with NetRexx. The resulting code can run on every platform that has a JVM (Java Virtual Machine), including z/VM and z/OS for that matter. This portable version of Pipelines was started by Ed Tomlinson in 1997 under the name of njpipes, when NetRexx was still very new, and was open sourced in 2011, soon after the NetRexx translator itself. It was integrated into the NetRexx translator in 2014 and will be released integrated in the NetRexx distribution for the first time with version 3.04. It answers the eternal question posed to the development team by every z/VM programmer we ever met: “But … Does It Have Pipes?” It also marks the first time that a no-charge Pipelines product runs on z/OS. But of course most of you will be running Linux, Windows or OSX, where NetRexx and Pipes also run splendidly.

    NetRexx users are very conscious of code size and performance – for example because applications also run on limited JVM specifications such as Java ME, in Lego robots and on Androids and Raspberries – and are generally proud and protective of the NetRexx runtime, which weighs in at 37K (yes, 37 kilobytes; it even shrunk a few bytes over the years). For this reason, the Pipes Compiler and the Stages are packaged in NetRexxF.jar – F is for Full, and this jar also includes the Eclipse Java compiler, which makes NetRexx a standalone package that only needs a JRE for development. There is a NetRexxC.jar for those who have a working Java SDK and only want to compile NetRexx. So we have NetRexxR.jar at 37K, NetRexxC.jar at 322K, and the full NetRexx kaboodle at 2.8MB – still small compared to some other JVM languages.

    The pipeline terminology is a metaphor derived from plumbing. Fitting two or more pipe segments together yields a pipeline. Water flows in one direction through the pipeline. There is a source, which could be a well or a water tower; water is pumped through the pipe into the first segment, then through the other segments until it reaches a tap, and most of it will end up in the sink. A pipeline can be increased in length with more segments of pipe, and this illustrates the modular concept of the pipeline. When we discuss pipelines in relation to computing we have the same basic structure, but instead of water that passes through the pipeline, data is passed through a series of programs (stages) that act as filters. Data must come from some place and go to some place. Analogous to the well or the water tower, there are device drivers that act as a source of the data, while the tap or the sink represents the place the data is going to, for example some output device such as your terminal window, a file on disk, or a network destination. Just like water, data in a pipeline flows in one direction, by convention from left to right.

    A program that runs in a pipeline is called a stage. A program can run in more than one place in a pipeline – these occurrences function independently of each other. The pipeline specification is processed by the pipeline compiler, and it must be contained in a character string; on the command line it needs to be between quotes, while when contained in a file it needs to be between the delimiters of a NetRexx string. An exclamation mark (!) is used as the stage separator, while the solid vertical bar | can be used as an option when specifying the local option for the pipe, after the pipe name. When looking at two adjacent segments in a pipeline, we call the left stage the producer and the stage on the right the consumer, with the stage separator as the connector.

    A device driver reads from a device (for instance a file, the command prompt, a machine console or a network connection) or writes to a device; in some cases it can both read and write. Examples of device drivers are diskr for disk read and diskw for disk write; these read and write data from and to files. A pipeline can take data from one input device and write it to a different device. Within the pipeline, data can be modified in almost any way imaginable by the programmer. The simplest process for the pipeline is to read data from the input side and copy it unmodified to the output side. The pipeline compiler connects these programs; it uses one program for each device and connects them together. All pipeline segments run on their own thread and are scheduled by the pipeline scheduler. The inherent characteristic of the pipeline is that any program can be connected to any other program, because each obtains and sends data through a device-independent standard interface. The pipeline usually processes one record (or line) at a time. The pipeline reads a record from the input, processes it and sends it to the output. It continues until the input source is drained.

    Until now everything was just theory, but now we are going to show how to compile and run a pipeline. The executable script pipe is included in the NetRexx distribution to specify a pipeline and to compile NetRexx source that contains pipelines. Pipelines can be specified on the command line or in a file, but will always be compiled to a .class file for execution in the JVM.

     pipe "(hello) literal "hello world" ! console"

    This specifies a pipeline consisting of a source stage literal, which puts a string (“hello world”) into the pipeline, and a console sink, which puts the string on the screen. The pipe compiler will echo the source of the pipe to the screen – or issue messages when something was mistyped. The name of the class file is the name of the pipe, here specified between parentheses. Options also go there. We execute the pipe by typing:

    java hello

    Now that we have shown the obligatory example, we can make it more interesting by adding a reverse stage in between:

    pipe "(hello) literal "hello world" ! reverse ! console"

    When this is executed, it dutifully types “dlrow olleh”. If we replace the string after literal with arg(), we can then start the hello pipeline with an argument to reverse, and we run it with:

    java hello a man a plan a canal panama

    and it will respond:

    amanap lanac a nalp a nam a

    which goes to show that without ignoring spaces no palindrome is very convincing – which we can remedy with a change to the pipeline: use the change stage to take out the spaces:

    pipe "(hello) literal arg() ! change /" "// ! console"

    Now for the interesting parts. Whole pipeline topologies can be added, webservers can be built, and relational databases (anything with a JDBC driver) can be queried. For people who are familiar with the z/VM CMS Pipelines product, most of its reference manual is relevant for this implementation. We are working on the new documentation to go with NetRexx 3.04.

    Pipes for NetRexx are the work of Ed Tomlinson and Jeff Hennick, with contributions by Chuck Moore, myself, and others. Pipes were the occasion on which I first laid eyes on NetRexx, and I am happy they have now found their place in the NetRexx open source distribution. To have a look at it, download the NetRexx source from the Kenai site (https://kenai.com/projects/netrexx) and build a 3.04 version yourself. Alternatively, wait until the 3.04 package hits http://www.netrexx.org.
    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    How and Why is Unsafe used in Java?

    Overview

    sun.misc.Unsafe has been in Java from at least as far back as Java 1.4 (2002). In Java 9, Unsafe will be hidden, along with many other for-internal-use classes, to improve the maintainability of the JVM. While it is still unclear exactly what will replace Unsafe – I suspect it will be more than one thing – it raises the question: why is it used at all?

    Doing things which the Java language doesn’t allow but are still useful.

    Java doesn’t allow many of the tricks which are available in lower-level languages. For most developers this is a very good thing; it not only saves you from yourself, it also saves you from your co-workers. It also makes it easier to import open source code, because you know there are limits to how much damage it can do. Or at least there are limits to how much you can do accidentally. If you try hard enough you can still do damage.

    But why would you even try, you might wonder? When building libraries, many (but not all) of the methods in Unsafe are useful, and in some cases there is no other way to do the same thing without using JNI, which is even more dangerous and loses the “compile once, run anywhere” promise.

    Deserialization of objects

    When deserializing or building an object using a framework, you make the assumption that you want to reconstitute an object which existed before. You expect that you will use reflection to either call the setters of the class or, more likely, set the internal fields directly, even the final fields. The problem is that you want to create an instance of an object, but you don’t really want to run a constructor, as it is likely to only make things more difficult and have side effects.
    import java.io.Serializable;

    public class A implements Serializable {
        private final int num;

        public A(int num) {
            System.out.println("Hello Mum");
            this.num = num;
        }

        public int getNum() {
            return num;
        }
    }

    In this class, you should be able to rebuild the object and set the final field, but if you have to call a constructor, it might do things which have nothing to do with deserialization. For these reasons, many libraries use Unsafe to create instances without calling a constructor:

    Unsafe unsafe = getUnsafe();
    Class aClass = A.class;
    A a = (A) unsafe.allocateInstance(aClass);

    Calling allocateInstance avoids the need to call the appropriate constructor, when we don’t need one.
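
    The getUnsafe() helper used above is not shown in that snippet; a common way to obtain the instance (the same reflection trick used in the ping-pong example below) looks roughly like this:

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    static Unsafe getUnsafe() {
        try {
            // grab the singleton instance the JDK keeps in a private static field
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }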

    Thread safe access to direct memory

    Another use for Unsafe is thread-safe access to off-heap memory. ByteBuffer gives you safe access to off-heap or direct memory; however, it doesn’t have any thread-safe operations. This is particularly useful if you want to share data between processes.

    import sun.misc.Unsafe;
    import sun.nio.ch.DirectBuffer;

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.lang.reflect.Field;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class PingPongMapMain {
        public static void main(String... args) throws IOException {
            boolean odd;
            switch (args.length < 1 ? "usage" : args[0].toLowerCase()) {
                case "odd":
                    odd = true;
                    break;
                case "even":
                    odd = false;
                    break;
                default:
                    System.err.println("Usage: java PingPongMain [odd|even]");
                    return;
            }
            int runs = 10000000;
            long start = 0;
            System.out.println("Waiting for the other odd/even");
            File counters = new File(System.getProperty("java.io.tmpdir"), "counters.deleteme");
            counters.deleteOnExit();

            try (FileChannel fc = new RandomAccessFile(counters, "rw").getChannel()) {
                MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_WRITE, 0, 1024);
                long address = ((DirectBuffer) mbb).address();
                for (int i = -1; i < runs; i++) {
                    for (; ; ) {
                        long value = UNSAFE.getLongVolatile(null, address);
                        boolean isOdd = (value & 1) != 0;
                        if (isOdd != odd)
                            // wait for the other side.
                            continue;
                        // make the change atomic, just in case there is more than one odd/even process
                        if (UNSAFE.compareAndSwapLong(null, address, value, value + 1))
                            break;
                    }
                    if (i == 0) {
                        System.out.println("Started");
                        start = System.nanoTime();
                    }
                }
            }
            System.out.printf("... Finished, average ping/pong took %,d ns%n",
                    (System.nanoTime() - start) / runs);
        }

        static final Unsafe UNSAFE;

        static {
            try {
                Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
                theUnsafe.setAccessible(true);
                UNSAFE = (Unsafe) theUnsafe.get(null);
            } catch (Exception e) {
                throw new AssertionError(e);
            }
        }
    }

    When you run this in two programs, one with odd and the other with even, you can see that each process is changing data via persisted shared memory.

    Each program maps the same area of the disk cache into its process. There is actually only one copy of the file in memory. This means the memory can be shared, provided you use thread-safe operations such as the volatile and CAS operations.

    The output on an i7-3970X is

    Waiting for the other odd/even
    Started
    … Finished, average ping/pong took 83 ns

    That is an 83 ns round-trip time between two processes. When you consider that System V IPC takes around 2,500 ns and is volatile rather than persisted, this is pretty quick.

    Is using Unsafe suitable for work?

    I wouldn’t recommend you use Unsafe directly. It requires far more testing than natural Java development. For this reason I suggest you use a library where its usage has already been tested. If you want to use Unsafe yourself, I suggest you thoroughly test its usage in a standalone library. This limits how Unsafe is used in your application and gives you a safer Unsafe.

    Conclusion

    It is interesting that Unsafe exists in Java, and you might want to play with it at home. It has some applications at work, especially in writing low-level libraries, but in general it is better to use a tested library which uses Unsafe than to use it directly yourself.

    About the Author.

    Peter Lawrey has the most Java answers on StackOverflow. He is the founder of the Performance Java User’s Group, and lead developer of Chronicle Queue and Chronicle Map, two libraries which use Unsafe to share persisted data between processes.

    Developers want to be heard

    How often have you been in this situation?

    You’re in a meeting with the team and you’re all discussing the implementation of a new feature. The group seems to be converging on a design, but there’s something about it that feels off, some sort of “smell”.  You point this out to the team, perhaps outlining the specific areas that make you uncomfortable. Maybe you even have an alternative solution. The team lets you have your say, but assures you their solution is The Way.

    Or what about this?

    A tech lead asks you to fix a bug, and as you work on your implementation you bounce ideas around periodically just to make sure you’re on the right track. Things seem to be OK, until it comes to getting your code merged. Now it becomes clear that your implementation is not what the lead had in mind, and there’s a frustrating process of back-and-forth while you explain and defend your design decisions whilst trying to incorporate the feedback. At the end, the solution doesn’t feel like your work, and you’re not entirely sure what was wrong with your initial implementation – it fixed the problem, passed the tests, and met the criteria you personally think are important (readability / scalability / performance / stability / time-to-implementation, whatever it is that you value).

    When you speak to women developers, you often hear “I feel like I have to work really hard to convince people about my ideas” or “it’s taken me a long time to prove my worth” or “I still don’t know how to be seen as a full member of the team”.

    And you hear these a lot from women because we ask women a lot what they don’t like about their work, since we’re (correctly) concerned as an industry about the lack of female developers and the alarming rate at which they leave technical roles.

    However, if you ask any developer you’ll hear something similar.  Even very senior, very experienced (very white, very male) developers have a lot of frustration trying to convince others that their ideas have value.

    It’s not just a Problem With Women.

    I’ve been wondering if our problem is that we don’t listen.  When it comes to exchanging technical ideas, I think overall we’re not good at really listening to each other.  At the very least, I think we’re bad at making people feel heard.

    Let’s think about this for a bit: if we don’t listen to developers, if we don’t help them to understand why they’re wrong, or work together to incorporate all ideas into a super-idea that’s the best solution, developers will become frustrated.  We’re knowledge workers, what we bring to the table is our brains, our ideas, our solutions.  If these are persistently not valued, we could go one of two ways:

    1. Do it our way anyway. We still think we’re right, we haven’t been convinced that our idea is not correct, or that someone else’s is correct (maybe because we didn’t listen to them? Maybe because no-one took the time to listen to us and explain why we were wrong? Maybe because we were right and no-one was listening?).
    2. Leave. We might join a team where we feel more valued, or we might leave development altogether.  At least as a business analyst, as a project manager, as a tester, people have to listen to us: by their very definition the output of those jobs is an input to the development team. 

    Option one leads to rogue code in our application, often not checked by anyone else in the team let alone understood by them, because of course we were not allowed to implement this.  So it’s done in secret.  If it works, at worst no-one notices.  And at best? You’re held up as a hero for actually Getting Something Done. This can’t be right, we’re rewarding the rebel behaviour, not encouraging honest discussion and making people feel included.

    Option two leads to the team (and maybe the industry) losing a developer. Sometimes you might argue “Good Riddance”.  But there’s such a skills shortage, it’s so hard (and expensive) to hire developers, and you must have seen something in that developer to hire them in the first place, that surely it’s cheaper, better, to make them feel welcome, wanted, valued?

    What can we do to listen to each other?

    • Retrospectives. Done right, these give the team a safe place to discuss things, to ask questions, to suggest improvements.  It’s not necessarily a place to talk about code or design, but it is a good place to raise issues like the ones above, and to suggest ways to address these problems.
    • You could schedule sessions for sharing technical ideas: maybe regular brown bags to help people understand the technologies or existing designs; maybe sessions where the architecture or design or principles of a particular area are explored and explained; maybe space and time for those who feel unheard to explain in more detail where they’re coming from and the principles that are important to them.  It’s important that these sessions are developer-led, so that everyone has an opportunity to share their ideas.
    • Pair programming. When you’re sat together, working together, there’s a flow of ideas, information, designs, experience. It’s not necessarily a case of a more senior person mentoring a less experienced developer, all of us have different skills and value different qualities in our implementation – for example, one of you could be obsessive about the tests, where the other really cares about readability of the code. When you implement something in a pair, you feel ownership of that code but you feel less personally attached to the code – you created it, but you created it from the best of both of you, you had to listen to each other to come to a conclusion and implement it. And if an even better idea comes along, great, it just improves the code. You’re constantly learning from the other people you work with, and can see the effect of them learning from you.
    • We should value, and coach, more skills than simply technology skills. I don’t know why we still seem to have this idea that developers are just typists communing with the computer – the best developers work well in teams and communicate effectively with the business and users; the best leaders make everyone in their team more productive.  In successful organisations, sales people are trained in skills like active listening, like dealing with objections.  More development teams should focus on improving these sorts of communication skills as a productivity tool. 

    I’m sure there are loads more options, I just thought of these in ten minutes.  If you read any books aimed at business people, or at growing your career, there are many tried and tested methods for making people feel heard, for playing nicely with others.

    So we should work harder to listen to each other. Next time you’re discussing something with your team, or with your boss, try and listen to what they’re saying – ask them to clarify things you don’t understand (you won’t look stupid, and developers love explaining things), and repeat back what you do understand. Request the same respect in return – if you feel your ideas aren’t being heard, make sure you sit down with someone to talk over your ideas or your doubts in more detail, and be firm in making sure the team or that person is hearing what you think you’re saying.  We may be wrong, they may be right, but we need to understand why we’re wrong, or we’ll never learn.

    If we all start listening a bit more, maybe we’ll be a bit happier.

    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    Own your heap: Iterate class instances with JVMTI

    Today I want to talk about a different Java that most of us don’t see and use every day – to be more exact, about lower-level bindings, some native code and how to perform some small magic. Although we won’t get to the true source of magic on the JVM, performing some small miracles is within the reach of a single post.

    I spend my days researching, writing and coding on the RebelLabs team at ZeroTurnaround, a company that creates tools for Java developers that mostly run as javaagents. It’s often the case that if you want to enhance the JVM without rewriting it, or to get any decent power over the JVM, you have to dive into the beautiful world of Java agents. These come in two flavors: Java javaagents and native ones. In this post we’ll concentrate on the latter.

    Note, this GeeCON Prague presentation by Anton Arhipov, who is an XRebel product lead, is a good starting point to learn about javaagents written entirely in Java: Having fun with Javassist.

    In this post we’ll create a small native JVM agent, explore the possibility of exposing native methods into the Java application and find out how to make use of the Java Virtual Machine Tool Interface.

    If you’re looking for a practical takeaway from the post, we’ll be able to, spoiler alert, count how many instances of a given class are present on the heap.

    Imagine that you are Santa’s trustworthy hacker elf and the big red has the following challenge for you:
    Santa: My dear Hacker Elf, could you write a program that will point out how many Thread objects are currently hidden in the JVM’s heap?
    Another elf that doesn’t like to challenge himself would answer: It’s easy and straightforward, right?


    return Thread.getAllStackTraces().size();

    But what if we want to over-engineer our solution to be able to answer this question about any given class? Say we want to implement the following interface:


    public interface HeapInsight {
        int countInstances(Class klass);
    }

    Yeah, that’s impossible, right? What if you receive String.class as an argument? Have no fear, we’ll just have to go a bit deeper into the internals of the JVM. One thing that is available to JVM library authors is JVMTI, the Java Virtual Machine Tool Interface. It was added ages ago, and many tools that seem magical make use of it. JVMTI offers two things:

    • a native API
    • an instrumentation API to monitor and transform the bytecode of classes loaded into the JVM.

    For the purpose of our example, we’ll need access to the native API. What we want to use is the IterateThroughHeap function, which lets us provide a custom callback to execute for every object of a given class.

    First of all, let’s make a native agent that will load and echo something to make sure that our infrastructure works.

    A native agent is something written in C/C++ and compiled into a dynamic library that is loaded before we even start thinking about Java. If you’re not proficient in C++, don’t worry – plenty of elves aren’t, and it won’t be hard. My approach to C++ includes two main tactics: programming by coincidence and avoiding segfaults. Since I managed to write and comment the example code for this post, collectively we can go through it. Note: the paragraph above should serve as a disclaimer; don’t put this code into any environment of value to you.

    Here’s how you create your first native agent:


    #include <iostream>
    #include <jvmti.h>

    using namespace std;

    JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved)
    {
        cout << "A message from my SuperAgent!" << endl;
        return JNI_OK;
    }

    The important part of this declaration is that we declare a function called Agent_OnLoad, which follows the documentation for the dynamically linked agents.

    Save the file as, for example, native-agent.cpp, and let’s see what we can do about turning it into a library.

    I’m on OSX, so I use clang to compile it; to save you a bit of googling, here’s the full command:


    clang -shared -undefined dynamic_lookup -o agent.so -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/ -I /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/include/darwin native-agent.cpp

    This creates an agent.so file that is a library ready to serve us. To test it, let’s create a dummy hello world Java class.


    package org.shelajev;
    public class Main {
    public static void main(String[] args) {
    System.out.println("Hello World!");
    }
    }

    When you run it with a correct -agentpath option pointing to the agent.so, you should see the following output:


    java -agentpath:agent.so org.shelajev.Main
    A message from my SuperAgent!
    Hello World!

    Great job! We now have everything in place to make it actually useful. First of all we need an instance of jvmtiEnv, which is available through a JavaVM *jvm when we are in the Agent_OnLoad, but is not available later. So we have to store it somewhere globally accessible. We do it by declaring a global struct to store it.


    #include <iostream>
    #include <jvmti.h>

    using namespace std;

    typedef struct {
    jvmtiEnv *jvmti;
    } GlobalAgentData;

    static GlobalAgentData *gdata;

    JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *jvm, char *options, void *reserved)
    {
    jvmtiEnv *jvmti = NULL;
    jvmtiCapabilities capa;
    jvmtiError error;

    // put a jvmtiEnv instance at jvmti.
    jint result = jvm->GetEnv((void **) &jvmti, JVMTI_VERSION_1_1);
    if (result != JNI_OK) {
    printf("ERROR: Unable to access JVMTI!\n");
    }
    // add a capability to tag objects
    (void)memset(&capa, 0, sizeof(jvmtiCapabilities));
    capa.can_tag_objects = 1;
    error = (jvmti)->AddCapabilities(&capa);

    // store jvmti in a global data
    gdata = (GlobalAgentData*) malloc(sizeof(GlobalAgentData));
    gdata->jvmti = jvmti;
    return JNI_OK;
    }

    We also updated the code to add a capability to tag objects, which we’ll need for iterating through the heap. The preparations are done now: we have the JVMTI instance initialized and available to us. Let’s offer it to our Java code via JNI.

    JNI stands for Java Native Interface, a standard way to include native code calls in a Java application. The Java part will be pretty straightforward: add the following countInstances method definition to the Main class:


    package org.shelajev;

    public class Main {
    public static void main(String[] args) {
    System.out.println("Hello World!");
    int a = countInstances(Thread.class);
    System.out.println("There are " + a + " instances of " + Thread.class);
    }

    private static native int countInstances(Class klass);
    }

    To accommodate the native method, we must change our native agent code. I’ll explain it in a minute, but for now add the following function definitions there:


    extern "C"
    jint JNICALL objectCountingCallback(jlong class_tag, jlong size, jlong* tag_ptr, jint length, void* user_data)
    {
    int* count = (int*) user_data;
    *count += 1;
    return JVMTI_VISIT_OBJECTS;
    }

    extern "C"
    JNIEXPORT jint JNICALL Java_org_shelajev_Main_countInstances(JNIEnv *env, jclass thisClass, jclass klass)
    {
    int count = 0;
    jvmtiHeapCallbacks callbacks;
    (void)memset(&callbacks, 0, sizeof(callbacks));
    callbacks.heap_iteration_callback = &objectCountingCallback;
    jvmtiError error = gdata->jvmti->IterateThroughHeap(0, klass, &callbacks, &count);
    return count;
    }

    Java_org_shelajev_Main_countInstances is more interesting here: its name follows the convention, starting with Java_, then the underscore-separated fully qualified class name, then the method name from the Java code. Also, don’t forget the JNIEXPORT declaration, which says that the function is exported into the Java world.

    Inside the Java_org_shelajev_Main_countInstances we specify the objectCountingCallback function as a callback and call IterateThroughHeap with the parameters that came from the Java application.

    Note that our native method is static, so the arguments in the C counterpart are:

     
    JNIEnv *env, jclass thisClass, jclass klass

    for an instance method they would be a bit different:

     
    JNIEnv *env, jobject thisInstance, jclass klass

    where thisInstance points to the this object of the Java method call.
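
    For illustration only, here is a hedged Java-side sketch of what declaring the native method as an instance method would look like (this variant is not used anywhere else in this post, and the method name is made up):

    package org.shelajev;

    public class Main {
        // Hypothetical instance variant: because the method is not static, the
        // second argument of the matching C function becomes jobject thisInstance
        // instead of jclass thisClass.
        public native int countInstancesInstanceVariant(Class klass);
    }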

    Now, the definition of the objectCountingCallback comes directly from the documentation, and the body does nothing more than increment an int.

    Boom! All done! Thank you for your patience. If you’re still reading this, you’re ready to test all the code above.

    Compile the native agent again and run the Main class. This is what I see:


    java -agentpath:agent.so org.shelajev.Main
    Hello World!
    There are 7 instances of class java.lang.Thread

    If I add a Thread t = new Thread(); line to the main method, I see 8 instances on the heap. Sounds like it actually works. Your thread count will almost certainly be different; don’t worry, that’s normal, because the agent also counts the JVM’s bookkeeping threads that do compilation, GC, etc.

    Now, if I want to count the number of String instances on the heap, it’s just a matter of changing the argument class. A truly generic solution, Santa would be happy I hope.
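
    For example, a quick tweak to the main method from above makes both checks visible at once (just a sketch; the exact counts will vary between JVMs and runs):

    public static void main(String[] args) {
        System.out.println("Hello World!");
        Thread t = new Thread(); // one more Thread instance on the heap
        System.out.println("Threads: " + countInstances(Thread.class));
        System.out.println("Strings: " + countInstances(String.class));
    }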

    Oh, if you’re interested, it finds 2423 instances of String for me. A pretty high number for such a small application. Also,


    return Thread.getAllStackTraces().size();

    gives me 5, not 8, because it excludes the bookkeeping threads! Talk about trivial solutions, eh?

    Now that you’re armed with this knowledge and this tutorial, I’m not saying you’re ready to write your own JVM monitoring or enhancing tools, but it is definitely a start.

    In this post we went from zero to writing a native Java agent that compiles, loads and runs successfully. It uses JVMTI to obtain insight into the JVM that is not accessible otherwise. The corresponding Java code calls the native library and interprets the results.
    This is often the approach the most miraculous JVM tools take, and I hope that some of the magic has been demystified for you.

    What do you think, does it clarify agents for you? Let me know! Find me and chat with me on twitter: @shelajev.

    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    Creating a REST API with Spring Boot and MongoDB

    Spring Boot is an opinionated framework that simplifies the development of Spring applications. It frees us from the slavery of complex configuration files and helps us to create standalone Spring applications that don’t need an external servlet container.
    This sounds almost too good to be true, but Spring Boot can really do all this.
    This blog post demonstrates how easy it is to implement a REST API that provides CRUD operations for todo entries that are saved to a MongoDB database.
    Let’s start by creating our Maven project.
    Note: This blog post assumes that you have already installed the MongoDB database. If you haven’t done this, you can follow the instructions given in the blog post titled: Accessing Data with MongoDB.

    Creating Our Maven Project

    We can create our Maven project by following these steps:

    1. Use the spring-boot-starter-parent POM as the parent POM of our Maven project. This ensures that our project inherits sensible default settings from Spring Boot.
    2. Add the Spring Boot Maven Plugin to our project. This plugin allows us to package our application into an executable jar file, package it into a war archive, and run the application.
    3. Configure the dependencies of our project. We need to configure the following dependencies:
      • The spring-boot-starter-web dependency provides the dependencies of a web application.
      • The spring-data-mongodb dependency provides integration with the MongoDB document database.
    4. Enable the Java 8 Support of Spring Boot.
    5. Configure the main class of our application. This class is responsible for configuring and starting our application.

    The relevant part of our pom.xml file looks as follows:

    <properties>
    <!-- Enable Java 8 -->
    <java.version>1.8</java.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- Configure the main class of our Spring Boot application -->
    <start-class>com.javaadvent.bootrest.TodoAppConfig</start-class>
    </properties>

    <!-- Inherit defaults from Spring Boot -->
    <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.1.9.RELEASE</version>
    </parent>

    <dependencies>
    <!-- Get the dependencies of a web application -->
    <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Spring Data MongoDB-->
    <dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-mongodb</artifactId>
    </dependency>
    </dependencies>

    <build>
    <plugins>
    <!-- Spring Boot Maven Support -->
    <plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    </plugin>
    </plugins>
    </build>



    Let’s move on and find out how we can configure our application.

    Configuring Our Application

    We can configure our Spring Boot application by following these steps:

    1. Create a TodoAppConfig class to the com.javaadvent.bootrest package.
    2. Enable Spring Boot auto-configuration.
    3. Configure the Spring container to scan components found from the child packages of the com.javaadvent.bootrest package.
    4. Add the main() method to the TodoAppConfig class and implement it by running our application.

    The source code of the TodoAppConfig class looks as follows:

    package com.javaadvent.bootrest;

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
    import org.springframework.context.annotation.ComponentScan;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    @EnableAutoConfiguration
    @ComponentScan
    public class TodoAppConfig {

    public static void main(String[] args) {
    SpringApplication.run(TodoAppConfig.class, args);
    }
    }


    We have now created the configuration class that configures and runs our Spring Boot application. Because the MongoDB jars are found from the classpath, Spring Boot configures the MongoDB connection by using its default settings.
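
    The defaults point at a MongoDB instance running on localhost and port 27017. If you ever need something else, one option (shown here only as a hedged sketch, it is not required for this example) is to declare your own client bean; Spring Boot’s auto-configuration then typically backs off in favour of it:

    package com.javaadvent.bootrest;

    import com.mongodb.MongoClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // A hypothetical configuration class that is not part of the original example.
    // Declaring our own MongoClient bean lets us control the host and port explicitly.
    @Configuration
    class MongoOverrideConfig {

        @Bean
        public MongoClient mongoClient() throws Exception {
            return new MongoClient("localhost", 27017);
        }
    }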

    Let’s move on and implement our REST API.

    Implementing Our REST API

    We need to implement a REST API that provides CRUD operations for todo entries. The requirements of our REST API are:

    • A POST request sent to the url ‘/api/todo’ must create a new todo entry by using the information found from the request body and return the information of the created todo entry.
    • A DELETE request sent to the url ‘/api/todo/{id}’ must delete the todo entry whose id is found from the url and return the information of the deleted todo entry.
    • A GET request sent to the url ‘/api/todo’ must return all todo entries that are found from the database.
    • A GET request sent to the url ‘/api/todo/{id}’ must return the information of the todo entry whose id is found from the url.
    • A PUT request sent to the url ‘/api/todo/{id}’ must update the information of an existing todo entry by using the information found from the request body and return the information of the updated todo entry.

    We can fulfill these requirements by following these steps:

    1. Create the entity that contains the information of a single todo entry.
    2. Create the repository that is used to save todo entries to MongoDB database and find todo entries from it.
    3. Create the service layer that is responsible for mapping DTOs into domain objects and vice versa. The purpose of our service layer is to isolate our domain model from the web layer.
    4. Create the controller class that processes HTTP requests and returns the correct response back to the client.

    Note: This example is so simple that we could just inject our repository into our controller. However, because this is not a viable strategy when we are implementing real-life applications, we will add a service layer between the web and repository layers.
    Let’s get started.

    Creating the Entity

    We need to create the entity class that contains the information of a single todo entry. We can do this by following these steps:

    1. Add the id, description, and title fields to the created entity class. Configure the id field of the entity by annotating the id field with the @Id annotation.
    2. Specify the constants (MAX_LENGTH_DESCRIPTION and MAX_LENGTH_TITLE) that specify the maximum length of the description and title fields.
    3. Add a static builder class to the entity class. This class is used to create new Todo objects.
    4. Add an update() method to the entity class. This method simply updates the title and description of the entity if valid values are given as method parameters.

    The source code of the Todo class looks as follows:

    import org.springframework.data.annotation.Id;

    import static com.javaadvent.bootrest.util.PreCondition.isTrue;
    import static com.javaadvent.bootrest.util.PreCondition.notEmpty;
    import static com.javaadvent.bootrest.util.PreCondition.notNull;

    final class Todo {

    static final int MAX_LENGTH_DESCRIPTION = 500;
    static final int MAX_LENGTH_TITLE = 100;

    @Id
    private String id;

    private String description;

    private String title;

    public Todo() {}

    private Todo(Builder builder) {
    this.description = builder.description;
    this.title = builder.title;
    }

    static Builder getBuilder() {
    return new Builder();
    }

    //Other getters are omitted

    public void update(String title, String description) {
    checkTitleAndDescription(title, description);

    this.title = title;
    this.description = description;
    }

    /**
    * We don't have to use the builder pattern here because the constructed
    * class has only two String fields. However, I use the builder pattern
    * in this example because it makes the code a bit easier to read.
    */
    static class Builder {

    private String description;

    private String title;

    private Builder() {}

    Builder description(String description) {
    this.description = description;
    return this;
    }

    Builder title(String title) {
    this.title = title;
    return this;
    }

    Todo build() {
    Todo build = new Todo(this);

    build.checkTitleAndDescription(build.getTitle(), build.getDescription());

    return build;
    }
    }

    private void checkTitleAndDescription(String title, String description) {
    notNull(title, "Title cannot be null");
    notEmpty(title, "Title cannot be empty");
    isTrue(title.length() <= MAX_LENGTH_TITLE,
    "Title cannot be longer than %d characters",
    MAX_LENGTH_TITLE
    );

    if (description != null) {
    isTrue(description.length() <= MAX_LENGTH_DESCRIPTION,
    "Description cannot be longer than %d characters",
    MAX_LENGTH_DESCRIPTION
    );
    }
    }
    }



    Let’s move on and create the repository that communicates with the MongoDB database.

    Creating the Repository

    We have to create the repository interface that is used to save Todo objects to a MongoDB database and retrieve Todo objects from it.
    If we don’t want to use the Java 8 support of Spring Data, we could create our repository by creating an interface that extends the CrudRepository<T, ID> interface. However, because we want to use the Java 8 support, we have to follow these steps:

    1. Create an interface that extends the Repository<T, ID> interface.
    2. Add the following repository methods to the created interface:
      1. The void delete(Todo deleted) method deletes the todo entry that is given as a method parameter.
      2. The List<Todo> findAll() method returns all todo entries that are found from the database.
      3. The Optional<Todo> findOne(String id) method returns the information of a single todo entry. If no todo entry is found, this method returns an empty Optional.
      4. The Todo save(Todo saved) method saves a new todo entry to the database and returns the saved todo entry.

    The source code of the TodoRepository interface looks as follows:

    import org.springframework.data.repository.Repository;

    import java.util.List;
    import java.util.Optional;

    interface TodoRepository extends Repository<Todo, String> {

    void delete(Todo deleted);

    List<Todo> findAll();

    Optional<Todo> findOne(String id);

    Todo save(Todo saved);
    }
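
    For comparison, the CrudRepository-based alternative mentioned earlier would look roughly like the sketch below. It is not used in this example because, in this generation of Spring Data, its findOne method returns a plain (possibly null) Todo instead of an Optional<Todo>:

    import org.springframework.data.repository.CrudRepository;

    // A sketch only: CrudRepository already declares save, delete, findAll and
    // findOne, so the interface body can stay empty.
    interface TodoCrudRepository extends CrudRepository<Todo, String> {
    }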



    Let’s move on and create the service layer of our example application.

    Creating the Service Layer

    First, we have to create a service interface that provides CRUD operations for todo entries. The source code of the TodoService interface looks as follows:

    import java.util.List;

    interface TodoService {

    TodoDTO create(TodoDTO todo);

    TodoDTO delete(String id);

    List<TodoDTO> findAll();

    TodoDTO findById(String id);

    TodoDTO update(TodoDTO todo);
    }


    The TodoDTO class is a DTO that contains the information of a single todo entry. We will talk more about it when we create the web layer of our example application.
    Second, we have to implement the TodoService interface. We can do this by following these steps:

    1. Inject our repository to the service class by using constructor injection.
    2. Add a private Todo findTodoById(String id) method to the service class and implement it by either returning the found Todo object or throwing the TodoNotFoundException.
    3. Add a private TodoDTO convertToDTO(Todo model) method to the service class and implement it by converting the Todo object into a TodoDTO object and returning the created object.
    4. Add a private List<TodoDTO> convertToDTOs(List<Todo> models) method and implement it by converting the list of Todo objects into a list of TodoDTO objects and returning the created list.
    5. Implement the TodoDTO create(TodoDTO todo) method. This method creates a new Todo object, saves the created object to the MongoDB database, and returns the information of the created todo entry.
    6. Implement the TodoDTO delete(String id) method. This method finds the deleted Todo object, deletes it, and returns the information of the deleted todo entry. If no Todo object is found with the given id, this method throws the TodoNotFoundException.
    7. Implement the List<TodoDTO> findAll() method. This method retrieves all Todo objects from the database, transforms them into a list of TodoDTO objects, and returns the created list.
    8. Implement the TodoDTO findById(String id) method. This method finds the Todo object from the database, converts it into a TodoDTO object, and returns the created TodoDTO object. If no todo entry is found, this method throws the TodoNotFoundException.
    9. Implement the TodoDTO update(TodoDTO todo) method. This method finds the updated Todo object from the database, updates its title and description, saves it, and returns the updated information. If the updated Todo object is not found, this method throws the TodoNotFoundException.

    The source code of the MongoDBTodoService looks as follows:

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;

    import java.util.List;
    import java.util.Optional;

    import static java.util.stream.Collectors.toList;

    @Service
    final class MongoDBTodoService implements TodoService {

    private final TodoRepository repository;

    @Autowired
    MongoDBTodoService(TodoRepository repository) {
    this.repository = repository;
    }

    @Override
    public TodoDTO create(TodoDTO todo) {
    Todo persisted = Todo.getBuilder()
    .title(todo.getTitle())
    .description(todo.getDescription())
    .build();
    persisted = repository.save(persisted);
    return convertToDTO(persisted);
    }

    @Override
    public TodoDTO delete(String id) {
    Todo deleted = findTodoById(id);
    repository.delete(deleted);
    return convertToDTO(deleted);
    }

    @Override
    public List<TodoDTO> findAll() {
    List<Todo> todoEntries = repository.findAll();
    return convertToDTOs(todoEntries);
    }

    private List<TodoDTO> convertToDTOs(List<Todo> models) {
    return models.stream()
    .map(this::convertToDTO)
    .collect(toList());
    }

    @Override
    public TodoDTO findById(String id) {
    Todo found = findTodoById(id);
    return convertToDTO(found);
    }

    @Override
    public TodoDTO update(TodoDTO todo) {
    Todo updated = findTodoById(todo.getId());
    updated.update(todo.getTitle(), todo.getDescription());
    updated = repository.save(updated);
    return convertToDTO(updated);
    }

    private Todo findTodoById(String id) {
    Optional<Todo> result = repository.findOne(id);
    return result.orElseThrow(() -> new TodoNotFoundException(id));

    }

    private TodoDTO convertToDTO(Todo model) {
    TodoDTO dto = new TodoDTO();

    dto.setId(model.getId());
    dto.setTitle(model.getTitle());
    dto.setDescription(model.getDescription());

    return dto;
    }
    }
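
    The TodoNotFoundException used above (and later by the controller’s exception handler) is not shown in the original post; a minimal sketch of it, an assumption about what the real class looks like, could be:

    // A minimal sketch: an unchecked exception that carries the id of the missing todo entry.
    public class TodoNotFoundException extends RuntimeException {

        public TodoNotFoundException(String id) {
            super("No todo entry found with id: " + id);
        }
    }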


    We have now created the service layer of our example application. Let’s move on and create the controller class.

    Creating the Controller Class

    First, we need to create the DTO class that contains the information of a single todo entry and specifies the validation rules that are used to ensure that only valid information can be saved to the database. The source code of the TodoDTO class looks as follows:

    import org.hibernate.validator.constraints.NotEmpty;

    import javax.validation.constraints.Size;

    public final class TodoDTO {

    private String id;

    @Size(max = Todo.MAX_LENGTH_DESCRIPTION)
    private String description;

    @NotEmpty
    @Size(max = Todo.MAX_LENGTH_TITLE)
    private String title;

    //Constructor, getters, and setters are omitted
    }



    Second, we have to create the controller class that processes the HTTP requests sent to our REST API and sends the correct response back to the client. We can do this by following these steps:

    1. Inject our service into our controller by using constructor injection.
    2. Add a create() method to our controller and implement it by following these steps:
      1. Read the information of the created todo entry from the request body.
      2. Validate the information of the created todo entry.
      3. Create a new todo entry and return the created todo entry. Set the response status to 201.
    3. Implement the delete() method by delegating the id of the deleted todo entry forward to our service and return the deleted todo entry.
    4. Implement the findAll() method by finding the todo entries from the database and returning the found todo entries.
    5. Implement the findById() method by finding the todo entry from the database and returning the found todo entry.
    6. Implement the update() method by following these steps:
      1. Read the information of the updated todo entry from the request body.
      2. Validate the information of the updated todo entry.
      3. Update the information of the todo entry and return the updated todo entry.
    7. Create an @ExceptionHandler method that sets the response status to 404 if the todo entry was not found (TodoNotFoundException was thrown).

    The source code of the TodoController class looks as follows:

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.http.HttpStatus;
    import org.springframework.web.bind.annotation.ExceptionHandler;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestBody;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.ResponseStatus;
    import org.springframework.web.bind.annotation.RestController;

    import javax.validation.Valid;
    import java.util.List;

    @RestController
    @RequestMapping("/api/todo")
    final class TodoController {

    private final TodoService service;

    @Autowired
    TodoController(TodoService service) {
    this.service = service;
    }

    @RequestMapping(method = RequestMethod.POST)
    @ResponseStatus(HttpStatus.CREATED)
    TodoDTO create(@RequestBody @Valid TodoDTO todoEntry) {
    return service.create(todoEntry);
    }

    @RequestMapping(value = "{id}", method = RequestMethod.DELETE)
    TodoDTO delete(@PathVariable("id") String id) {
    return service.delete(id);
    }

    @RequestMapping(method = RequestMethod.GET)
    List<TodoDTO> findAll() {
    return service.findAll();
    }

    @RequestMapping(value = "{id}", method = RequestMethod.GET)
    TodoDTO findById(@PathVariable("id") String id) {
    return service.findById(id);
    }

    @RequestMapping(value = "{id}", method = RequestMethod.PUT)
    TodoDTO update(@RequestBody @Valid TodoDTO todoEntry) {
    return service.update(todoEntry);
    }

    @ExceptionHandler
    @ResponseStatus(HttpStatus.NOT_FOUND)
    public void handleTodoNotFound(TodoNotFoundException ex) {
    }
    }


    Note: If the validation fails, our REST API returns the validation errors as JSON and sets the response status to 400. If you want to know more about this, read a blog post titled: Spring from the Trenches: Adding Validation to a REST API.
    That is it. We have now created a REST API that provides CRUD operations for todo entries and saves them to a MongoDB database. Let’s summarize what we learned from this blog post.

    Summary

    This blog post has taught us three things:

    • We can get the required dependencies with Maven by declaring only two dependencies: spring-boot-starter-web and spring-data-mongodb.
    • If we are happy with the default configuration of Spring Boot, we can configure our web application by using its auto-configuration support and “dropping” new jars to the classpath.
    • We learned to create a simple REST API that saves information to a MongoDB database and finds information from it.

    You can get the example application of this blog post from Github.
    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    Mutation testing in Java with PIT and Gradle

    A project might find itself in different stages of unit test line and branch coverage.
    But what does that really say about a project? If it’s 100% covered, is it really tested?
    Or, as Wikipedia puts it:

    Tests can be created to verify the correctness of the implementation of a given software system, but the creation of tests still poses the question whether the tests are correct and sufficiently cover the requirements that have originated the implementation. 

    Whether one is intentionally doing a poor job or just being sloppy, coverage percentages merely serve as a nice tool to bring false reassurance to oneself or to management about quality.
    The real measure of quality is the quality of the tests AND the amount of code covered.
    So this begs the question: besides reviewing the tests alongside the code and giving them some subjective grade, is there a tool that can speak volumes about the quality of tests?
    The answer is yes: mutation testing.
    I won’t bore you with the details (you can find out more via the provided links), but the basic idea is:
    If you modify the code in some non-equivalent way and the tests still pass, you have bad tests, and some programming error down the line won’t be caught by them.
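
    A toy illustration of that idea, unrelated to the Game of Life project used below (it assumes JUnit 4 on the classpath):

    import static org.junit.Assert.assertTrue;

    import org.junit.Test;

    public class AgeCheckTest {

        // Production code under test.
        static boolean isAdult(int age) {
            return age >= 18;
        }

        // A weak test: it covers the code, but if PIT mutates ">=" into ">"
        // (a conditional boundary mutation), it still passes and the mutant survives.
        @Test
        public void adultIsDetected() {
            assertTrue(isAdult(30));
        }

        // A stronger test that kills that mutant by exercising the boundary value.
        @Test
        public void boundaryAgeCountsAsAdult() {
            assertTrue(isAdult(18));
        }
    }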
    Fortunately some nice folks are working on a really cool project called PIT (github), which is a mutation tester for Java.
    I encourage you to read everything on the site including the media links and try it out.
    For the purpose of this article I will provide an example project for showcasing mutation testing with PIT, including a Gradle build script. This is made possible by the PIT Gradle plugin.
    GitHub repository for this post: game-of-life-mutation-test

    I chose to implement Conway’s Game of Life, inspired by the Global Day of Code Retreat. One of the tests is commented out. Let’s assume that, in a real-world scenario, we would have written all the tests there except that one.
    Coverage tools report 100% line coverage and branch coverage (ignoring the equals method).
    Eclipse with EclEmma and IntelliJ coverage reports:
    At this point you might call it a day and a job well done. The tests look pretty thorough, and there are a lot of them; the code is tested. But are the tests solid?
    Turns out they are not. If we run the PIT tester
    gradle pitest

    reports will be generated at the path
    game-of-life-mutation-test/build/reports/pitest/<timestamp>/index.html


    PIT has detected that if we negate the condition, the tests pass all the same.
    It would be easy to make this mistake, or for some other developer to alter the code in such a subtle way, and no one would notice until the day the code does not work as expected in a production environment.
    To solve this, uncomment the glider test (a more complex test for GoL).
    If you truly believe you have quality tests, consider running mutation testing for an extra verification.
    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!