Merry Christmas everyone!

This is the first year of the Java Advent Project and I am really grateful to all the people who got involved, published articles, tweeted, shared, +1ed and so on.
It was an unbelievable journey and all the glory needs to go to the people who took some time away from their loved ones to give us their wisdom. As they say, the Class of 2014 of Java Advent comprises (in order of publishing date):

Thank you girls and guys for making it happen once more. And sorry for stressing you out and pushing you. Also, last but not least, thanks to Voxxed editors Lucy Carey and Mite Mitreski.

A Musical Finale

What could be more fitting than Christmas music for Christmas Eve?

In this post I want to discuss the joy of making music with Java and why/how I have come to use Python…

But first, let us celebrate the season!

We are all human and, irrespective of our beliefs, it seems we all enjoy music of some form. For me some of the most beautiful music of all was written by Johann Sebastian Bach. Between 1708 and 1717 he wrote a set of pieces which are collectively called Orgelbüchlein (Little Organ Book). For this post, and to celebrate the Java Advent Calendar, I tasked Sonic Field with playing this piece of music, modelling the sounds of an 18th-century pipe organ. If you did not know: yes, some German organs of about that time really were able to produce huge sounds with reed pipes (listen, for example, to the Passacaglia and Fugue on the Trost organ). The piece here is a ‘Chorale Prelude’, which is based on what in English we would commonly call a carol, to be sung by an ensemble.

BWV 610 Jesu, meine Freude [Jesus, my joy]
This performance is dedicated to the Java Advent Calendar
and created exclusively on the JVM using pure
mathematics.
How was this piece created?
Step one is to transcribe the score into MIDI. Fortunately, someone else had already done this for me using automated score-reading software. Not so fortunately, this software makes all sorts of mistakes which have to be fixed. The biggest issue with automatically generated MIDI files is that they end up with overlapped notes on the same channel; that is strictly impossible in MIDI and results in an ambiguous interpretation of what the sound should be. MIDI treats audio as note-on and note-off events. So Note On, Note On, Note Off, Note Off is ambiguous; does it mean:

One note overlapping the next or:
—————–
             —————

One note entirely contained in a longer note?
—————————-
             —-

Fortunately, tricks can be used to try and figure this out based on note length etc. The Java decoder always treats notes as fully contained. The Python method looks for very short notes which are contained in long ones and guesses that the real intention was two long notes which ended up slightly overlapped. Here is the Python (the Java is here on github).


def repareOverlapMidi(midi,blip=5):
    print "Interpretation Pass"
    mute=True
    while mute:
        endAt=len(midi)-1
        mute=False
        index=0
        midiOut=[]
        this=[]
        next=[]
        print "Demerge pass:",endAt
        midi=sorted(midi, key=lambda tup: tup[0])
        midi=sorted(midi, key=lambda tup: tup[3])
        while index<endAt:
            this=midi[index]
            next=midi[index+1]
            ttickOn,ttickOff,tnote,tkey,tvelocity=this
            ntickOn,ntickOff,nnote,nkey,nvelocity=next

            # Merge interpretation
            finished=False
            dif=(ttickOff-ttickOn)
            if dif<blip and tkey==nkey and ttickOff>=ntickOn and ttickOff<=ntickOff:
                print "Separating: ",this,next," Diff: ",(ttickOff-ntickOn)
                midiOut.append([ttickOn ,ntickOn ,tnote,tkey,tvelocity])
                midiOut.append([ttickOff,ntickOff,nnote,nkey,nvelocity])
                index+=1
                mute=True
            elif dif<blip:
                print "Removing blip: ",(ttickOff-ttickOn)
                index+=1
                mute=True
                continue
            else:
                midiOut.append(this)
            # iterate the loop
            index+=1
            if index==endAt:
                midiOut.append(next)
        if not mute:
            return midiOut
        midi=midiOut

[This AGPL code is on Github]

Then comes some real fun. If you know the original piece, you might have noticed that the introduction is not original. I added that in the MIDI editing software Aria Maestosa. It does not need to be done this way; we do not even need to use MIDI files at all. A lot of the music I have created in Sonic Field is coded directly in Python. However, here it was done from MIDI.

Once we have a clean set of notes, they need to be converted into sounds. That is done with ‘voicing’. I will talk a little about that to set the scene, then we can get back to a more Java-oriented discussion. After all, this is the Java Advent Calendar!

Voicing is exactly the sort of activity which brings Python to the fore. Java is a wordy language with a large degree of strictness. It favours well-constructed, stable structures. Python relies on its clean syntax rules, its layout and the principle of least astonishment. For me, this Pythonic approach really helps with the very human process of making a sound:


def chA():
    global midi,index
    print "##### Channel A #####"
    index+=1
    midi=shorterThanMidi(midis[index],beat,512)
    midi=dampVelocity(midi,80,0.75)
    doMidi(voice=leadDiapason,vCorrect=1.0)
    postProcess()

    midi=longAsMidi(midis[index],beat,512)
    midi=legatoMidi(midi,beat,128)
    midi=dampVelocity(midi,68,0.75)
    midi=dampVelocity(midi,80,0.75)
    doMidi(voice=orchestralOboe,vCorrect=0.35,flatEnv=True,pan=0.2)
    postProcessTremolate(rate=3.5)
    doMidi(voice=orchestralOboe,vCorrect=0.35,flatEnv=True,pan=0.8)
    postProcessTremolate(rate=4.5)

Above is a ‘voice’. Contrary to what one might think, a synthesised sound does not often consist of just one sound source; it consists of many. A piece of music might have many ‘voices’ and each voice will be a composite of several sounds. To create just the one voice above I have split the notes into long notes and short notes. Then the actual notes are created by a call to doMidi. This takes advantage of Python’s ‘named arguments with default values’ feature. Here is the signature for doMidi:


def doMidi(voice,vCorrect,pitchShift=1.0,qFactor=1.0,subBass=False,flatEnv=False,pure=False,pan=-1,rawBass=False,pitchAdd=0.0,decay=False,bend=True):

The most complex voice to create (unsurprisingly) is that
of a human singing. I have been working on this for
a long time and there is a long way to go; however, here
is a spectrogram of a piece of music which does
a passable job of sounding like someone singing.

The first argument is actually a reference to a function which will create the basic tone. The rest of the arguments describe how that tone will be manipulated during note formation. Whilst an approach like this can be mimicked using a builder pattern in Java, Java does not lend itself to the ‘playing around’ nature of Python (at least for me).
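For comparison, here is a minimal sketch of what such a builder might look like in Java. This is my own illustration, not Sonic Field's actual Java API; the class and parameter names merely mirror doMidi:

// Hypothetical builder - illustrative only, not Sonic Field's real code.
interface Voice { /* a reference to a function creating the basic tone */ }

final class NoteRenderer {
    private final Voice voice;        // required, like doMidi's first argument
    private double vCorrect = 1.0;    // the rest default, like Python's named arguments
    private double pitchShift = 1.0;
    private boolean flatEnv = false;
    private double pan = -1.0;

    NoteRenderer(Voice voice)          { this.voice = voice; }
    NoteRenderer vCorrect(double v)    { this.vCorrect = v;   return this; }
    NoteRenderer pitchShift(double p)  { this.pitchShift = p; return this; }
    NoteRenderer flatEnv(boolean f)    { this.flatEnv = f;    return this; }
    NoteRenderer pan(double p)         { this.pan = p;        return this; }
    void render()                      { /* create the notes as doMidi would */ }
}

// Usage, compare doMidi(voice=orchestralOboe,vCorrect=0.35,flatEnv=True,pan=0.2):
// new NoteRenderer(orchestralOboe).vCorrect(0.35).flatEnv(true).pan(0.2).render();

It works, but the ceremony of declaring the builder up front is exactly what gets in the way of quick experimentation.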

For example, I could just run the script, add flatEnv=True to the arguments, run it again and compare the two sounds. It is an intuitive way of working.

Anyhow, once each voice has been composited from many tones and tweaked into the shape and texture we want, the voices turn up as a huge list of lists of notes which are all mixed together and written out to disk in a flat file format which is basically just a dump of the underlying double data. At this point it sounds terrible! Making the notes is often only half the story.

Voice Synthesis by Sonic Field
played specifically for this post.

You see, real sounds happen in a space. Our chorale is expected to be performed in a church. Notes played without a space around them sound completely artificial and lack any interest. To solve this we use impulse response reverberation. The mathematics behind this is rather complex, so I will not go into it in detail. However, in the next section I will start to look at this as a perfect example of why Java is not only necessary but ideal as the back end to Python/Jython.

You seem to like Python, Alex – Why Bother With Java?

My post might seem a bit like a Python sales job so far. What has been happening is simply a justification of using Python when Java is so good as a language (especially when written in a great IDE like Eclipse for Java). Let us look at something Python would be very bad at indeed. Here is the code for performing the Fast Fourier Transform, which is at the heart of putting sounds into a space.


package com.nerdscentral.audio.pitch.algorithm;

public class CacheableFFT
{

    private final int n, m;

    // Lookup tables. Only need to recompute when size of FFT changes.
    private final double[] cos;
    private final double[] sin;
    private final boolean forward;

    public boolean isForward()
    {
        return forward;
    }

    public int size()
    {
        return n;
    }

    public CacheableFFT(int n1, boolean isForward)
    {
        this.forward = isForward;
        this.n = n1;
        this.m = (int) (Math.log(n1) / Math.log(2));

        // Make sure n is a power of 2
        if (n1 != (1 << m)) throw new RuntimeException(Messages.getString("CacheableFFT.0")); //$NON-NLS-1$

        cos = new double[n1 / 2];
        sin = new double[n1 / 2];
        double dir = isForward ? -2 * Math.PI : 2 * Math.PI;

        for (int i = 0; i < n1 / 2; i++)
        {
            cos[i] = Math.cos(dir * i / n1);
            sin[i] = Math.sin(dir * i / n1);
        }
    }

    public void fft(double[] x, double[] y)
    {
        int i, j, k, n1, n2, a;
        double c, s, t1, t2;

        // Bit-reverse
        j = 0;
        n2 = n / 2;
        for (i = 1; i < n - 1; i++)
        {
            n1 = n2;
            while (j >= n1)
            {
                j = j - n1;
                n1 = n1 / 2;
            }
            j = j + n1;

            if (i < j)
            {
                t1 = x[i];
                x[i] = x[j];
                x[j] = t1;
                t1 = y[i];
                y[i] = y[j];
                y[j] = t1;
            }
        }

        // FFT
        n1 = 0;
        n2 = 1;

        for (i = 0; i < m; i++)
        {
            n1 = n2;
            n2 = n2 + n2;
            a = 0;

            for (j = 0; j < n1; j++)
            {
                c = cos[a];
                s = sin[a];
                a += 1 << (m - i - 1);

                for (k = j; k < n; k = k + n2)
                {
                    t1 = c * x[k + n1] - s * y[k + n1];
                    t2 = s * x[k + n1] + c * y[k + n1];
                    x[k + n1] = x[k] - t1;
                    y[k + n1] = y[k] - t2;
                    x[k] = x[k] + t1;
                    y[k] = y[k] + t2;
                }
            }
        }
    }
}

[This AGPL code is on Github]

It would be complete lunacy to implement this mathematics in Jython (dynamic late binding would give unusably bad performance). Java does a great job of running it quickly and efficiently. In Java this runs just about as fast as it could in any language, plus the clean, simple object structure of Java means that using the ‘caching’ system is straightforward. The caching comes from the fact that the cos and sin multipliers of the FFT can be re-used when the transform is the same length. Now, in the creation of reverberation effects (those effects which put sound into a space) FFT lengths are the same over and over again due to windowing. So the speed and object-oriented power of Java have both fed into creating a clean, high performance implementation.
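To make the caching idea concrete, here is a minimal sketch of how transforms might be re-used by length and direction. The cache class is my own illustration, built around the CacheableFFT class above; it is not Sonic Field's actual caching code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: hand out an existing CacheableFFT (and hence its
// precomputed sin/cos tables) whenever the same length and direction recur.
final class FFTCache {
    private static final Map<Long, CacheableFFT> CACHE = new ConcurrentHashMap<>();

    static CacheableFFT get(int length, boolean forward) {
        long key = ((long) length << 1) | (forward ? 1L : 0L); // pack length + direction
        return CACHE.computeIfAbsent(key, k -> new CacheableFFT(length, forward));
    }
}

Because reverberation re-uses one window length over and over, almost every call after the first is a simple map lookup.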

But we can go further and make the FFT parallelised:


def reverbInner(signal,convol,grainLength):
    def rii():
        mag=sf.Magnitude(+signal)
        if mag>0:
            signal_=sf.Concatenate(signal,sf.Silence(grainLength))
            signal_=sf.FrequencyDomain(signal_)
            signal_=sf.CrossMultiply(convol,signal_)
            signal_=sf.TimeDomain(signal_)
            newMag=sf.Magnitude(+signal_)
            if newMag>0:
                signal_=sf.NumericVolume(signal_,mag/newMag)
                # tail out clicks due to amplitude at end of signal
                return sf.Realise(signal_)
            else:
                return sf.Silence(sf.Length(signal_))
        else:
            -convol
            return signal
    return sf_do(rii)

def reverberate(signal,convol):
    def revi():
        grainLength = sf.Length(+convol)
        convol_=sf.FrequencyDomain(sf.Concatenate(convol,sf.Silence(grainLength)))
        signal_=sf.Concatenate(signal,sf.Silence(grainLength))
        out=[]
        for grain in sf.Granulate(signal_,grainLength):
            (signal_i,at)=grain
            out.append((reverbInner(signal_i,+convol_,grainLength),at))
        -convol_
        return sf.Clean(sf.FixSize(sf.MixAt(out)))
    return sf_do(revi)

Here we have the Python which performs the FFT to produce impulse response reverberation (convolution reverb is another name for this approach). The second function breaks the sound into grains. Each grain is then processed individually, and they all have the same length. This performs the windowing effect I talked about earlier (I use a triangular window, which is not ideal but works well enough due to the long window size). If the grains are long enough, the impact of lots of little FFT calculations is basically the same as the effect of one huge one. However, FFT is an n·log(n) process, so lots of little calculations are a lot faster than one big one. In effect, windowing makes the FFT a linearly scaling calculation.
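As a rough back-of-the-envelope illustration of that scaling claim (my numbers, not from the original performance work):

// Cost in 'butterfly operations', roughly n * log2(n) per FFT.
public class FFTCost {
    public static void main(String[] args) {
        long n = 1L << 22;                  // ~4.2 million samples
        long w = 1L << 12;                  // 4096-sample grains
        long oneBig = n * 22;               // one FFT: ~92 million operations
        long windowed = (n / w) * (w * 12); // 1024 small FFTs: ~50 million operations
        System.out.println("one big FFT: " + oneBig + ", windowed: " + windowed);
        // Doubling n doubles only the number of windows, so the windowed
        // cost grows linearly in n while the single FFT grows as n*log(n).
    }
}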

Note that the granulation process is performed in a future. We define a closure called revi and pass it to sf_do(), which executes it at some point in the future based on demand and the number of threads available. Next we can look at the code which performs the FFT on each grain – rii. That again is performed in a future. In other words, the individual windowed FFT calculations are all performed in futures. The expression of a parallel windowed FFT engine in C or FORTRAN ends up very complex and rather intractable. I have not personally come across one which is integrated into a generalised, thread-pooled, future-based scheduler. Nevertheless, the combination of Jython and Java makes such a thing very easy to create.
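A minimal sketch of the kind of machinery sf_do() can sit on top of; Sonic Field's real scheduler is demand-driven and rather more sophisticated than this:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: closures submitted from Jython become Java Futures
// executed on a shared thread pool.
final class SimpleScheduler {
    private static final ExecutorService POOL =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    static <T> Future<T> sfDo(Callable<T> closure) {
        return POOL.submit(closure); // runs when a thread is free
    }
}

Each grain's FFT would be submitted as its own Callable, and the mixing stage would call Future.get() only when it actually needs the rendered audio.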

How are the two meshed?

Now that I hope I have made a good argument for hybrid programming between a great dynamic language (in this case Python) and a powerful mid-level static language (in this case Java), it is time to look at how the two are fused together. There are many ways of doing this, but Sonic Field picks a very distinct approach. It does not offer a general interface between the two where lots of intermediate code is generated and each method in Java is exposed separately into Python; rather, it uses a uniform single interface with virtual dispatch.

Sonic Field defines a very (aggressively) simple calling convention from Python into Java which initially might look like a major pain in the behind but works out to create a very flexible and powerful approach.

Sonic Field defines ‘operators’ which all implement the following interface:


/* For Copyright and License see LICENSE.txt and COPYING.txt in the root directory */
package com.nerdscentral.sython;

import java.io.Serializable;

/**
 * @author AlexTu
 *
 */
public interface SFPL_Operator extends Serializable
{

    /**
     * <b>Gives the key word which the parser will use for this operator</b>
     *
     * @return the key word
     */
    public String Word();

    /**
     * <b>Operate</b> Whatever this operator does when SFPL is running is done by this method. The execution loop calls this
     * method with the current execution context and the passed forward operand.
     *
     * @param input
     *            the operand passed into this operator
     * @param context
     *            the current execution context
     * @return the operand passed forward from this operator
     * @throws SFPL_RuntimeException
     */
    public Object Interpret(Object input, SFPL_Context context) throws SFPL_RuntimeException;
}
The Word() method returns the name of the operator as it will be expressed in Python. The Interpret() method processes arguments passed to it from Python. As Sonic Field comes up, it creates a Jython interpreter and then adds the operators to it. The mechanism for doing this is a little involved, so rather than go into detail here, I will simply give links to the code on github:
The result is that every operator is exposed in Python as sf.xxx where xxx is the return from the Word() method. With clever operator overloading and other syntactical tricks in Python I am sure the approach could be refined. Right now there are a lot of sf.xxx calls in Sonic Field Python (I call it Synthon) but I have not gotten around to improving on this simple and effective approach.
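For a flavour, here is what an operator can look like. This one is hypothetical (not one of Sonic Field's real operators); it would surface in Python as sf.Reverse:

package com.nerdscentral.sython;

// Hypothetical example operator - for illustration only.
public class SFPL_Reverse implements SFPL_Operator
{
    private static final long serialVersionUID = 1L;

    @Override
    public String Word()
    {
        return "Reverse";
    }

    @Override
    public Object Interpret(Object input, SFPL_Context context) throws SFPL_RuntimeException
    {
        // Cast 'input' to the expected operand type, reverse the samples
        // and pass the result forward to the next operator in the chain.
        return input;
    }
}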

You might have noticed that everything passed into Java from Python is just Object. This seems a bit crude at first take. However, as we touched on in the section on futures in the previous post, it offers many advantages, because the translation from Jython to Java is orchestrated via the Caster object and a layer of Python which transparently performs many useful translations. For example, the code automatically translates multiple arguments in Jython to a list of objects in Java:


def run(self,word,input,args):
    if len(args)!=0:
        args=list(args)
        args.insert(0,input)
        input=args
    trace=''.join(traceback.format_stack())
    SFSignal.setPythonStack(trace)
    ret=self.processors.get(word).Interpret(input,self.context)
    return ret

Here we can see how the arguments are processed into a list (which in Jython is implemented as an ArrayList) if there is more than one, but are passed as a single object if there is only one. We can also see how the Python stack trace is passed into a thread local in the Java SFSignal object. Should an SFSignal not be freed or be double collected, this Python stack is displayed to help debug the program.

Is this interface approach a generally good idea for Jython/Java Communication?

Definitely not! It works here because of the nature of the Sonic Field audio processing architecture. We have processors which can be chained. Each processor has a simple input and output. The semantic content passed between Python and Java is quite limited. In more general purpose programming, this simple architecture, rather than being flexible and powerful, would be highly restrictive. In that case, the normal Jython interface with Java would be much more effective. Again, we can see a great example of this simplicity in the previous post when talking about threading (where Python accesses Java Future objects). Another example is the direct interaction of Python with SFData objects in this post on modelling oscillators in Python.


from com.nerdscentral.audio import SFData
...
data=SFData.build(length)
for x in range(0,length):
    s,t,xs=osc.next()
    data.setSample(x,s)

This violated the programming model of Sonic Field by creating audio samples directly from Jython, but at the same time it illustrates the power of Jython! It also created one of the most unusual soundscapes I have so far achieved with the technology:

Engines of war, sound modelling
from oscillators in Python.

Wrapping It Up

Well, that is all folks. I could ramble on forever, but I think I have answered most if not all of the questions I set out in the first post. The key ones that really interest me are about creativity and hybrid programming. Naturally, I am obsessed with performance as I am by profession an optimisation consultant, but moving away from my day job: can Jython and Java be a creative environment, and do they offer more creativity than pure Java?

Transition State Analysis using
hybrid programming

Too many years ago I worked on a similar hybrid approach in scientific computing. The GRACE software which I helped develop as part of the team at Bath was able to break new ground because it was easier to explore ideas in the hybrid approach than by writing raw FORTRAN constantly. I cannot present in deterministic, reductionist language a consistent argument for why this applied then to science or now to music; nevertheless, experience from myself and others has shown this to be a strong argument.

Whether you agree or disagree, and irrespective of whether you like the music or detest it, I wish you a very merry Christmas indeed.

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

The Java Ecosystem – My top 5 highlights of 2014

1. February the 1st – RedMonk Analyst firm declares that Java is more popular & diverse than ever!

The Java Ecosystem started off with a hiss and a roar in 2014 with the annual meeting of the Free Java room at FOSDEM. As well as the many fine deep technical talks on OpenJDK and related topics, there was also a surprise presentation on the industry from Steve O’Grady (RedMonk analyst). Steve gave a data-led insight into where Java ranked in terms of both popularity and scope at the start of 2014. The analysis of where Java is as a language is repeated on RedMonk’s blog. The fact that it remains a top-two language didn’t surprise anyone, but it was the other angle that really surprised even those of us heavily involved in the ecosystem. Steve’s talk clearly showed that Java is aggressively diverse, appearing in industries such as social media, messaging, gaming, mobile, virtualisation, build systems and many more – not just the Enterprise apps that people most commonly think about. Steve also showed that Java is being used heavily in new projects (across all of those industry sectors), which certainly killed the myth of Java being a legacy enterprise platform.

2. March the 18th – Java 8 arrives

The arrival of Java 8 ushered in a new Functional/OO hybrid direction for the language, giving it a new lease of life. The adoption rates have been incredible (see Typesafe’s full report on this); it was clearly the release that Java developers were waiting for.

Some extra thoughts around the highlights of this release:

  • Lambdas (JSR 335) – There has been so much written about this topic already, with a ton of fantastic books and tutorials to boot. For me the clear benefit to most Java developers was that they’re finally able to express the correct intent of behaviour with collections without all of the unnecessary boilerplate that imperative/OO constructs forced upon them. It boils down to the old joke that there are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors. The new streams API for collections in conjunction with lambdas certainly helps with the last two! (See the short taster after this list.)
  • Project Nashorn (JSR 223, JEP 174) – The JavaScript runtime which allows developers to embed JavaScript code within their Java applications. Although I personally won’t be using this anytime soon, it was yet another boost to the JVM in terms of first class support for dynamically typed languages. I’m looking forward to this trend continuing!
  • Date and Time API (JSR 310, JEP 150) – This is the sort of bread-and-butter API that a blue-collar language like Java just needs to get right, and this time (take 3) they did! It’s been great to finally be able to work with time zones correctly, and it also set a new precedent of Immutable First as a conscious design decision for new APIs in Java.
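To make the lambdas and date/time points concrete, here is a small taster (my example, not from the original post):

import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Arrays;
import java.util.List;

public class Java8Taster {
    public static void main(String[] args) {
        // Lambdas + streams: state the intent over a collection, no boilerplate loops.
        List<String> names = Arrays.asList("duke", "juggy", "mascot");
        names.stream()
             .filter(n -> n.length() > 4)
             .map(String::toUpperCase)
             .forEach(System.out::println);

        // java.time: immutable values with first-class time zone handling.
        ZonedDateTime xmas = LocalDate.of(2014, 12, 25)
                                      .atStartOfDay(ZoneId.of("Europe/London"));
        System.out.println(xmas);
    }
}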

3. ~July – ARM 64 port (AArch64)

Red Hat has led the effort to get the ARMv8 64-bit architecture supported in Java. This is clearly an important step to keep Java truly “run anywhere”, and alongside SAP’s PowerPC/AIX port it represents two major ports that are primarily maintained by non-Oracle participants in OpenJDK. If you’d like to get involved, see the project page for more details.

Java still has a way to go before becoming a major player in the embedded space, but the signs in 2014 were encouraging, with Java SE Embedded featuring regularly on the Raspberry Pi and Java ME Embedded getting a much-needed feature-parity boost with Java SE APIs.

4. Sept/Oct – A Resurgence in the JCP & its 15th Anniversary

The Java Community Process (JCP) is the standards body that defines what goes into Java SE, Java EE and Java ME. It re-invented itself as a much more open, community-based organisation in 2013 and continued that good work in 2014, reversing the dropping membership trend. Most importantly, it now truly represents the incredible diversity that the Java ecosystem has. Looking at the make-up of the existing Executive Committee, you can see institutions like Java User Groups sitting alongside industry and end-user heavyweights such as IBM, Twitter and Goldman Sachs.

Community collaboration at an all-time high & Microsoft joins OpenJDK.

The number of new joiners to OpenJDK (see Mani’s excellent post on this) was higher than ever. OpenJDK now represents a huge melting pot of major tech companies such as Red Hat, IBM, Oracle and Twitter, and of course saw the shock entry this year of Microsoft.

The Adopt a JSR and Adopt OpenJDK programmes continue to get more day-to-day developers involved in guiding the future of various APIs, with regular workshops now being organised around the world to test new APIs and ideas out early and feed that back into OpenJDK and the Java EE specifications in particular.

Community conferences and the number of Java User Groups continue to rise in number, with JavaOne in particular having its strongest year in recent memory. It was also heartening to see a large number of community efforts helping kids learn to code, with after-school and weekend programmes such as Devoxx for Kids.

What about 2015?

I expect 2015 to be a little bit quieter in terms of changes to the core language or exciting new features for Java EE or Java ME, as their next major releases aren’t due until 2016. On the community front I expect to see Java developers having to firmly embrace web/UI technologies such as AngularJS, more systems/devops toolchains such as Docker, AWS, Puppet etc. and of course migrate to Java 8 and all of the functional goodness it now brings! The community I’m sure will continue to thrive, and the looming spectre of IoT will start to come into the mainstream as well. Java developers will likely have to wait until Java 9 to get a truly first-class platform for embedded, but early adopters will want to begin taking a look at early builds throughout 2015.

Java/JVM applications now tend to be complex, with many moving parts and distributed deployments. It can often take poor frustrated developers weeks to fix issues in production. To combat this there is a new wave of interesting analysis tools dealing with Java/JVM based applications and deployments. Oracle’s Mission Control is a powerful tool that can give lots of interesting insights into the JVM, and other tools like XRebel from ZeroTurnaround and jClarity’s Censum and Illuminate take the next step of applying machine-learned analysis to the raw numbers.

One final important note: Project Jigsaw is the modularisation story for Java 9 that will massively impact tool vendors and day-to-day developers alike. The community at large needs your help to test out early builds of Java 9 and to help OpenJDK developers and tool vendors ensure that IDEs, build tools and applications are ready for this important change. You can join us in the Adoption Group at OpenJDK: http://adoptopenjdk.java.net

I hope everyone has a great holiday break – I look forward to seeing the Twitter feeds and the GitHub commits flying around in 2015 :-).
Cheers,
Martijn (CEO – jClarity, Java Champion & Diabolical Developer)

This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

A persistent KeyValue Server in 40 lines and a sad fact

Advent time again .. picking up Peter’s well-written overview on the uses of Unsafe, I’ll have a short fly-by on how low-level techniques in Java can save development effort by enabling a higher level of abstraction, or allow for Java performance levels probably unknown to many.

My major point is to show that the conversion of objects to bytes and vice versa is an important fundamental, affecting virtually any modern Java application.

Hardware likes to process streams of bytes, not object graphs connected by pointers, as “All memory is tape” (M. Thompson, if I remember correctly ..).

Many basic technologies are therefore hard to use with vanilla Java heap objects:

  • Memory mapped files – a great and simple technology to persist application data safely, fast and easily.
  • Network communication is based on sending packets of bytes.
  • Interprocess communication (shared memory).
  • The large main memory of today’s servers (64GB to 256GB) creates GC issues.
  • CPU caches work best on data stored as a continuous stream of bytes in memory.

so use of the Unsafe class in most cases boils down to helping transform a Java object graph into a continuous memory region and vice versa, either using

  • [performance enhanced] object serialization or
  • wrapper classes to ease access to data stored in a continuous memory region.

(source of examples used in this post can be found here, messaging latency test here)


    Serialization based Off-Heap

    Consider a retail web application where there might be millions of registered users. We are actually not interested in representing data in a relational database, as all that’s needed is quick retrieval of user-related data once a user logs in. Additionally one would like to traverse the social graph quickly.

    Let’s take a simple user class holding some attributes and a list of ‘friends’ making up a social graph.

    The easiest way to store this on heap is a simple huge HashMap.

    Alternatively one can use off heap maps to store large amounts of data. An off heap map stores its keys and values inside the native heap, so garbage collection does not need to track this memory. In addition, native heap can be told to automagically get synchronized to disk (memory mapped files). This even works in case your application crashes, as the OS manages write back of changed memory regions.
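    To make the memory-mapping part concrete, here is a bare-bones sketch in plain Java (illustrative only, not the library’s actual code). Writes into the mapped buffer land in the OS page cache and are written back to disk by the OS, even if the JVM dies afterwards:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedDemo {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile file = new RandomAccessFile("users.dat", "rw");
                 FileChannel channel = file.getChannel()) {
                // Map 1MB of the file directly into native memory.
                MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
                buf.put(0, (byte) 42);   // serialized records would be written here
                buf.force();             // optionally force write-back to disk now
            }
        }
    }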

    There are some open source off heap map implementations out there with various feature sets (e.g. ChronicleMap), for this example I’ll use a plain and simple implementation featuring fast iteration (optional full scan search) and ease of use.

    Serialization is used to store objects; deserialization is used to pull them back onto the Java heap again. Pleasantly, I have written the (afaik) fastest fully JDK-compliant object serialization on the planet, so I’ll make use of that.

     Done:

    • persistence by memory mapping a file (the map will reload upon creation)
    • the Java heap stays empty to serve real application processing, with Full GC < 100ms
    • significantly less overall memory consumption. A serialized user record is ~60 bytes, so in theory 300 million records fit into 180GB of server memory. No need to raise the big data flag and run 4096 hadoop nodes on AWS ;).
    Comparing a regular in-memory Java HashMap and a fast-serialization based persistent off heap map holding 15 million user records shows the following results (on an older 3GHz XEON 2×6):

                                         consumed Java Heap (MB)   Full GC (s)   Native Heap (MB)   get/put ops per s   required VM size (MB)
    HashMap                              6.865,00                  26,039        0                  3.800.000,00        12.000,00
    OffheapMap (Serialization based)     63,00                     0,026         3.050              750.000,00          500,00

    [test source / blog project] Note: You’ll need at least 16GB of RAM to execute them.

    As one can see, even with fast serialization there is a heavy penalty (~factor 5) in access performance; anyway, compared to other persistence alternatives it is still superior (1–3 microseconds per “get” operation, “put()” very similar).

    Use of JDK serialization would perform at least 5 to 10 times slower (direct comparison below) and therefore render this approach useless.

    Trading performance gains against higher level of abstraction: “Serverize me”

    A single server won’t be able to serve (hundreds of) thousands of users, so we somehow need to share data amongst processes, even better: across machines.

    Using a fast implementation, it’s possible to generously use (fast-)serialization for over-the-network messaging. Again: if this ran like 5 to 10 times slower, it just wouldn’t be viable. Alternative approaches require an order of magnitude more work to achieve similar results.

    By wrapping the persistent off heap hash map with an Actor implementation (async ftw!), a few lines of code make up a persistent KeyValue server with a TCP-based and an HTTP interface (it uses kontraktor actors). Of course the Actor can still be used in-process if one decides so later on.
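    The actor idea in plain Java looks roughly like this (a sketch only; kontraktor’s actual API differs). All access is funnelled through a single-threaded mailbox, so the underlying map needs no locking and callers simply receive futures:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class KVActor {
        private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
        private final Map<String, Object> store = new HashMap<>(); // stand-in for the off heap map

        public CompletableFuture<Object> get(String key) {
            return CompletableFuture.supplyAsync(() -> store.get(key), mailbox);
        }

        public void put(String key, Object value) {
            mailbox.execute(() -> store.put(key, value));
        }
    }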

    Now that’s a microservice. Given it lacks any attempt at optimization and is single-threaded, it’s reasonably fast [same XEON machine as above]:

    • 280_000 successful remote lookups per second
    • 800_000 per second in case of failed lookups (key not found)
    • serialization-based TCP interface (a one-liner)
    • a stringy webservice for the REST-of-us (a one-liner).

    [source: KVServer, KVClient] Note: You’ll need at least 16GB of RAM to execute the test.

    A real-world implementation might want to double performance by directly putting the received serialized object byte[] into the map instead of encoding it twice (encode/decode once for transmission over the wire, then decode/encode for the off heap map).

    “RestActorServer.Publish(..);” is a one-liner to also expose the KVActor as a webservice in addition to raw TCP.

    C-like performance using flyweight wrappers / structs

    With serialization, regular Java objects are transformed into a byte sequence. One can do the opposite: create wrapper classes which read data from fixed or computed positions of an underlying byte array or native memory address (e.g. see this blog post).

    By moving the base pointer it’s possible to access different records by just moving the wrapper’s offset. Copying such a “packed object” boils down to a memory copy. In addition, it’s pretty easy to write allocation-free code this way. One downside is that reading/writing single fields has a performance penalty compared to regular Java objects. This can be made up for by using the Unsafe class.
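    A sketch of such a flyweight over a plain ByteBuffer (the field layout here is made up for illustration; fst-structs generate this kind of class at runtime):

    import java.nio.ByteBuffer;

    public class UserFlyweight {
        static final int RECORD_SIZE = 12;  // hypothetical layout: int id + long lastLogin
        private final ByteBuffer buf;
        private int base;                   // offset of the current record

        public UserFlyweight(ByteBuffer buf) { this.buf = buf; }

        // Re-point the same wrapper at another record - no allocation involved.
        public UserFlyweight moveTo(int recordIndex) {
            this.base = recordIndex * RECORD_SIZE;
            return this;
        }

        public int  id()        { return buf.getInt(base); }
        public long lastLogin() { return buf.getLong(base + 4); }
        public void id(int v)   { buf.putInt(base, v); }
    }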

    “Flyweight” wrapper classes can be implemented manually as shown in the blog post cited; however, as code grows this starts getting unmaintainable.
    Fast-serialization provides a byproduct, “struct emulation”, supporting creation of flyweight wrapper classes from regular Java classes at runtime. Low-level byte fiddling in application code can mostly be avoided this way.

    How a regular Java class can be mapped to flat memory (fst-structs):

    Of course there are simpler tools out there to help reduce manual programming of encoding (e.g. Slab) which might be more appropriate for many cases and use less “magic”.

    What kind of performance can be expected using the different approaches (sad fact incoming)?

    Let’s take the following struct-class, consisting of a price update and an embedded struct denoting a tradable instrument (e.g. stock), and encode it using various methods:

    a ‘struct’ in code
    Pure encoding performance:
    Structs         fast-Ser (no shared refs)   fast-Ser       JDK Ser (no shared)   JDK Ser
    26.315.000,00   7.757.000,00                5.102.000,00   649.000,00            644.000,00




    Real world test with messaging throughput:

    In order to get a basic estimate of the differences in a real application, I ran an experiment on how the different encodings perform when used to send and receive messages at a high rate via reliable UDP messaging:

    The Test:
    A sender encodes messages as fast as possible and publishes them using reliable multicast; a subscriber receives and decodes them.

    Structs        fast-Ser (no shared refs)   fast-Ser       JDK Ser (no shared)   JDK Ser
    6.644.107,00   4.385.118,00                3.615.584,00   81.582,00             79.073,00

    (Tests done on i7/Win8; XEON/Linux scores slightly higher; msg size ~70 bytes for structs, ~60 bytes for serialization.)

    Slowest compared to fastest: a factor of 82. The test highlights an issue not covered by micro-benchmarking: encoding and decoding should perform similarly, as factual throughput is determined by min(encoding performance, decoding performance). For unknown reasons, JDK serialization manages to encode the message tested at around 500_000 times per second, but decoding performance is only 80_000 per second, so in the test the receiver gets dropped quickly:



    ***** Stats for receive rate:   80351   per second *********
    ***** Stats for receive rate:   78769   per second *********
    SUB-ud4q has been dropped by PUB-9afs on service 1
    fatal, could not keep up. exiting

    (Creating backpressure here probably isn’t the right way to address the issue 😉  )

    Conclusion:

    • a fast serialization allows for a level of abstraction in distributed applications that is impossible if the serialization implementation is either
      – too slow
      – incomplete, e.g. cannot handle any serializable object graph
      – in need of manual coding/adaptions (this would put many restrictions on actor message types, Futures, Spores; a maintenance nightmare)
    • Low-level utilities like Unsafe enable different representations of data, resulting in extraordinary throughput or guaranteed latency boundaries (allocation-free main path) for particular workloads. These are impossible to achieve by a large margin with the JDK’s public tool set.
    • In distributed systems, communication performance is of fundamental importance. Removing Unsafe is not the biggest fish to fry looking at the numbers above .. JSON or XML won’t fix this ;-).
    • While the HotSpot VM has reached an extraordinary level of performance and reliability, CPU is wasted in some parts of the JDK like there’s no tomorrow. Given we are living in the age of distributed applications and data, moving stuff over the wire should be easy to achieve (not manually coded) and as fast as possible.
    Addendum: bounded latency

    A quick ping-pong RTT latency benchmark showing that Java can compete with C solutions easily, as long as the main path is allocation-free and techniques like those described above are employed:

    [credits: charts+measurement done with HdrHistogram]

    This is an “experiment” rather than a benchmark (so do not read: ‘proven: Java faster than C’); it shows that low-level Java can compete with C in at least this low-level domain.
    Of course it’s not exactly idiomatic Java code; however, it’s still easier to handle, port and maintain compared to a JNI or pure C(++) solution. Low-latency C(++) code won’t be that idiomatic either 😉

    About me: I am a solution architect freelancing at an exchange company in the area of realtime GUIs, middleware, and low latency CEP (Complex Event Processing).
    I am blogging at http://java-is-the-new-c.blogspot.de/,
    hacking at https://github.com/RuedigerMoeller.

    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    Doing microservices with micro-infra-spring

    We’ve been working at 4financeit for the last couple of months on some open source solutions for microservices. I will be publishing articles related to microservices and our tools; this is the first of (hopefully) many that I will write in the upcoming weeks (months?) on the Too much coding blog.

    This article will be an introduction to the micro-infra-spring library showing how you can quickly set up a microservice using our tools.

    Introduction 

    Before you start, it is crucial to remember that it’s not enough to just use our tools to have a microservice. You can check out my slides about microservices and the issues that we have dealt with while adopting them at 4financeit.

    4financeit microservices 12.2014 at Lodz JUG from Marcin Grzejszczak

    Here you can find my video where I talk about microservices at 4finance (it’s from 19.09.2014 so it’s pretty outdated).


    Also it’s worth checking out Martin Fowler’s articles about microservices, Todd Hoff’s Microservices – Not a Free Lunch! or The Strengths and Weaknesses of Microservices by Abel Avram.

    Is monolith bad?

    No it isn’t! The most important thing to remember when starting with microservices is that they will complicate your life in terms of operations, metrics, deployment and testing. Of course they bring plenty of benefits, but if you are unsure which to pick – monolith or microservices – then my advice is to go the monolith way.

    All the benefits of microservices, like code autonomy, doing one thing well, and getting rid of package dependencies, can also be achieved in monolithic code, so try to write your applications with such approaches and your life will get simpler for sure. How to achieve that? That’s complicated, but here are a couple of hints that I can give you:

    • try to do DDD. No, you don’t have DDD just because your entities have methods. Try to use the concept of aggregate roots
    • try not to make dependencies on packages from different roots. If you have two different bounded contexts like com.blogspot.toomuchcoding.client and com.blogspot.toomuchcoding.loan – go for tight cohesion and low coupling: emit events, call REST endpoints, send JMS messages or talk via a strictly defined API. Do not reuse the internals of those packages – take a look at the next point, which deals with encapsulation
    • take your high school notes and read about encapsulation again. Most of us make the mistake of thinking that if we make a field private and add an accessor to it then we have encapsulation. That’s not true! I really like the example of Slawek Sobotka (article in Polish) who shows a common approach to “encapsulation” (see the Java sketch after this list):

      human.getStomach().getBowls().getContent().add(new Sausage())

      instead of

      human.eat(new Sausauge())

    • add to your IDE’s class generation template that you want your new classes to be package scoped by default – what should be publicly available are interfaces and a really limited number of classes
    • start doing what’s crucial in terms of tracking microservice requests and measuring business and technical data in your own application! Gather metrics, set up correlation IDs for your messages, and add service discovery if you have multiple monoliths.
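    For the encapsulation point, here is the same idea rendered in Java: the aggregate root exposes behaviour while its internals stay package scoped (a sketch; the names are made up):

    public class Human {
        private final Stomach stomach = new Stomach(); // no getter - internals stay hidden

        public void eat(Sausage sausage) {
            stomach.digest(sausage);
        }
    }

    class Stomach {                // package scoped: invisible outside the package
        void digest(Sausage s) { /* ... */ }
    }

    class Sausage { }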

    I’m a hipster – I want microservices!

    Let’s assume that you know what you are doing, you have evaluated all the pros and cons and you want to go down the microservice way. You have a devops culture in your company and people are eager to start working on multiple codebases. How to start? Pick our tools and you won’t regret it 😉

    Clone a repo and get to work

    We have set up working templates on Github – with a UI (boot-microservice-gui) and without one (boot-microservice). If you clone our repo and start working with it you get a service that:
    • uses micro-infra-spring library
    • is written in Groovy
    • uses Spring Boot
    • is built with Gradle (set up for 4finance – but that’s really easy to change)
    • is JDK8 compliant
    • contains an example of a business scenario
    All you have to do is:
    • check out the slides above to see our approach to microservices
    • remove the packages com/ofg/twitter from src/main and src/test
    • alter microservice.json to support your requirements
    • write your code!
    Why should you use our repo?
    • you don’t have to set up anything – we’ve already done it for you
    • the time required to start developing a feature is close to zero

    Aren’t we duplicating Spring Cloud?

    In fact we’re not. We’re using it in our libraries ourselves (right now for property storage in a Git repository). We have some different approaches, to service discovery for instance, but in general we are extending Spring Cloud’s features.

    Conclusions

    If you want to go down the microservice way you have to be well aware of the issues related to that approach. If you know what you’re doing, you can use our libraries and our microservice templates to get a fast start on feature development.

    What’s next

    On my blog at toomuchcoding.blogspot.com I’ll write about different features of the micro-infra-spring library, with more emphasis on configuration of specific features that are not that well known but equally as cool as the rest 😉 Also I’ll write some articles on how we approached splitting the monolith, but you’ll have to wait some time for that 😉

    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    How is Java / JVM built? Adopt OpenJDK is your answer!

    Introduction & history
    As some of you may already know, starting with Java 7, OpenJDK is the Reference Implementation (RI) of Java. The timeline below gives you an idea of the history of OpenJDK:
    OpenJDK history (2006 till date)
    If you have ever wondered about the JDK or JRE binaries that you download from vendors like Oracle, Red Hat, etcetera, the clue is that these all stem from OpenJDK. Each vendor then adds some extra artefacts that are not open source yet, for security, proprietary or other reasons.


    What is OpenJDK made of?
    OpenJDK is made up of a number of repositories, namely corba, hotspot, jaxp, jaxws, jdk, langtools, and nashorn. Between OpenJDK8 and OpenJDK9 no new repositories have been introduced, but there have been lots of changes and restructuring, primarily due to Jigsaw – the modularisation of Java itself [2] [3] [4] [5].
    repo composition, language breakdown (metrics are estimated)
    Recent history
    OpenJDK Build Benchmarks – build-infra (Nov 2011) by Fredrik Öhrström, ex-Oracle, OpenJDK hero!

    Fredrik Öhrström visited the LJC [16] in November 2011, where he showed us how to build OpenJDK on the three major platforms, and also distributed a four-page leaflet with benchmarks of the various components and how long they took to build. The new build system and the new makefiles are a result of the build system being re-written (build-infra).


    Below are screenshots of the leaflets, a good reference against which to compare our journey:

    Build Benchmarks page 2 [26]

    How have Java the language and platform been built over the years?

    Java is built by bootstrapping an older (previous) version of Java – i.e. Java is built using Java itself as its building block, where older components are put together to create a new component which in the next phase becomes the building block. A good example of bootstrapping can be found at Scheme from Scratch [6] or even on Wikipedia [7].


    OpenJDK8 [8] is compiled and built using JDK7; similarly, OpenJDK9 [9] is compiled and built using JDK8. In theory, OpenJDK8 can also be compiled using the images created from OpenJDK8, and similarly OpenJDK9 using OpenJDK9. This uses a process called bootcycle images: a JDK image of OpenJDK is created, and then, using that same image, OpenJDK is compiled again. This can be accomplished using a make command option:


    $ make bootcycle-images       # Build images twice, second time with newly built JDK


    make offers a number of options under OpenJDK8 and OpenJDK9; you can build individual components or modules by naming them, i.e.


    $ make [component-name] | [module-name]


    or even run multiple build processes in parallel, i.e.


    $ make JOBS=<n>                 # Run <n> parallel make jobs


    Finally install the built artefact using the install option, i.e.


    $ make install


    Some myths busted
    OpenJDK, or HotSpot to be more specific, isn’t completely written in C/C++; a good part of the code-base is good ‘ole Java (see the composition figure above). So you don’t have to be a hard-core developer to contribute to OpenJDK. Even the underlying C/C++ code-base isn’t scary or daunting to look at. For example, here is an extract of a code snippet from vm/memory/universe.cpp in the HotSpot repo –
    .
    .
    .
    Universe::initialize_heap()

    if (UseParallelGC) {
       #ifndef SERIALGC
       Universe::_collectedHeap = new ParallelScavengeHeap();
       #else // SERIALGC
           fatal("UseParallelGC not supported in this VM.");
       #endif // SERIALGC

    } else if (UseG1GC) {
       #ifndef SERIALGC
       G1CollectorPolicy* g1p = new G1CollectorPolicy();
       G1CollectedHeap* g1h = new G1CollectedHeap(g1p);
       Universe::_collectedHeap = g1h;
       #else // SERIALGC
           fatal("UseG1GC not supported in java kernel vm.");
       #endif // SERIALGC

    } else {
       GenCollectorPolicy* gc_policy;

       if (UseSerialGC) {
           gc_policy = new MarkSweepPolicy();
       } else if (UseConcMarkSweepGC) {
           #ifndef SERIALGC
           if (UseAdaptiveSizePolicy) {
               gc_policy = new ASConcurrentMarkSweepPolicy();
           } else {
               gc_policy = new ConcurrentMarkSweepPolicy();
           }
           #else // SERIALGC
                fatal("UseConcMarkSweepGC not supported in this VM.");
           #endif // SERIALGC
       } else { // default old generation
           gc_policy = new MarkSweepPolicy();
       }

       Universe::_collectedHeap = new GenCollectedHeap(gc_policy);
    }
    .
    .
    .
    (please note that the above code snippet might have changed since being published here)
    The things that appear clear from the above code-block are: we are looking at how pre-processor directives are used to create HotSpot code that supports a certain type of GC, i.e. Serial GC or Parallel GC. Also, the type of GC policy is selected in the above code-block when one or more GC switches are toggled, i.e. UseAdaptiveSizePolicy, when enabled, selects the Asynchronous Concurrent Mark and Sweep policy. If neither UseSerialGC nor UseConcMarkSweepGC is selected, then the GC policy chosen is the Mark and Sweep policy. All of this and more is pretty clearly readable and verbose – nicely formatted code that reads almost like English.


    Further commentary can be found in the section called Deep dive Hotspot stuff in the Adopt OpenJDK Intermediate & Advance experiences [12] document.


    Steps to build your own JDK or JRE
    Earlier we mentioned JDK and JRE images – these are no longer only available to the big players in the Java world; you and I can build such images very easily. The steps for the process have been simplified, and for a quick start see the Adopt OpenJDK Getting Started Kit [11] and Adopt OpenJDK Intermediate & Advance experiences [12] documents. For a detailed version of the same steps, please see the Adopt OpenJDK home page [13]. Basically, building a JDK image from the OpenJDK code-base boils down to the below commands:


    (setup steps have been made brief and some commands omitted, see links above for exact steps)

    $ hg clone http://hg.openjdk.java.net/jdk8/jdk8 jdk8  (a)…OpenJDK8
    or
    $ hg clone http://hg.openjdk.java.net/jdk9/jdk9 jdk9  (a)…OpenJDK9

    $ ./get_source.sh                                    (b)
    $ bash configure                                      (c)
    $ make clean images                                   (d)


    To explain what is happening at each of the steps above:
    (a) We clone the OpenJDK mercurial repo, just like we would using git clone ….
    (b) Once step (a) is completed, we change into the folder created and run the get_source.sh command, which is equivalent to a git fetch or a git pull, since step (a) only brings down base files and not all of the files and folders.
    (c) Here we run a script that checks for and creates the configuration needed for the compile and build process.
    (d) Once step (c) is successful, we perform a complete compile and build, and create JDK and JRE images from the built artefacts.


    As you can see, these are dead-easy steps to follow to build an artefact or JDK/JRE images [step (a) needs to be run only once].


    Benefits
    – contribute to the evolution and improvement of Java the language & platform
    – learn about the internals of the language and platform
    – learn about the OS platform and other technologies whilst doing the above
    – get involved in F/OSS projects
    – stay on top of the latest changes in the Java / JVM sphere
    – gain knowledge and experience that helps professionally and that is not readily available from other sources (i.e. books, training, work-experience, university courses, etcetera)
    – advancement in career
    – personal development (soft skills and networking)


    Contribute
    Join the Adopt OpenJDK [13] and Betterrev [15] projects and contribute by giving us feedback about everything Java, including these projects. Join the Adoption Discuss mailing list [14] and other OpenJDK-related mailing lists to start with; these will keep you updated on the latest progress and changes to OpenJDK. Fork any of the projects you see and submit changes via pull requests.


    Thanks and support
    Adopt OpenJDK [13] and umbrella projects have been supported and progressed with the help of the JCP [21], the OpenJDK team [22], JUGs like the London Java Community [16], SouJava [17] and other JUGs in Brazil, a number of JUGs in Europe i.e. BGJUG (Bulgarian JUG) [18], BeJUG (Belgian JUG) [19], Macedonian JUG [20], and a number of other small JUGs. We hope that in the coming time more JUGs and individuals will get involved. If you or your JUG wish to participate, please get in touch.

    Credits
    Special thanks to +Martijn Verburg (who incepted Adopt OpenJDK), +Richard Warburton, +Oleg Shelajev, +Mite Mitreski, +Kaushik Chaubal and +Julius G for helping improve the content and quality of this post, and for sharing their OpenJDK experience with us.


    How to get started?
    Join the Adoption Discuss mailing list [14], go to the Adopt OpenJDK home page [13] to get started, and then refer to the Adopt OpenJDK Getting Started Kit [11] and Adopt OpenJDK Intermediate & Advance experiences [12] documents.


    Please share your comments here or tweet at @theNeomatrix369.


    Resources
    [8] OpenJDK8 
    [17] SouJava 


    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    How jOOQ Helps Pretend that Your Stored Procedures are a Part of Java

    In this year’s Java Advent Calendar, we’re thrilled to have been asked to feature a mini-series showing you a couple of advanced and very interesting topics that we’ve been working on while developing jOOQ.

    The series consists of:

    Don’t miss any of these!

    How jOOQ helps pretend that your stored procedures are a part of Java

    This article was originally published fully on the jOOQ blog

    Stored procedures are an interesting way to approach data processing. Some Java developers tend to steer clear of them for rather dogmatic reasons, such as:

    • They think that the database is the wrong place for business logic
    • They think that the procedural aspect of those languages is ill-suited for their domain

    But in practice, stored procedures are an excellent means of handling data manipulation, simply because they can execute complex logic right where the data is. This completely removes the effects that network latency and bandwidth would otherwise have on your application. As we’re looking into supporting SAP HANA for jOOQ 3.6, we can tell you that running jOOQ’s 10000 integration test queries connecting from a local machine to the cloud takes a lot longer. If you absolutely want to stay in Java land, then you had better also deploy your Java application into the cloud, close to that database (SAP HANA obviously offers that feature). But much better than that: move some of the logic into the database!

    If you’re doing calculations on huge in-memory data sets, you had better get your code into that same memory, rather than shuffling memory pieces around between possibly separate physical memory addresses. Companies like Hazelcast essentially do the same, except that their in-memory database is written in Java, so you can also write your “stored procedure” in Java.

    With SQL databases, procedural SQL languages are king. And because of their tight integration with SQL, they’re much better suited for the job than any Java-based stored procedure architecture.

    I knoow, but JDBC’s CallableStatement… Arrrgh!

    Yes. As ever so often (and as mentioned before in our previous articles), one very important reason why many Java developers don’t like working with SQL is JDBC. Binding to a database via JDBC is extremely tedious and keeps us from working efficiently. Let’s have a look at a couple of PL/SQL binding examples:

    Assume we’re working on an Oracle-port of the popular Sakila database (originally created for MySQL). This particular Sakila/Oracle port was implemented by DB Software Laboratory and published under the BSD license.

    Here’s a partial view of that Sakila database.

    ERD created with vertabelo.com – learn how to use Vertabelo with jOOQ

    Now, let’s assume that we have an API in the database that doesn’t expose the above schema, but exposes a PL/SQL API instead. The API might look something like this:

    CREATE TYPE LANGUAGE_T AS OBJECT (
        language_id SMALLINT,
        name CHAR(20),
        last_update DATE
    );
    /

    CREATE TYPE LANGUAGES_T AS TABLE OF LANGUAGE_T;
    /

    CREATE TYPE FILM_T AS OBJECT (
        film_id int,
        title VARCHAR(255),
        description CLOB,
        release_year VARCHAR(4),
        language LANGUAGE_T,
        original_language LANGUAGE_T,
        rental_duration SMALLINT,
        rental_rate DECIMAL(4,2),
        length SMALLINT,
        replacement_cost DECIMAL(5,2),
        rating VARCHAR(10),
        special_features VARCHAR(100),
        last_update DATE
    );
    /

    CREATE TYPE FILMS_T AS TABLE OF FILM_T;
    /

    CREATE TYPE ACTOR_T AS OBJECT (
        actor_id numeric,
        first_name VARCHAR(45),
        last_name VARCHAR(45),
        last_update DATE
    );
    /

    CREATE TYPE ACTORS_T AS TABLE OF ACTOR_T;
    /

    CREATE TYPE CATEGORY_T AS OBJECT (
        category_id SMALLINT,
        name VARCHAR(25),
        last_update DATE
    );
    /

    CREATE TYPE CATEGORIES_T AS TABLE OF CATEGORY_T;
    /

    CREATE TYPE FILM_INFO_T AS OBJECT (
        film FILM_T,
        actors ACTORS_T,
        categories CATEGORIES_T
    );
    /

You’ll notice immediately that this is essentially just a 1:1 copy of the schema, in this case modelled as Oracle SQL OBJECT and TABLE types, apart from the FILM_INFO_T type, which acts as an aggregate.

    Now, our DBA (or our database developer) has implemented the following API for us to access the above information:

CREATE OR REPLACE PACKAGE RENTALS AS
  FUNCTION GET_ACTOR(p_actor_id INT) RETURN ACTOR_T;
  FUNCTION GET_ACTORS RETURN ACTORS_T;
  FUNCTION GET_FILM(p_film_id INT) RETURN FILM_T;
  FUNCTION GET_FILMS RETURN FILMS_T;
  FUNCTION GET_FILM_INFO(p_film_id INT) RETURN FILM_INFO_T;
  FUNCTION GET_FILM_INFO(p_film FILM_T) RETURN FILM_INFO_T;
END RENTALS;
/

    This, ladies and gentlemen, is how you can now…

    … tediously access the PL/SQL API with JDBC

    So, in order to avoid the awkward CallableStatement with its OUT parameter registration and JDBC escape syntax, we’re going to fetch a FILM_INFO_T record via a SQL statement like this:

try (PreparedStatement stmt = conn.prepareStatement(
         "SELECT rentals.get_film_info(1) FROM DUAL");
     ResultSet rs = stmt.executeQuery()) {

    // STRUCT unnesting here...
}

    So far so good. Luckily, there is Java 7’s try-with-resources to help us clean up those myriad JDBC objects. Now how to proceed? What will we get back from this ResultSet? A java.sql.Struct:

while (rs.next()) {
    Struct film_info_t = (Struct) rs.getObject(1);

    // And so on...
}

    Now, the brave ones among you would continue downcasting the java.sql.Struct to an even more obscure and arcane oracle.sql.STRUCT, which contains almost no Javadoc, but tons of deprecated additional, vendor-specific methods.
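For the curious, here is a minimal sketch of that detour (assuming the Oracle JDBC driver is in use; with any other driver, the cast will simply fail):

// Downcast to the vendor API: with the Oracle JDBC driver, the returned
// java.sql.Struct is in fact an oracle.sql.STRUCT, which exposes the same
// getAttributes() navigation plus a pile of vendor-specific extras
oracle.sql.STRUCT struct = (oracle.sql.STRUCT) film_info_t;
Object[] attributes = struct.getAttributes();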

For now, let’s stick with the “standard API”, though, and continue navigating our STRUCT:

while (rs.next()) {
    Struct film_info_t = (Struct) rs.getObject(1);

    Struct film_t = (Struct) film_info_t.getAttributes()[0];
    String title = (String) film_t.getAttributes()[1];
    Clob description_clob = (Clob) film_t.getAttributes()[2];
    String description = description_clob.getSubString(1, (int) description_clob.length());

    Struct language_t = (Struct) film_t.getAttributes()[4];
    String language = (String) language_t.getAttributes()[1];

    System.out.println("Film : " + title);
    System.out.println("Description: " + description);
    System.out.println("Language : " + language);
}

This could go on and on. The pain has only started; we haven’t even covered arrays yet. The details can be seen in the original article.
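To give you a taste of what is missing, here is a hedged sketch of what unnesting the ACTORS_T array would add on top of the above (the attribute indexes follow the type definitions shown earlier; this is an illustration, not the full treatment from the original article):

// ACTORS_T is the second attribute of FILM_INFO_T
Array actors_t = (Array) film_info_t.getAttributes()[1];

// java.sql.Array.getArray() returns an Object that is really an Object[]
for (Object element : (Object[]) actors_t.getArray()) {
    Struct actor_t = (Struct) element;

    // ACTOR_T is (actor_id, first_name, last_name, last_update)
    System.out.println(
        "  " + actor_t.getAttributes()[1]
             + " " + actor_t.getAttributes()[2]);
}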

    Anyway. Now that we’ve finally achieved this, we can see the print output:

Film       : ACADEMY DINOSAUR
Description: A Epic Drama of a Feminist And a Mad
             Scientist who must Battle a Teacher in
             The Canadian Rockies
Language   : English
Actors     :
  PENELOPE GUINESS
  CHRISTIAN GABLE
  LUCILLE TRACY
  SANDRA PECK
  JOHNNY CAGE
  MENA TEMPLE
  WARREN NOLTE
  OPRAH KILMER
  ROCK DUKAKIS
  MARY KEITEL

    When will this madness stop?

    It’ll stop right here!

So far, this article has read like a tutorial (or rather: a medieval torture session) on how to deserialise nested user-defined types from Oracle SQL to Java (don’t get me started on serialising them again!).

    In the next section, we’ll see how the exact same business logic (listing Film with ID=1 and its actors) can be implemented with no pain at all using jOOQ and its source code generator. Check this out:

// Simply call the packaged stored function from
// Java, and get a deserialised, type safe record
FilmInfoTRecord film_info_t = Rentals.getFilmInfo1(configuration, new BigInteger("1"));

// The generated record has getters (and setters)
// for type safe navigation of nested structures
FilmTRecord film_t = film_info_t.getFilm();

// In fact, all these types have generated getters:
System.out.println("Film : " + film_t.getTitle());
System.out.println("Description: " + film_t.getDescription());
System.out.println("Language : " + film_t.getLanguage().getName());

// Simply loop nested type safe array structures
System.out.println("Actors : ");
for (ActorTRecord actor_t : film_info_t.getActors()) {
    System.out.println(
        " " + actor_t.getFirstName()
            + " " + actor_t.getLastName());
}

System.out.println("Categories : ");
for (CategoryTRecord category_t : film_info_t.getCategories()) {
    System.out.println(category_t.getName());
}

    Is that it?

    Yes!

Wow, I mean, this is just as though all those PL/SQL types and procedures / functions were actually part of Java. All the caveats that we’ve seen before are hidden behind those generated types and implemented in jOOQ, so you can concentrate on what you originally wanted to do: access the data objects and do meaningful work with them, not serialise / deserialise them!

    Not convinced yet?

    I told you not to get me started on serialising the types to JDBC. And I won’t, but here’s how to serialise the types to jOOQ, because that’s a piece of cake!

    Let’s consider this other aggregate type, that returns a customer’s rental history:

CREATE TYPE CUSTOMER_RENTAL_HISTORY_T AS OBJECT (
  customer CUSTOMER_T,
  films FILMS_T
);
/

    And the full PL/SQL package specs:

CREATE OR REPLACE PACKAGE RENTALS AS
  FUNCTION GET_ACTOR(p_actor_id INT) RETURN ACTOR_T;
  FUNCTION GET_ACTORS RETURN ACTORS_T;
  FUNCTION GET_CUSTOMER(p_customer_id INT) RETURN CUSTOMER_T;
  FUNCTION GET_CUSTOMERS RETURN CUSTOMERS_T;
  FUNCTION GET_FILM(p_film_id INT) RETURN FILM_T;
  FUNCTION GET_FILMS RETURN FILMS_T;
  FUNCTION GET_CUSTOMER_RENTAL_HISTORY(p_customer_id INT) RETURN CUSTOMER_RENTAL_HISTORY_T;
  FUNCTION GET_CUSTOMER_RENTAL_HISTORY(p_customer CUSTOMER_T) RETURN CUSTOMER_RENTAL_HISTORY_T;
  FUNCTION GET_FILM_INFO(p_film_id INT) RETURN FILM_INFO_T;
  FUNCTION GET_FILM_INFO(p_film FILM_T) RETURN FILM_INFO_T;
END RENTALS;
/

    So, when calling RENTALS.GET_CUSTOMER_RENTAL_HISTORY we can find all the films that a customer has ever rented. Let’s do that for all customers whose FIRST_NAME is “JAMIE”, and this time, we’re using Java 8:

// We call the stored function directly inline in
// a SQL statement
dsl().select(Rentals.getCustomer(
          CUSTOMER.CUSTOMER_ID
      ))
     .from(CUSTOMER)
     .where(CUSTOMER.FIRST_NAME.eq("JAMIE"))

// This returns Result<Record1<CustomerTRecord>>
// We unwrap the CustomerTRecord and consume
// the result with a lambda expression
     .fetch()
     .map(Record1::value1)
     .forEach(customer -> {
         System.out.println("Customer : ");
         System.out.println("- Name : " + customer.getFirstName() + " " + customer.getLastName());
         System.out.println("- E-Mail : " + customer.getEmail());
         System.out.println("- Address : " + customer.getAddress().getAddress());
         System.out.println(" " + customer.getAddress().getPostalCode() + " " + customer.getAddress().getCity().getCity());
         System.out.println(" " + customer.getAddress().getCity().getCountry().getCountry());

         // Now, let's send the customer over the wire again to
         // call that other stored procedure, fetching his
         // rental history:
         CustomerRentalHistoryTRecord history =
             Rentals.getCustomerRentalHistory2(dsl().configuration(), customer);

         System.out.println(" Customer Rental History : ");
         System.out.println(" Films : ");

         history.getFilms().forEach(film -> {
             System.out.println(" Film : " + film.getTitle());
             System.out.println(" Language : " + film.getLanguage().getName());
             System.out.println(" Description : " + film.getDescription());

             // And then, let's call again the first procedure
             // in order to get a film's actors and categories
             FilmInfoTRecord info =
                 Rentals.getFilmInfo2(dsl().configuration(), film);

             info.getActors().forEach(actor -> {
                 System.out.println(" Actor : " + actor.getFirstName() + " " + actor.getLastName());
             });

             info.getCategories().forEach(category -> {
                 System.out.println(" Category : " + category.getName());
             });
         });
     });

    … and a short extract of the output produced by the above:

Customer :
- Name : JAMIE RICE
- E-Mail : JAMIE.RICE@sakilacustomer.org
- Address : 879 Newcastle Way
            90732 Sterling Heights
            United States
  Customer Rental History :
    Films :
      Film : ALASKA PHANTOM
      Language : English
      Description : A Fanciful Saga of a Hunter
                    And a Pastry Chef who must
                    Vanquish a Boy in Australia
      Actor : VAL BOLGER
      Actor : BURT POSEY
      Actor : SIDNEY CROWE
      Actor : SYLVESTER DERN
      Actor : ALBERT JOHANSSON
      Actor : GENE MCKELLEN
      Actor : JEFF SILVERSTONE
      Category : Music
      Film : ALONE TRIP
      Language : English
      Description : A Fast-Paced Character
                    Study of a Composer And a
                    Dog who must Outgun a Boat
                    in An Abandoned Fun House
      Actor : ED CHASE
      Actor : KARL BERRY
      Actor : UMA WOOD
      Actor : WOODY JOLIE
      Actor : SPENCER DEPP
      Actor : CHRIS DEPP
      Actor : LAURENCE BULLOCK
      Actor : RENEE BALL
      Category : Music

    If you’re using Java and PL/SQL…

… then you should download the free trial right now and experiment with jOOQ and Oracle.

    The Oracle port of the Sakila database is available from this URL for free, under the terms of the BSD license:

    https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/Sakila/oracle-sakila-db

    Finally, it is time to enjoy writing PL/SQL again!

    And things get even better!

    jOOQ is free and Open Source for use with Open Source databases, and it offers commercial licensing for use with commercial databases. So, if you’re using Firebird, MySQL, or PostgreSQL, you can leverage all your favourite database’s procedural SQL features and bind them easily to Java for free!

For more information about jOOQ or jOOQ’s DSL API, have a look at the resources on the jOOQ blog and website.

    That’s it with this year’s mini-series on jOOQ. Have a happy Holiday season!
    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    How jOOQ Allows for Fluent Functional-Relational Interactions in Java 8

In this year’s Java Advent Calendar, we’re thrilled to have been asked to feature a mini-series showing you a couple of advanced and very interesting topics that we’ve been working on while developing jOOQ.

The series consists of:

• How jOOQ Leverages Generic Type Safety in its DSL
• How jOOQ Allows for Fluent Functional-Relational Interactions in Java 8
• How jOOQ Helps Pretend that Your Stored Procedures are a Part of Java

Don’t miss any of these!

    How jOOQ allows for fluent functional-relational interactions in Java 8

In yesterday’s article, we saw How jOOQ Leverages Generic Type Safety in its DSL when constructing SQL statements. Much more interesting than constructing SQL statements, however, is executing them.

Yesterday, we saw a sample PL/SQL block that reads like this:

BEGIN
  FOR rec IN (
    SELECT first_name, last_name FROM customers
    UNION
    SELECT first_name, last_name FROM staff
  )
  LOOP
    INSERT INTO people (first_name, last_name)
    VALUES (rec.first_name, rec.last_name);
  END LOOP;
END;

    And you won’t be surprised to see that the exact same thing can be written in Java with jOOQ:

for (Record2<String, String> rec :
    dsl.select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
       .union(
        select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF))
) {
    dsl.insertInto(PEOPLE, PEOPLE.FIRST_NAME, PEOPLE.LAST_NAME)
       .values(rec.getValue(CUSTOMERS.FIRST_NAME), rec.getValue(CUSTOMERS.LAST_NAME))
       .execute();
}

This is a classic, imperative-style, PL/SQL-inspired approach to iterating over result sets and performing actions on them, record by record.

    Java 8 changes everything!

With Java 8, lambdas appeared, and much more importantly, so did Streams, along with tons of other useful features. The simplest way to migrate the above foreach loop to Java 8’s “callback hell” would be the following:

dsl.select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
    select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF))
   .forEach(rec -> {
       dsl.insertInto(PEOPLE, PEOPLE.FIRST_NAME, PEOPLE.LAST_NAME)
          .values(rec.getValue(CUSTOMERS.FIRST_NAME), rec.getValue(CUSTOMERS.LAST_NAME))
          .execute();
   });

This is still very simple. How about this: let’s fetch a couple of records from the database, stream them, map them using some sophisticated Java function, and reduce them into a batch update statement! Whew… here’s the code:

dsl.selectFrom(BOOK)
   .where(BOOK.ID.in(2, 3))
   .orderBy(BOOK.ID)
   .fetch()
   .stream()
   .map(book -> book.setTitle(book.getTitle().toUpperCase()))
   .reduce(
       dsl.batch(update(BOOK).set(BOOK.TITLE, (String) null).where(BOOK.ID.eq((Integer) null))),
       (batch, book) -> batch.bind(book.getTitle(), book.getId()),
       (b1, b2) -> b1
   )
   .execute();

Awesome, right? Again, with comments:

// Here, we simply select a couple of books from the database
dsl.selectFrom(BOOK)
   .where(BOOK.ID.in(2, 3))
   .orderBy(BOOK.ID)
   .fetch()

// Now, we stream the result as a Java 8 Stream
   .stream()

// Now we map all book titles using the "sophisticated" Java function
   .map(book -> book.setTitle(book.getTitle().toUpperCase()))

// Now, we reduce the books into a batch update statement...
   .reduce(

// ... which is initialised with empty bind variables
       dsl.batch(update(BOOK).set(BOOK.TITLE, (String) null).where(BOOK.ID.eq((Integer) null))),

// ... and then we bind each book's values to the batch statement
       (batch, book) -> batch.bind(book.getTitle(), book.getId()),

// ... this is just a dummy combiner function, because we only operate on one batch instance
       (b1, b2) -> b1
   )

// Finally, we execute the produced batch statement
   .execute();

Awesome, right? Well, if you’re not too functional-ish, you can still resort to the “old ways” using imperative-style loops. Perhaps your coworkers might prefer that:

BatchBindStep batch = dsl.batch(update(BOOK).set(BOOK.TITLE, (String) null).where(BOOK.ID.eq((Integer) null)));

for (BookRecord book :
    dsl.selectFrom(BOOK)
       .where(BOOK.ID.in(2, 3))
       .orderBy(BOOK.ID)
) {
    batch.bind(book.getTitle(), book.getId());
}

batch.execute();

    So, what’s the point of using Java 8 with jOOQ?

Java 8 might change a lot of things. Mainly, it changes the way we reason about functional data transformation algorithms. Some of the above ideas might’ve been a bit over the top. But the principal idea is that whatever your source of data is, if you think about that data in terms of Java 8 Streams, you can very easily transform (map) those streams into other types of streams, as we did with the books. And nothing keeps you from collecting books that contain changes into batch update statements for batch execution.

Another example is one where we claimed that Java 8 also changes the way we perceive ORMs. ORMs are very stateful, object-oriented things that help manage database state in an object-graph representation, with lots of nice features like optimistic locking, dirty checking, and implementations that support long conversations. But they’re quite terrible at data transformation. First off, they’re much, much inferior to SQL in terms of data transformation capabilities. This is topped by the fact that object graphs and functional programming don’t really go well together either.

With SQL (and thus with jOOQ), you’ll often stay on a flat tuple level. Tuples are extremely easy to transform. The following example shows how you can use an H2 database to query INFORMATION_SCHEMA meta information such as table names, column names, and data types, collect that information into a data structure, and then map that data structure into new CREATE TABLE statements:

DSL.using(c)
   .select(
       COLUMNS.TABLE_NAME,
       COLUMNS.COLUMN_NAME,
       COLUMNS.TYPE_NAME
   )
   .from(COLUMNS)
   .orderBy(
       COLUMNS.TABLE_CATALOG,
       COLUMNS.TABLE_SCHEMA,
       COLUMNS.TABLE_NAME,
       COLUMNS.ORDINAL_POSITION
   )
   .fetch()  // jOOQ ends here
   .stream() // Streams start here
   .collect(groupingBy(
       r -> r.getValue(COLUMNS.TABLE_NAME),
       LinkedHashMap::new,
       mapping(
           r -> r,
           toList()
       )
   ))
   .forEach(
       (table, columns) -> {
           // Just emit a CREATE TABLE statement
           System.out.println(
               "CREATE TABLE " + table + " (");

           // Map each column record into a String
           // containing the column specification,
           // and join them using comma and
           // newline. Done!
           System.out.println(
               columns.stream()
                      .map(col -> "  " + col.getValue(COLUMNS.COLUMN_NAME) +
                           " " + col.getValue(COLUMNS.TYPE_NAME))
                      .collect(Collectors.joining(",\n"))
           );

           System.out.println(");");
       }
   );

    The above statement will produce something like the following SQL script:

    CREATE TABLE CATALOGS(
    CATALOG_NAME VARCHAR
    );
    CREATE TABLE COLLATIONS(
    NAME VARCHAR,
    KEY VARCHAR
    );
    CREATE TABLE COLUMNS(
    TABLE_CATALOG VARCHAR,
    TABLE_SCHEMA VARCHAR,
    TABLE_NAME VARCHAR,
    COLUMN_NAME VARCHAR,
    ORDINAL_POSITION INTEGER,
    COLUMN_DEFAULT VARCHAR,
    IS_NULLABLE VARCHAR,
    DATA_TYPE INTEGER,
    CHARACTER_MAXIMUM_LENGTH INTEGER,
    CHARACTER_OCTET_LENGTH INTEGER,
    NUMERIC_PRECISION INTEGER,
    NUMERIC_PRECISION_RADIX INTEGER,
    NUMERIC_SCALE INTEGER,
    CHARACTER_SET_NAME VARCHAR,
    COLLATION_NAME VARCHAR,
    TYPE_NAME VARCHAR,
    NULLABLE INTEGER,
    IS_COMPUTED BOOLEAN,
    SELECTIVITY INTEGER,
    CHECK_CONSTRAINT VARCHAR,
    SEQUENCE_NAME VARCHAR,
    REMARKS VARCHAR,
    SOURCE_DATA_TYPE SMALLINT
    );

That’s data transformation! If you’re as excited as we are, read the original article to see exactly how this example works.

    Conclusion

Java 8 has changed everything in the Java ecosystem. Finally, we can implement functional, transformative algorithms easily using Streams and lambda expressions. SQL is also a very functional and transformative language. With jOOQ and Java 8, you can extend data transformation directly from your type safe SQL results into Java data structures and back into SQL. These things aren’t possible with JDBC. These things weren’t possible prior to Java 8.

    jOOQ is free and Open Source for use with Open Source databases, and it offers commercial licensing for use with commercial databases.

For more information about jOOQ or jOOQ’s DSL API, have a look at the resources on the jOOQ blog and website.

    Stay tuned for tomorrow’s article “How jOOQ helps pretend that your stored procedures are a part of Java”
    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    How jOOQ Leverages Generic Type Safety in its DSL

In this year’s Java Advent Calendar, we’re thrilled to have been asked to feature a mini-series showing you a couple of advanced and very interesting topics that we’ve been working on while developing jOOQ.

The series consists of:

• How jOOQ Leverages Generic Type Safety in its DSL
• How jOOQ Allows for Fluent Functional-Relational Interactions in Java 8
• How jOOQ Helps Pretend that Your Stored Procedures are a Part of Java

Don’t miss any of these!

    How jOOQ leverages generic type safety in its DSL

    Few Java developers are aware of this, but SQL is a very type safe language. In the Java ecosystem, if you’re using JDBC, you’re operating on dynamically constructed SQL strings, which are sent to the server for execution – or failure. Some IDEs may have started to be capable of introspecting parts of your static SQL, but often you’re concatenating predicates to form a very dynamic query:

String sql = "SELECT a, b, c FROM table WHERE 1 = 1";

if (someCondition)
    sql += " AND id = 3";

if (someOtherCondition)
    sql += " AND value = 42";

These concatenations quickly turn nasty and are one of the reasons why Java developers don’t really like SQL.

    SQL as written via JDBC. Image (c) by Greg Grossmeier. License CC-BY-SA 2.0

    But interestingly, PL/SQL or T-SQL developers never complain about SQL in this way. In fact, they feel quite the opposite. Look at how SQL is nicely embedded in a typical PL/SQL block:

BEGIN

  -- The record type of "rec" is inferred by the compiler
  FOR rec IN (

    -- This compiles only when I have matching
    -- degrees and types of both UNION subselects!
    SELECT first_name, last_name FROM customers
    UNION
    SELECT first_name, last_name FROM staff
  )
  LOOP

    -- This compiles only if rec really has
    -- first_name and last_name columns
    INSERT INTO people (first_name, last_name)

    -- Obviously, VALUES must match the above target table
    VALUES (rec.first_name, rec.last_name);
  END LOOP;
END;

    Now, we can most certainly discuss syntax. Whether you like SQL’s COBOLesque syntax or not is a matter of taste and a matter of habit, too. But one thing is clear, SQL is absolutely type safe, and most sane people would consider that a very good thing. Read The Inconvenient Truth About Dynamic vs. Static Typing for more details.

    The same can be achieved in Java!

    JDBC’s lack of type safety is a brilliant feature for the low-level API that JDBC is. At some point, we need an API that can simply send SQL strings over the wire without knowing anything about the wire protocol, and retrieve back cursors of arbitrary / unknown type. However, if we don’t execute our SQL directly via JDBC, but maintain a type safe SQL AST (Abstract Syntax Tree) prior to query execution, then we might actually anticipate the returned type of our statements.

jOOQ’s DSL (domain-specific language) API works exactly like that. When you create SQL statements with jOOQ, you’re implicitly creating an AST both for your Java compiler and for your runtime environment. Here’s how that works:

DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
    select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF))
   .fetch();

    If we look closely at what the above query really does, we’ll see that we’re calling one of several overloaded select() methods on jOOQ’s DSLContext class, namely DSLContext.select(Field, Field), the one that takes two argument columns.

    The whole API looks like this, and we’ll see immediately after why this is so useful:

<T1> SelectSelectStep<Record1<T1>>
    select(Field<T1> field1);

<T1, T2> SelectSelectStep<Record2<T1, T2>>
    select(Field<T1> field1, Field<T2> field2);

<T1, T2, T3> SelectSelectStep<Record3<T1, T2, T3>>
    select(Field<T1> field1, Field<T2> field2, Field<T3> field3);

// and so on...

    So, by explicitly passing two columns to the select() method, you have chosen the second one of the above methods that returns a DSL type that is parameterised with Record2, or more specifically, with Record2<String, String>. Yes, the String parameter bindings are inferred from the very columns that we passed to the select() call, because jOOQ’s code generator reverse-engineers your database schema and generates those classes for you.

    The generated Customers class really looks like this (simplified):

// All table references are listed here:
class Tables {
    Customers CUSTOMERS = new Customers();
    Staff STAFF = new Staff();
}

// All tables have an individual class each, with columns inside:
class Customers {
    final Field<String> FIRST_NAME = ...
    final Field<String> LAST_NAME = ...
}

    As you can see, all type information is already available to you, automatically, as you have defined those types only once in the database. No need to define them again in Java.
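To see the effect, here is a minimal sketch using the generated classes above (the explicit type declarations are spelled out only for illustration; normally you would just chain the calls):

// The Record2<String, String> row type is inferred entirely from the
// two generated column references passed to select()
Result<Record2<String, String>> result =
DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
   .from(CUSTOMERS)
   .fetch();

for (Record2<String, String> rec : result) {
    String firstName = rec.value1(); // compile-time checked as String
    String lastName  = rec.value2(); // compile-time checked as String
}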

    Generic type information is ubiquitous

    The interesting part is the UNION. The union() method on the DSL API simply looks like this:

public interface SelectUnionStep<R extends Record> {
    SelectOrderByStep<R> union(Select<? extends R> select);
}

    If we go back to our statement, we can see that the type of the object upon which we call union() is really this type:

    SelectUnionStep<Record2<String, String>>

    … thus, the method union() that we’re calling is really expecting an argument of this type:

    union(Select<? extends Record2<String, String>> select);

    … which essentially means that we’ll get a compilation error if we don’t provide two string columns also in the second subselect:

DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
   // ^^^^^ doesn't compile, wrong argument type!
    select(STAFF.FIRST_NAME).from(STAFF))
   .fetch();

    or also:

DSL.using(configuration)
   .select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).from(CUSTOMERS)
   .union(
   // ^^^^^ doesn't compile, wrong argument type!
    select(STAFF.FIRST_NAME, STAFF.DATE_OF_BIRTH).from(STAFF))
   .fetch();

Static type checking helps find bugs early

… indeed! All of the above bugs can be found at compile time because your Java compiler will not accept the wrong SQL statements. When writing dynamic SQL, this can be incredibly subtle, as the different UNION subselects may not all be created at the same place. You may have a complex DAO that generates the SQL across several methods. With this kind of generic type safety, you can continue to do so, safely.
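Here is a hedged sketch of that situation (all method names hypothetical, and a DSLContext field called dsl is assumed, as in the earlier examples): the two halves of the UNION are built in separate methods, and the compiler still checks that their degrees and types match:

// Each method contributes one half of the UNION
Select<Record2<String, String>> customerNames() {
    return dsl.select(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
              .from(CUSTOMERS);
}

Select<Record2<String, String>> staffNames() {
    return dsl.select(STAFF.FIRST_NAME, STAFF.LAST_NAME)
              .from(STAFF);
}

// This compiles only because both methods agree on Record2<String, String>;
// a mismatch in degree or type in either method is a compile error
Result<Record2<String, String>> allNames() {
    return customerNames().union(staffNames()).fetch();
}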

    As mentioned before, this extends through the whole API. Check out…

    IN predicates

    This compiles:

    // Get all customers whose first name corresponds to a staff first name
    DSL.using(configuration)
    .select().from(CUSTOMERS)
    .where(CUSTOMERS.FIRST_NAME.in(
    select(STAFF.FIRST_NAME).from(STAFF)
    ))
    .fetch();

    This doesn’t compile:

DSL.using(configuration)
   .select().from(CUSTOMERS)
   .where(CUSTOMERS.FIRST_NAME.in(
   // ^^ wrong argument type!
        select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF)
   ))
   .fetch();

    But this compiles:

    // Get all customers whose first and last names both correspond
    // to a staff first and last names
    DSL.using(configuration)
    .select().from(CUSTOMERS)
    .where(row(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME).in(
    select(STAFF.FIRST_NAME, STAFF.LAST_NAME).from(STAFF)
    ))
    .fetch();

    Notice the use of row() to construct a row value expression, an extremely useful but little known SQL feature.
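Row value expressions are useful beyond IN predicates, too. Here is a small hedged sketch (same generated schema assumed) of a type safe row comparison:

// Both sides must be a Row2<String, String>; a degree or type
// mismatch on either side is a compile error
DSL.using(configuration)
   .select().from(CUSTOMERS)
   .where(row(CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
          .eq("John", "Doe"))
   .fetch();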

    INSERT statements

    This compiles:

DSL.using(configuration)
   .insertInto(CUSTOMERS, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
   .values("John", "Doe")
   .execute();

    This doesn’t compile:

DSL.using(configuration)
   .insertInto(CUSTOMERS, CUSTOMERS.FIRST_NAME, CUSTOMERS.LAST_NAME)
   .values("John")
   // ^^^^^^ Invalid number of arguments
   .execute();

    Conclusion

    Internal domain-specific languages can express a lot of type safety in Java, almost as much as the external language really implements. In the case of SQL – which is a very type safe language – this is particularly true and interesting.

    jOOQ has been designed to create as little cognitive friction as possible for any Java developer who wants to write embedded SQL in Java, i.e. the Java code will look and feel exactly like the SQL code that it represents. At the same time, jOOQ has been designed to offer as much compile-time type safety as possible in the Java language (or also in Scala, Groovy, etc.).

    jOOQ is free and Open Source for use with Open Source databases, and it offers commercial licensing for use with commercial databases.

For more information about jOOQ or jOOQ’s DSL API, have a look at the resources on the jOOQ blog and website.

    Stay tuned for tomorrow’s article “How jOOQ allows for fluent functional-relational interactions in Java 8”
    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

    Managing Package Dependencies with Degraph

    A large part of the art of software development is keeping the complexity of a system as low as possible. But what is complexity anyway? While the exact semantics vary quite a bit, depending on who you ask, probably most agree that it has a lot to do with the number of parts in a system and their interactions.

Consider a marble in space, i.e. a planet, moon or star. Without any interaction this is as boring as a system can get. Nothing happens. If the marble moves, it keeps moving in exactly the same way. To be honest, there isn’t even a way to determine whether it is moving. Boooring.

Add a second marble to the system and let them attract each other, like earth and moon. Now the system is more interesting: the two objects circle each other if they aren’t too fast. Somewhat interesting.

Now add a third object. In the general case things get so interesting that we can’t even predict what is going to happen. The whole system didn’t just become complex, it became chaotic: you now have a three body problem. In the general case this problem cannot be solved, i.e. we cannot predict what will happen with the system. But there are some special cases, especially the case where two of the objects are very close to each other, like earth and moon, while the third one is so far away that the first two objects behave just like one. In this case you can approximate the system with two two-body systems.

But what does this have to do with Java? This sounds more like physics.

I think software development is similar in some aspects. A complete application is way too complicated to be understood as a whole. To fight this complexity we divide the system into parts (classes) that can be understood on their own and that hide their inner complexity, so that when we look at the larger picture we don’t have to worry about every single code line in a class, but only about the class as one entity. This is actually very similar to what physicists do with systems.

But let’s look at the scale of things. The basic building block of software is the line of code. To keep the complexity in check, we bundle lines of code that work together into methods. How many lines go into a single method varies, but it is on the order of 10 lines of code.
Next we gather methods into classes. How many methods go into a single class? Typically on the order of 10 methods!

And then? We bundle 100-10000 classes into a single jar! I hope I’m not the only one who thinks something is amiss.

I’m not sure what will come out of Project Jigsaw, but currently Java only offers packages as a way to bundle classes. Packages aren’t a powerful abstraction, yet they are the only one we have, so we had better use them.

Most teams do use packages, but in an ad hoc rather than a well-structured way. The result is similar to trying to consider moon and sun as one part of the system, and the earth as the other part. The result might work, but it is probably as intuitive as Ptolemy’s planetary model. Instead, decide on criteria for how you want to differentiate your packages. I personally call them slicings, inspired by an article by Oliver Gierke. Possible slicings, in order of importance, are:

    • the deployable jar file the class should end up in
    • the use case / feature / part of the business model the class belongs to
    • the technical layer the class belongs to

    The packages this results in will look like this: <domain>.<deployable>.<domain part>.<layer>

    It should be easy to decide where a class goes. And it should also keep the packages at a reasonable size, even when you don’t use the separation by technical layer.
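As a hypothetical illustration of the scheme (all names invented):

// <domain>.<deployable>.<domain part>.<layer>
// de.example = domain, shop = deployable jar,
// customer = domain part, domain/persistence = technical layer
de.example.shop.customer.domain         // core customer model
de.example.shop.customer.persistence    // customer repositories
de.example.billingserver.invoice.web    // a different deployable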

But what do you gain from this? It is easier to find classes, but that’s about it. You need one more rule to make this really worthwhile: there must be no cyclic dependencies!

This means that if a class in package A references a class in package B, no class in B may reference any class in A. This also applies if the reference is indirect, via multiple other packages. But that is still not enough. Slices should be cycle free as well: if a domain part X references a different domain part Y, the reverse dependency must not exist!
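To make the rule concrete, here is a tiny hypothetical example (all names invented) of a two-package cycle that the rule forbids, shown as two files in one listing:

// File 1: de/example/shop/order/OrderService.java
package de.example.shop.order;

import de.example.shop.billing.Invoice; // order -> billing

public class OrderService {
    public Invoice bill() {
        return new Invoice();
    }
}

// File 2: de/example/shop/billing/Invoice.java
package de.example.shop.billing;

import de.example.shop.order.OrderService; // billing -> order: cycle!

public class Invoice {
    // This reference closes the cycle and couples the two packages
    private OrderService origin;
}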

This will indeed put some rather strict rules on your package and dependency structure. The benefit is that your code base becomes very flexible.

Without such a structure, splitting your project into multiple parts will probably be rather difficult. Ever tried to reuse part of an application in a different one, just to realize that you basically have to include most of the application in order to get it to compile? Ever tried to deploy different parts of an application to different servers, just to realize you can’t? It certainly happened to me before I used the approach mentioned above. But with this stricter structure, the parts you may want to reuse will almost on their own end up at the end of the dependency chain, so you can take them and bundle them in their own jar, or just copy the code into a different project and have it compile in very short time.

Also, while trying to keep your packages and slices cycle free, you’ll be forced to think hard about what each package involved is really about. This has improved my code base considerably in many cases.

So there is one problem left: dependencies are hard to see. Without a tool, it is very difficult to keep a code base cycle free. Of course there are plenty of tools that check for cycles, but cleaning up these cycles is tough, and the way most tools present them doesn’t help very much. I think what one needs are two things:

1. a simple test that can run with all your other tests and fails when you create a dependency cycle.
2. a tool that visualizes all the dependencies between classes, while at the same time showing which slice each class belongs to.

Surprise! I can recommend such a great tool: Degraph! (I’m the author, so I might be biased.)

    You can write tests in JUnit like this:

assertThat(
    classpath().including("de.schauderhaft.**")
               .printTo("degraphTestResult.graphml")
               .withSlicing("module", "de.schauderhaft.(*).*.**")
               .withSlicing("layer", "de.schauderhaft.*.(*).**"),
    is(violationFree())
);

The test will analyze everything in the classpath that starts with de.schauderhaft. It will slice the classes in two ways: by taking the third part of the package name and by taking the fourth part of the package name. So a class named de.schauderhaft.customer.persistence.HibernateCustomerRepository ends up in the module customer and in the layer persistence. And it will make sure that modules, layers and packages are cycle free.
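For completeness, a test like the one above needs a few static imports. The following reflects my recollection of Degraph’s JUnit integration and of Hamcrest, so treat the exact package names as an assumption and check the Degraph documentation:

// Hamcrest matchers used by the test above
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.CoreMatchers.is;

// Degraph's check API (package names as I recall them from the docs)
import static de.schauderhaft.degraph.check.JCheck.classpath;
import static de.schauderhaft.degraph.check.JCheck.violationFree;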

And if it finds a dependency cycle, it will create a graphml file, which you can open using the free graph editor yEd. With a little layout work you get results like the following, where the dependencies that result in circular dependencies are marked in red.

Again, for more details on how to achieve good, usable layouts I have to refer you to the documentation of Degraph.

    Also note that the graphs are colored mainly green with a little red, which nicely fits the season!

    This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!