I’m deeply, truly sorry everyone. I let you all down. I promised you all a great JVM advent, and not only did I fail to write my own article(s) – like last year – I also failed to give the support needed to others to publish their articles. I f*** up royally.
I beg your forgiveness and help: I’ve become disconnected from the JVM world as of late and also busy with life. As such I’m asking you: could you please take over the “running” of JVM Advent from me? It’s a good resource and it would be a shame to let it die just because of my incompetence. I’m happy to continue to pay for the domain and/or hosting but – as experience shows – I can’t dedicate the time it would deserve to coordinating with authors. If you are interested, please contact me at firstname.lastname@example.org.
To end on a happier note: most advent calendars had better stewards than me, so there were quite a lot of successful initiatives this year:
Happy holidays everyone and looking forward to a better new year!
We’re gearing up for another great Java advent season. Feel free to browse the archives for 2015, 2014, 2013 and 2012. Also, be sure to subscribe in one of the following ways so that you don’t miss a single post:
Finally, to help pass the time until December 1st, check out VirtualJUG:
It’s just like your local Java User Group, but you don’t have to dress up to participate 🙂
PS: We are looking for contributors! If you’re considering writing an article for the 2016 edition, read the (short) contributing guidelines and join the authors’ Google group.
It seems that doing the wrap-up with almost a year’s delay is becoming a tradition on the JVM advent blog. I’m really hoping to do better this year – and I’m also always happy to get help 🙂
While trying to get Java to #1 in the regexdna challenge for The Computer Language Benchmarks Game, I was researching the performance of regular expression libraries for Java. The most recent comparison I could find was on tusker.org, from 2010. Hence I decided to redo the tests using the Java Microbenchmark Harness (JMH) and publish the results.
TL;DR: regular expressions are good for ad-hoc querying, but if you have something performance-sensitive, you should hand-code your solution (this doesn’t mean that you have to start from absolute zero – the Google Guava library, for example, has some nice utilities which can help in writing readable but also performant code).
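As a minimal illustration of the TL;DR (the class and method names are hypothetical and not taken from the benchmark code), here is the same check, "is this string a non-empty run of ASCII digits?", written once with a pre-compiled java.util.regex.Pattern and once as a hand-coded loop. The loop is a single pass over the characters with no regex engine involved; whether it is measurably faster for your workload is something to confirm with a profiler or JMH rather than assume.

```java
import java.util.regex.Pattern;

// Hypothetical example: the same validation done with a regex and by hand.
final class DigitCheck {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    static boolean isAllDigitsRegex(String s) {
        return DIGITS.matcher(s).matches();
    }

    // Hand-coded equivalent: rejects the empty string, then checks each char.
    static boolean isAllDigitsHandCoded(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"12345", "", "12a45", "007"}) {
            System.out.println(s + " -> regex=" + isAllDigitsRegex(s)
                    + ", hand-coded=" + isAllDigitsHandCoded(s));
        }
    }
}
```

Both methods accept exactly the same strings; the hand-coded one just trades the generality of a regex for a loop that is trivial to reason about.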
And now, for some charts summarizing the performance – the tests were run on a 64-bit Ubuntu 15.10 machine with OpenJDK 1.8.0_66:
- there is no “standard” for regular expressions, so different libraries can behave differently when given a particular regex and a particular string to match against, i.e. one might say that it matches while another says that it doesn’t. For example, even though I used a very reduced set of test cases (5 regexes checked against 6 strings), only two of the libraries managed to match / not match them all correctly (one of them being java.util.Pattern).
- it probably takes more than one try to get your regex right (tools like regexpal or The Regex Coach are very useful for experimenting)
- the performance of a regex is hard to predict (and sometimes it can have exponential complexity based on the input length), so you need to think twice before accepting regular expressions from arbitrary users on the Internet (like a search engine which would allow searching by regular expressions, for example)
- none of the libraries seems to be in active development any more (in fact, quite a few from the original list on tusker.org are now unavailable) and many of them are slower than the built-in j.u.Pattern, so if you use regexes, j.u.Pattern should probably be your first choice.
- that said, the performance of both the hardware and the JVM has improved considerably, so if you are using one of these libraries, it generally runs an order of magnitude faster than it did five years ago. So there is no need to hastily replace working code (unless your profiler says that it is a problem :-))
- watch out for calls to String.split in loops. While it has some optimizations for particular cases (such as one-char regexes), you should almost always:
  - see if you can use something like Splitter from Google Guava
  - if you need a regular expression, at least pre-compile it outside of the loop
- one of the two surprises was dk.brics.automaton, which outperformed everything else by several orders of magnitude; however:
  - its last release was in 2011 and it seems to be more of an academic project
  - it doesn’t support the same syntax as java.util.Pattern (and doesn’t warn you if you feed it a j.u.Pattern regex – it just won’t match the strings you think it should)
  - it doesn’t have an API as comfortable as j.u.Pattern’s (for example, it’s missing replacements)
- the other surprise was kmy.regex.util.Regex, which – although not updated since 2000 – outperformed java.util.Pattern and passed all the tests (admittedly, there weren’t many).
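To make the point about exponential complexity concrete, here is a minimal, illustrative sketch (the class, field and method names are hypothetical, not part of the benchmark code). `(a+)+b` and `a+b` accept exactly the same strings, but on a run of `a`s that is not followed by a `b`, a backtracking engine such as java.util.Pattern may try many of the ways of splitting the run between the inner and the outer `+` before giving up, so the running time can grow roughly exponentially with the input length. Exact timings depend on the machine and the JDK version, so none are claimed here.

```java
import java.util.regex.Pattern;

// Illustrative sketch: two equivalent regexes with very different worst cases.
final class BacktrackingDemo {
    // Nested quantifier: many ways to partition a run of 'a's -> heavy backtracking on failure.
    static final Pattern NESTED = Pattern.compile("(a+)+b");
    // Same language, single quantifier: fails quickly on the same input.
    static final Pattern FLAT = Pattern.compile("a+b");

    // Builds n 'a' characters followed by 'c', an input that neither pattern matches.
    static String aRunThenC(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('c').toString();
    }

    public static void main(String[] args) {
        for (int n = 16; n <= 22; n += 2) {
            String input = aRunThenC(n);
            long t0 = System.nanoTime();
            boolean nested = NESTED.matcher(input).matches();
            long t1 = System.nanoTime();
            boolean flat = FLAT.matcher(input).matches();
            long t2 = System.nanoTime();
            System.out.printf("n=%d nested=%b (%d us) flat=%b (%d us)%n",
                    n, nested, (t1 - t0) / 1_000, flat, (t2 - t1) / 1_000);
        }
    }
}
```

If you do have to accept regexes (or inputs) from untrusted users, bounding the input length before matching is a simple first line of defence.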
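Here is a minimal sketch of the String.split advice, assuming a hypothetical "comma followed by optional whitespace" separator (the names are illustrative). Because `,\\s*` is not a single literal character, String.split cannot take its fast path and compiles a fresh Pattern on every call; hoisting the Pattern into a constant compiles it only once. Guava’s Splitter would be another option when no regex is needed; it is left out here to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SplitInLoop {
    // Hypothetical separator, compiled once and shared by every iteration.
    private static final Pattern COMMA_SPACE = Pattern.compile(",\\s*");

    // Anti-pattern: String.split recompiles the multi-char regex on each call.
    static List<String[]> splitNaive(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.split(",\\s*"));
        }
        return out;
    }

    // Preferred: reuse the pre-compiled Pattern inside the loop.
    static List<String[]> splitPrecompiled(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        for (String line : lines) {
            out.add(COMMA_SPACE.split(line));
        }
        return out;
    }
}
```

Both methods return the same arrays (String.split(regex) and Pattern.split(input) share the same semantics, including dropping trailing empty strings); only the amount of repeated work differs.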
The complete list of libraries used:
If you want to re-run the tests, check out the source code and run it as follows:
# we need to skip tests since almost all libraries fail one test or another
mvn -Dmaven.test.skip=true clean package
# run the benchmarks
java -cp lib/jint.jar:target/benchmarks.jar net.greypanther.javaadvent.regex.RegexBenchmarks
Find the complete source for the benchmarks on GitHub: https://github.com/gpanther/regex-libraries-benchmarks
The Java Advent has started (this is our first article of the year)! Enjoy 24 articles on JVM-related topics – one each day – written by the enthusiastic team at Java Advent HQ! Thank you for writing and thank you for reading.
To make sure that you don’t miss any of the posts, subscribe:
Also, if you like what you’re reading, be sure to share it with others who might be interested (be it on social networks or in real life).
Mite Mitreski has written a variety of posts and we’re looking forward to more of his articles.
Martijn Verburg is many things – the diabolical developer, CEO – jClarity, London JUG co-leader (LJC), Speaker, Author, Javaranch Mod, PCGen & Adopt OpenJDK / A-JSR Cat herder, Java Champion etc. He provided the year-end summaries about Java-land for 2012, 2013 and 2014.
Lukas Eder is Founder and CEO at Data Geekery, the company behind jOOQ, the “Minister of Bringing Sanity Back to Java / SQL” and a recent Java Champion (congrats Lukas!). He has contributed four articles to Java Advent so far and we’re looking forward to more of his articles:
Marcin Grzejszczak is an enthusiast of clean code and good design, author of the “Mockito Instant” and “Mockito Cookbook” books, contributor to several open source projects and many more things (check out his profile for more).
Nikita likes performance tuning of Java applications. His articles to date are: