Site icon JVM Advent

Java regular expression library benchmarks – 2015

While trying to get Java to #1 in the regexdna challenge for The Computer Language Benchmarks Game I was researching the performance of regular expression libraries for Java. The most recent website I could find was tusker.org from 2010. Hence I decided to redo the tests using the Java Microbenchmarking Harness and publish the results.
TL;DR: regular expressions are good for ad-hoc querying but if you have something performance sensitive, you should hand-code your solution (this doesn’t mean that you have to start from absolute zero – the Google Guava library has for example some nice utilities which can help in writing readable but also performant code).

And now, for some charts summarizing the performance – the test was run on an 64bit Ubuntu 15.10 machine with OpenJDK 1.8.0_66:

Small texts Large texts
Linear scale
Logarithmic scale

Observations:

The complete list of libraries used:

Library name and version (release year) Available in Maven Central License Average ops/second Average ops/second (large text) Passing tests
j.util.Pattern 1.8 (2015) no (comes with JRE) JRE license 19 689 22 144 5 out of 5
dk.brics.automaton.Automaton 1.11-8 (2011) yes BSD 2 600 225 115 374 276 2 out of 5
org.apache.regexp 1.4 (2005) yes Apache (?) 6 738 16 895 4 out of 5
com.stevesoft.pat.Regex 1.5.3 (2009) yes LGPL v3 4 191 859 4 out of 5
net.sourceforge.jregex 1.2_01 (2002) yes BSD 57 811 3 573 4 out of 5
kmy.regex.util.Regex 0.1.2 (2000) no Artistic License 217 803 38 184 5 out of 5
org.apache.oro.text.regex.Perl5Matcher 2.0.8 (2003) yes Apache 2.0 31 906 2383 4 out of 5
gnu.regexp.RE 1.1.4 (2005?) yes GPL (?) 11 848 1 509 4 out of 5
com.basistech.tclre.RePattern 0.13.6 (2015) yes Apache 2.0 11 598 43 3 out of 5
com.karneim.util.collection.regex.Pattern 1.1.1 (2005?) yes ? 2 out of 5
org.apache.xerces.impl.xpath.regex.RegularExpression 2.11.0 (2014) yes Apache 2.0 4 out of 5
com.ibm.regex.RegularExpression 1.0.2 (no longer available) no ?
RegularExpression.RE 1.1 (no longer available) no ?
gnu.rex.Rex ? (no longer available) no ?
monq.jfa.Regexp 1.1.1 (no longer available) no ?
com.ibm.icu.text.UnicodeSet (ICU4J) 56.1 (2015) yes ICU License

If you want to re-run the tests, check out the source code and run it as follows:

# we need to skip tests since almost all libraries fail a test or an other
mvn -Dmaven.test.skip=true clean package
# run the benchmarks
java -cp lib/jint.jar:target/benchmarks.jar net.greypanther.javaadvent.regex.RegexBenchmarks

Find the complete source for the benchmarks on GitHub: https://github.com/gpanther/regex-libraries-benchmarks

Author: Attila-Mihály Balázs

A programmer journeyman who appreciates working in great teams to deliver good fit solutions to customers.

Exit mobile version