JVM Advent

The JVM Programming Advent Calendar

Faster Maven builds

Builds require a few properties, chief among them reproducibility. I would consider speed to be low on the order of priorities. However, it’s also one of the most limiting factors to your release cycle: if your build takes T, you cannot release faster than each T. Hence, you’ll probably want to speed up your builds after you’ve reached a certain maturity level to enable more frequent releases.

I want to detail some techniques you can leverage to make your Maven builds faster in this article, outside and then inside of Docker.

Baseline

Since I want to propose techniques and evaluate their impact, we need a sample repository. I’ve chosen Hazelcast code samples because it provides a large enough multi-modules code base with many submodules; the exact commit is 448febd.

The rules are the following:

  • I run the command five times to avoid temporary issues
  • I execute mvn clean between each run to start from an empty target repository
  • All dependencies and plugins are already downloaded
  • I report the time that Maven displays in the console log:
    [INFO] -------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] -------------------------------------------------------
    [INFO] Total time:  22.456 s (Wall Clock)
    [INFO] Finished at: 2021-09-24T23:20:41+02:00
    [INFO] -------------------------------------------------------

Let’s start with our baseline, mvn test. The results are:

  • 02:00 min
  • 01:57 min
  • 01:58 min
  • 01:56 min
  • 01:58 min

Using all CPUs

By default, Maven uses a single thread. In the age of multicores, this is just waste. It’s possible to run parallel builds using multiple threads by setting an absolute number or a number relative to the number of available cores. For more information, please check the relevant documentation.

The more submodules that are not dependent on each other you have, i.e., Maven can build them in parallel, the better you’ll achieve with this technique. It fits our codebase very well.

We are going to use as many threads as there are available cores. The relevant command is mvn test -T 1C.

When the command starts, you should see the following message in the console:

Using the MultiThreadedBuilder implementation with a thread count of X
  • 51.487 s (Wall Clock)
  • 40.322 s (Wall Clock)
  • 52.468 s (Wall Clock)
  • 41.862 s (Wall Clock)
  • 41.699 s (Wall Clock)

The numbers are much better but with a higher variance.

Parallel test execution

Parallelization is an excellent technique. We can do the same regarding test execution. By default, the Maven Surefire plugin runs tests sequentially, but it’s possible to configure it to run tests in parallel. Please refer to the documentation for the whole set of options.

This approach is excellent if you’ve got a large number of units in each module. Note that your tests need to be independent of one another.

We will manually set the number of threads:

mvn test -Dparallel=all -DperCoreThreadCount=false -DthreadCount=16 #1 #2
  1. Configure Surefire to run both classes and methods in parallel
  2. Manual override the thread count to 16

Let’s run it:

  • 02:04 min
  • 02:03 min
  • 01:46 min
  • 01:52 min
  • 01:53 min

It seems that the cost of thread synchronization offsets the potential gain of running parallel tests.

Offline

Maven will check whether a SNAPSHOT dependency has a new “version” at every run. It means additional network roundtrips. We can prevent this check with the --offline option.

While you should avoid SNAPSHOT dependencies, it’s sometimes unavoidable, especially during development.

The command is mvn test -o, -o being the shortcut for --offline.

  • 01:46 min
  • 01:46 min
  • 01:47 min
  • 01:55 min
  • 01:44 min

The codebase has a considerable number of SNAPSHOT dependencies; hence offline speeds up the build significantly.

JVM parameters

Maven itself is a Java-based application. It means each run starts a new JVM. A JVM first interprets the bytecode and then analyze the workload and compiles the bytecode to native code accordingly: it means peak performance, but only after a (long) while. It’s great for long-running processes, not so much for command-line applications.

We will likely not reach the peak performance point in the context of builds since they are relatively short-lived, but we are still paying for the analysis cost. We can configure Maven to forego it by configuring the adequate JVM parameters. Several ways of configuring the JVM are available. The most straightforward way is to create a dedicated jvm.config configuration file in a .mvn subfolder in the project’s folder.

-XX:-TieredCompilation -XX:TieredStopAtLevel=1

Let’s now simply run mvn test:

  • 01:44 min
  • 01:44 min
  • 01:53 min
  • 01:53 min
  • 01:55 min

Maven daemon

The Maven daemon is a recent addition to the Maven ecosystem. It draws its inspiration from the Gradle daemon:

Gradle runs on the Java Virtual Machine (JVM) and uses several supporting libraries that require a non-trivial initialization time. As a result, it can sometimes seem a little slow to start. The solution to this problem is the Gradle Daemon: a long-lived background process that executes your builds much more quickly than would otherwise be the case. We accomplish this by avoiding the expensive bootstrapping process and leveraging caching by keeping data about your project in memory.

The Gradle team recognized early that a command-line tool was not the best usage of the JVM. To fix that, one keeps a JVM background process, the daemon, always up. It acts as a server while the CLI itself plays the role of the client.

As an additional benefit, such a long-running process loads classes only once (if they didn’t change between runs).

Once you have installed the software, you can run the daemon with the mvnd command instead of the standard mvn one. Here are the results with mvnd test:

  • 33.124 s (Wall Clock)
  • 33.114 s (Wall Clock)
  • 34.440 s (Wall Clock)
  • 32.025 s (Wall Clock)
  • 29.364 s (Wall Clock)

Note that the daemon uses multiple threads by default, with number of cores - 1.

Mixing and matching

We’ve seen several ways to speed up the build. What if we used them in conjunction?

Let’s first try with every technique we’ve seen so far in the same run:

mvnd test -Dparallel=all -DperCoreThreadCount=false -DthreadCount=16 -o #1 #2 #3 #4
  1. Use the Maven daemon
  2. Run the tests in parallel
  3. Don’t update SNAPSHOT dependencies
  4. Configure the JVM parameters as above via the jvm.config file – no need to set any option

The command returns the following results:

  • 27.061 s (Wall Clock)
  • 24.457 s (Wall Clock)
  • 24.853 s (Wall Clock)
  • 25.772 s (Wall Clock)

Thinking about it, the Maven daemon is a long-running process. For that reason, it stands to reason to let the JVM analyze and compile the bytecode to native code. We can thus remove the jvm.config file and re-run the above command. Results are:

  • 23.840 s (Wall Clock)
  • 26.589 s (Wall Clock)
  • 22.283 s (Wall Clock)
  • 23.788 s (Wall Clock)
  • 22.456 s (Wall Clock)

Now we can display the consolidated results:

  Baseline Parallel Build Parallel tests Offline JVM params Daemon Daemon + offline + parallel tests + parameters Daemon + offline + parallel tests
#1 (s) 120.00 51.00 128.00 106.00 104.00 33.12 27.06 23.84
#2 (s) 117.00 40.00 123.00 106.00 104.00 33.11 24.46 26.59
#3 (s) 118.00 52.00 106.00 107.00 113.00 34.44 24.85 22.28
#4 (s) 116.00 42.00 112.00 115.00 113.00 32.03 25.77 23.79
#5 (s) 118.00 42.00 113.00 104.00 115.00   29.36 22.46
Average (s) *117.80* 45.40 116.40 107.60 109.80 32.41 25.54 23.79
Deviation 1.76 25.44 63.44 14.64 22.96 2.91 1.00 2.38
Gain from baseline (s) 72.40 1.40 10.20 8.00 85.39 92.26 94.01
% gain 61.46% 1.19% 8.66% 6.79% 72.48% 78.32% 79.80%

Raw Maven summary

At this point, we have seen several ways to speed up your Maven build:

  • Maven daemon: A solid, safe starting point
  • Parallelize builds: When the build contains multiple modules that are independent of each other
  • Parallelize tests: When the project contains multiple tests
  • Offline: When the project contains SNAPSHOT dependencies and you don’t need to update them
  • JVM parameters: When you want to go the extra mile

I’d advise every user to start using the Maven daemon and continue optimizing if necessary and depending on your project.

Now, I’d like to widen the scope and do the same for Maven builds inside Docker.

Between each run, we change the source code by adding a single blank line; between each section, we remove all built images, including the intermediate ones that are the results of the multi-stage build. The idea is to avoid reusing a previously built image.

Baseline

To compute a helpful baseline, we need a sample project. I created one just for this purpose: it’s a relatively small Kotlin project.

Here’s the relevant Dockerfile:

FROM openjdk:11-slim-buster as build                         #1

COPY .mvn .mvn                                               #2
COPY mvnw .                                                  #2
COPY pom.xml .                                               #2
COPY src src                                                 #2

RUN ./mvnw -B package                                        #3

FROM openjdk:11-jre-slim-buster                              #4

COPY --from=build target/fast-maven-builds-1.0.jar .         #5

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "fast-maven-builds-1.0.jar"]     #6
  1. Start from a JDK image for the packaging step
  2. Add required resources
  3. Create the JAR
  4. Start from a JRE for image creation step
  5. Copy the JAR from the previous step
  6. Set the entry point

Let’s execute the build:

time DOCKER_BUILDKIT=0 docker build -t fast-maven:1.0 . #1
  1. Forget the environment variable for now, as I’ll explain in the next section

Here are the results of the five runs:

* 0.36s user 0.53s system 0% cpu 1:53.06 total
* 0.36s user 0.56s system 0% cpu 1:52.50 total
* 0.35s user 0.55s system 0% cpu 1:56.92 total
* 0.36s user 0.56s system 0% cpu 2:04.55 total
* 0.38s user 0.61s system 0% cpu 2:04.68 total

Buildkit for the win

The last command line used the DOCKER_BUILDKIT environment variable. It’s the way to tell Docker to use the legacy engine. If you didn’t update Docker for some time, it’s the engine that you’re using. Nowadays, BuildKit has superseded it and is the new default.

BuildKit brings several performance improvements:

  • Automatic garbage collection
  • Concurrent dependency resolution
  • Efficient instruction caching
  • Build cache import/export
  • etc.

Let’s re-execute the previous command on the new engine:

time docker build -t fast-maven:1.1 .

Here’s an excerpt of the console log of the first run:

...
 => => transferring context: 4.35kB
 => [build 2/6] COPY .mvn .mvn
 => [build 3/6] COPY mvnw .
 => [build 4/6] COPY pom.xml .
 => [build 5/6] COPY src src
 => [build 6/6] RUN ./mvnw -B package
...

0.68s user 1.04s system 1% cpu 2:06.33 total

The following executions of the same command have a slightly different output:

...
 => => transferring context: 1.82kB
 => CACHED [build 2/6] COPY .mvn .mvn
 => CACHED [build 3/6] COPY mvnw .
 => CACHED [build 4/6] COPY pom.xml .
 => [build 5/6] COPY src src
 => [build 6/6] RUN ./mvnw -B package
...

Remember that we change the source code between runs. Files that we do not change, namely .mvn, mvnw and pom.xml, are cached by BuildKit. But these resources are small, so that caching doesn’t significantly improve the build time.

* 0.69s user 1.01s system 1% cpu 2:05.08 total
* 0.65s user 0.95s system 1% cpu 1:58.51 total
* 0.68s user 0.99s system 1% cpu 1:59.31 total
* 0.64s user 0.95s system 1% cpu 1:59.82 total

A fast glance at the logs reveals that the biggest bottleneck in the build is the download of all dependencies (including plugins). It occurs every time we change the source code. That’s the reason why BuildKit doesn’t improve the performance.

Layers, layers, layers

We should focus our efforts on the dependencies. For that, we can leverage layers and split the build into two steps:

  • In the first step, we download dependencies
  • In the second one, we do the proper packaging

Each step creates a layer, the second depending on the first.

With layering, if we change the source code in the second layer, the first layer is not impacted and can be reused. We don’t need to download dependencies again. The new Dockerfile looks like:

FROM openjdk:11-slim-buster as build

COPY .mvn .mvn
COPY mvnw .
COPY pom.xml .

RUN ./mvnw -B dependency:go-offline                          #1

COPY src src

RUN ./mvnw -B package                                        #2

FROM openjdk:11-jre-slim-buster

COPY --from=build target/fast-maven-builds-1.2.jar .

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "fast-maven-builds-1.2.jar"]
  1. The go-offline goal downloads all dependencies and plugins
  2. At this point, all dependencies are available

Note that go-offline doesn’t download everything. The command won’t run successfully if you try to use the -o option (for offline). It’s a well-known old bug. In all cases, it’s “good enough”.

Let’s run the build:

time docker build -t fast-maven:1.2 .

The first run takes significantly more time than the baseline:

0.84s user 1.21s system 1% cpu 2:35.47 total

However, the subsequent builds are much faster. Changing the source code only affects the second layer and doesn’t trigger the download of (most) dependencies:

* 0.23s user 0.36s system 5% cpu 9.913 total
* 0.21s user 0.33s system 5% cpu 9.923 total
* 0.22s user 0.38s system 6% cpu 9.990 total
* 0.21s user 0.34s system 5% cpu 9.814 total
* 0.22s user 0.37s system 5% cpu 10.454 total

Volume mount in build

Layering the build improved the build time drastically. We can change the source code and keep it low. There’s one remaining issue, though. Changing a single dependency invalidates the layer, so we need to download all of them again.

Fortunately, BuildKit introduces volumes mount during the build (and not only during the run). Several types of mounts are available, but the one that interests us is the cache mount. It’s an experimental feature, so you need to explicitly opt-in:

# syntax=docker/dockerfile:experimental                      #1
FROM openjdk:11-slim-buster as build

COPY .mvn .mvn
COPY mvnw .
COPY pom.xml .
COPY src src

RUN --mount=type=cache,target=/root/.m2,rw ./mvnw -B package #2

FROM openjdk:11-jre-slim-buster

COPY --from=build target/fast-maven-builds-1.3.jar .

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "fast-maven-builds-1.3.jar"]
  1. Opt-in to experimental features
  2. Build using the cache

It’s time to run the build:

time docker build -t fast-maven:1.3 .

The build time is higher than for the regular build but still lower than the layers build:

0.71s user 1.01s system 1% cpu 1:50.50 total

The following builds are on par with layers:

* 0.22s user 0.33s system 5% cpu 9.677 total
* 0.30s user 0.36s system 6% cpu 10.603 total
* 0.24s user 0.37s system 5% cpu 10.461 total
* 0.24s user 0.39s system 6% cpu 10.178 total
* 0.24s user 0.35s system 5% cpu 10.283 total

However, as opposed to layers, we only need to download updated dependencies. Here, let’s change Kotlin’s version from 1.5.30 to 1.5.31:

<properties>
    <kotlin.version>1.5.31</kotlin.version>
</properties>

It’s a huge improvement regarding the build time:

* 0.41s user 0.57s system 2% cpu 44.710 total

Considering the Maven daemon

In the previous post regarding regular Maven builds, I mentioned the Maven daemon. Let’s change our build accordingly:

FROM openjdk:11-slim-buster as build

ADD https://github.com/mvndaemon/mvnd/releases/download/0.6.0/mvnd-0.6.0-linux-amd64.zip . #1

RUN apt-get update \                                         #2
 && apt-get install unzip \                                  #3
 && mkdir /opt/mvnd \                                        #4
 && unzip mvnd-0.6.0-linux-amd64.zip \                       #5
 && mv mvnd-0.6.0-linux-amd64/* /opt/mvnd                    #6

COPY .mvn .mvn
COPY mvnw .
COPY pom.xml .
COPY src src

RUN /opt/mvnd/bin/mvnd -B package                            #7

FROM openjdk:11-jre-slim-buster

COPY --from=build target/fast-maven-builds-1.4.jar .

EXPOSE 8080

ENTRYPOINT ["java", "-jar", "fast-maven-builds-1.4.jar"]
  1. Download the latest version of the Maven daemon
  2. Refresh the package index
  3. Install unzip
  4. Create a dedicated folder
  5. Extract the archive that we downloaded in step #1
  6. Move the content of the extracted archive to the previously created folder
  7. Use mvnd instead of the Maven wrapper

Let’s run the build now:

docker build -t fast-maven:1.4 .

The log outputs the following:

* 0.70s user 1.01s system 1% cpu 1:51.96 total
* 0.72s user 0.98s system 1% cpu 1:47.93 total
* 0.66s user 0.93s system 1% cpu 1:46.07 total
* 0.76s user 1.04s system 1% cpu 1:50.35 total
* 0.80s user 1.18s system 1% cpu 2:01.45 total

There’s no significant improvement compared to the baseline.

I tried to create a dedicated mvnd image and use it as a parent image:

# docker build -t mvnd:0.6.0 .
FROM openjdk:11-slim-buster as build

ADD https://github.com/mvndaemon/mvnd/releases/download/0.6.0/mvnd-0.6.0-linux-amd64.zip .

RUN --mount=type=cache,target=/var/cache/apt,rw apt-get update \
 && apt-get install unzip \
 && mkdir /opt/mvnd \
 && unzip mvnd-0.6.0-linux-amd64.zip \
 && mv mvnd-0.6.0-linux-amd64/* /opt/mvnd
# docker build -t fast-maven:1.5 .
FROM mvnd:0.6.0 as build

# ...

This approach changes the output in any significant way.

mvnd is only good when the daemon is up during several runs. I found no way to do that with Docker. If you’ve any idea on how to achieve it, please tell me; extra points if you can point me to an implementation.

Here’s the summary of all execution times:

Baseline BuildKit Layers Volume mount mvnd
#1 (s) 113.06 125.08 155.47 110.5 111.96
#2 (s) 112.5 118.51 9.91 9.68 107.93
#3 (s) 116.92 119.31 9.92 10.6 106.07
#4 (s) 124.55 119.82 9.99 10.46 110.35
#5 (s) 124.68 9.81 10.18 121.45
#6 (s) 10.45 10.28
#7 (s) 44.71
Average (s) 118.34 120.68 9.91 10.24 111.55
Deviation 28.55 6.67 0.01 0.10 111.47
Gain from baseline (s) 0 -2.34 108.43 108.10 6.79
% gain 0.00% -1.98% 91.63% 91.35% 5.74%

Conclusion

Speeding up the performance of Maven builds inside of Docker is pretty different from regular builds. In Docker, the limiting factor is the download speed of dependencies If you’re stuck on an old version, you need to use layers to cache dependencies.

With BuildKit, I recommend using the new cache mount capability to avoid downloading all dependencies if the layer is invalidated.

The complete source code for this post can be found on Github in Maven format.

To go further:

Originally published at A Java Geek on October 3rd and October 10th, 2021

Author: Nicolas Fränkel

Nicolas Fränkel is a Developer Advocate with 15+ years experience consulting for many different customers, in a wide range of contexts (such as telecoms, banking, insurances, large retail and public sector). Usually working on Java/Java EE and Spring technologies, but with focused interests like Rich Internet Applications, Testing, CI/CD and DevOps. Also double as a teacher in universities and higher education schools, a trainer and triples as a book author. You can find Nicolas’ weekly post on his blog.

Next Post

Previous Post

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

© 2024 JVM Advent | Powered by steinhauer.software Logosteinhauer.software

Theme by Anders Norén