
Test Your Test

Creating or modifying an application involves many aspects, such as following best practices and applying design patterns to solve everyday problems. After writing the code, developers usually add unit tests and rely on tools like Sonar to track metrics such as code coverage and highlight potentially untested areas.

However, high test coverage does not guarantee low risk in production. A test may execute a line of code without actually verifying the outcome. A test may instantiate an object but never check all of its significant attributes. Coverage shows what code ran—not whether the tests would catch a fundamental defect.

This raises an important question: how can we measure not only how much code is tested, but how effective those tests really are?

Context of the Situation

Imagine a team responsible for several microservices. The team is recognized as one of the best, with strong code coverage and consistent promotion of good practices, such as using Sonar to detect problems and creating integration tests to verify interactions between the application and external resources, such as databases.

One day, a new feature was deployed to production—just a slight change in a few classes. Nothing that appeared risky, considering the large number of existing tests. However, a few minutes later, a significant issue surfaced, affecting the entire platform rather than just the application involved.

Shortly after the problem appeared, someone on the team analyzed the code, detected the issue, and fixed it. During the investigation, it became clear that the tests were not reliable: they failed neither before nor after the changes.

The following is the test that caused the problem:

@Test
public void should_return_a_country() {
    when(countryRepository.findByCode("AR"))
            .thenReturn(getCountryModel());

    CountryDTO response = countryService.getCountryByCode("AR");
}

The test calls the method but never asserts anything about the result: crucial attributes go unverified, so a defect could slip through and cause errors in other applications.
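For contrast, here is a minimal sketch of what a more defensive version of that test might look like. The getter names and expected values (getCode(), getName(), "Argentina") are assumptions for illustration; the real DTO may differ:

@Test
public void should_return_a_country_with_its_attributes() {
    when(countryRepository.findByCode("AR"))
            .thenReturn(getCountryModel());

    CountryDTO response = countryService.getCountryByCode("AR");

    // Verify the outcome, not only that the call succeeded
    assertNotNull(response);
    assertEquals("AR", response.getCode());        // hypothetical getter
    assertEquals("Argentina", response.getName()); // hypothetical getter
}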

What’s Mutation Testing?

Mutation testing is a technique for evaluating the effectiveness of a test suite by introducing small, controlled changes into the code and verifying whether the tests detect them. Instead of measuring only which lines of code are executed, mutation testing focuses on the quality of assertions and the ability of tests to catch meaningful defects.

Core Concepts

The heart of this technique rests on a few concepts:

Mutant: a copy of the application code with one small, deliberate change introduced.
Mutation operator (or mutator): the rule that defines which kind of change to introduce, such as inverting a conditional or replacing a return value.
Mutation score: the percentage of mutants that the test suite manages to detect.

After creating the mutants and executing all the tests, each mutation can end in one of two states:

Killed: at least one test fails, meaning the suite detected the change.
Survived: all tests still pass, meaning the change went unnoticed.
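A minimal sketch (with an invented method and test) of how a single mutant ends in one state or the other:

// Original production code
public boolean isAdult(int age) {
    return age >= 18;
}

// Mutant created by a CONDITIONALS_BOUNDARY operator: ">=" becomes ">"
// return age > 18;

// This test KILLS the mutant: for age = 18 the original returns true
// and the mutant returns false, so the assertion fails on the mutant.
@Test
public void should_consider_eighteen_an_adult() {
    assertTrue(service.isAdult(18));
}

// A test that only checked isAdult(30) would let the mutant SURVIVE,
// because both versions return true for that input.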

Mutation Score

Mutation testing provides a percentage indicating how effective the tests are. To calculate it, use the following formula:

Mutation Score = (Killed Mutants / Total Mutants) × 100
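For example, in the Pitest execution shown later in this article, 64 mutants were generated and 7 were killed: 7 / 64 ≈ 0.109, a mutation score of roughly 11%.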

The way to interpret the percentage is:

A high score means the tests detect most injected defects and the assertions genuinely verify behavior.
A low score means the tests execute the code without truly checking its outcomes, much like the example shown earlier.

It’s important to note that a high mutation score does not mean the application is bug-free.

Types of Mutations

Mutation testing tools create various modifications, known as mutations, that simulate potential defects in the code. While the specific mutation operators can differ based on the programming language or library used, they generally fall into three main categories: Decision mutations, Statement mutations, and Value mutations. Each category focuses on a different aspect of the program’s behavior, helping assess how effectively the test suite validates the code’s logic, control flow, and data integrity.

Let’s see a brief explanation of each of them:

Decision mutations: alter conditions and relational or logical operators (for example, turning < into <= or && into ||) to verify that the tests exercise branch boundaries.
Statement mutations: remove or modify whole statements, such as deleting a method call or an assignment.
Value mutations: change constants, literals, or returned values, such as replacing a number with another or returning null.
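A minimal sketch (with invented code) showing where each category of mutation could strike:

// Original method, annotated with the kind of mutant each line could receive
public int discount(int price, boolean vip) {
    int base = 10;               // Value mutation: the constant 10 becomes 11
    if (vip && price > 100) {    // Decision mutation: "&&" becomes "||", or ">" becomes ">="
        base = base + 5;         // Statement mutation: this assignment is removed
    }
    return base;                 // Value mutation: the return value becomes 0
}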

How to Implement It in an Application?

To implement mutation testing in the JVM ecosystem, several libraries are available, such as Pitest, Major, and MuJava. Pitest is the best option because it is actively maintained, highly performant, integrates seamlessly with Maven, Gradle, JUnit, and TestNG, supports incremental analysis with extensive configuration options, generates clear HTML reports of killed and surviving mutants, and even provides plugins for SonarQube and other tools.

The source code for this article lives in a GitHub repository; feel free to clone it and use it to learn about mutation testing.

To use this library, you first need to add the dependency to your application. The following block shows how to do it in a Maven project:

<!-- Mutation Test -->
<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>${pitest-maven.version}</version>

    <configuration>
        <outputFormats>
            <outputFormat>HTML</outputFormat>
            <outputFormat>XML</outputFormat>
        </outputFormats>
        <targetClasses>
            <param>com.twa.flights.api.catalog.*</param>
        </targetClasses>
        <targetTests>
            <param>com.twa.flights.api.catalog.*</param>
        </targetTests>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>org.pitest</groupId>
            <artifactId>pitest-junit5-plugin</artifactId>
            <version>${pitest-junit5-plugin.version}</version>
        </dependency>
    </dependencies>
</plugin>

As a recommendation, regularly check for the latest version of this plugin on the official webpage or in an artifact repository.

Pitest allows users to export execution results in multiple formats, including HTML, CSV, and XML. The relevance of each format depends on the report’s purpose. For example, the HTML format is ideal for those who want a simple view of execution results, including the number of mutations used. In contrast, the XML format helps integrate this information with other tools, such as Sonar, which can display mutation execution results.

As the previous code block shows, the tool lets you indicate which packages of the source code will be mutated (targetClasses) and, in the same way, which packages or test classes will run against those mutants (targetTests).

Executing mutation testing is just a matter of running a command like the following:

$ mvn clean package org.pitest:pitest-maven:mutationCoverage
[INFO] --- pitest:1.7.6:mutationCoverage (default-cli) @ api-catalog ---
[INFO] Root dir is : /home/asacco/Code/testing-your-test/api-catalog
[INFO] Found plugin : Default csv report plugin
[INFO] Found plugin : Default xml report plugin
[INFO] Found plugin : Default html report plugin
......
[INFO] Found shared classpath plugin : Default mutation engine
[INFO] Found shared classpath plugin : JUnit 5 test framework support
[INFO] Found shared classpath plugin : JUnit plugin
[INFO] Available mutators : EXPERIMENTAL_ARGUMENT_PROPAGATION,FALSE_RETURNS,TRUE_RETURNS,CONDITIONALS_BOUNDARY,CONSTRUCTOR_CALLS,EMPTY_RETURNS,INCREMENTS,INLINE_CONSTS,INVERT_NEGS,MATH,NEGATE_CONDITIONALS,NON_VOID_METHOD_CALLS,NULL_RETURNS,PRIMITIVE_RETURNS,REMOVE_CONDITIONALS_EQUAL_IF,REMOVE_CONDITIONALS_EQUAL_ELSE,REMOVE_CONDITIONALS_ORDER_IF,REMOVE_CONDITIONALS_ORDER_ELSE,RETURN_VALS,VOID_METHOD_CALLS,EXPERIMENTAL_BIG_DECIMAL,EXPERIMENTAL_BIG_INTEGER,EXPERIMENTAL_MEMBER_VARIABLE,EXPERIMENTAL_NAKED_RECEIVER,REMOVE_INCREMENTS,EXPERIMENTAL_RETURN_VALUES_MUTATOR,EXPERIMENTAL_SWITCH,EXPERIMENTAL_BIG_DECIMAL,EXPERIMENTAL_BIG_INTEGER
......
......
================================================================================
- Timings
================================================================================
> pre-scan for mutations : < 1 second
> scan classpath : < 1 second
> coverage and dependency analysis : < 1 second
> build mutation tests : < 1 second
> run mutation analysis : 4 seconds
--------------------------------------------------------------------------------
> Total : 5 seconds
--------------------------------------------------------------------------------
================================================================================
- Statistics
================================================================================
>> Line Coverage: 63/195 (32%)
>> Generated 64 mutations Killed 7 (11%)
>> Mutations with no coverage 54. Test strength 70%
>> Ran 15 tests (0.23 tests per mutation)
Enhanced functionality available at https://www.arcmutate.com/
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 9.757 s
[INFO] Finished at: 2025-11-27T10:07:37-03:00
[INFO] ------------------------------------------------------------------------

Execution time may vary depending on the size of the source code and the available resources on the machine where these tests are running.

To see the HTML report graphically, open the target folder and look for the pit-reports folder. The report will look like the following image:

Mutation Testing: General Overview

To reduce execution time, you can use historical execution data to detect changes in code and tests. In concrete terms, this means adding only one parameter to the command, as shown in the following block.

$ mvn clean package org.pitest:pitest-maven:mutationCoverage -DwithHistory
......
.....
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5.689 s
[INFO] Finished at: 2025-11-27T10:21:54-03:00
[INFO] ------------------------------------------------------------------------

The execution time drops from 9.7 to 5.6 seconds in a small project with a few classes. This approach becomes even more beneficial when an application has a lot of code and tests.

A critical aspect of mutation testing is the ability to use multiple mutation engines. An engine is responsible for modifying the source code; in some cases, it changes the whole logic inside a method or class rather than just adjusting a method’s parameters or its return value. By default, Pitest uses Gregor, which introduces fine-grained modifications to the individual statements of a method, but it’s possible to use Descartes, which applies far fewer, coarser-grained mutations (for example, replacing an entire method body). To use it, it’s necessary to introduce some changes, like the following:

<!-- Mutation Test -->
<plugin>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-maven</artifactId>
    <version>${pitest-maven.version}</version>

    <configuration>
        <outputFormats>
            <outputFormat>HTML</outputFormat>
            <outputFormat>XML</outputFormat>
        </outputFormats>
        <targetClasses>
            <param>com.twa.flights.api.catalog.*</param>
        </targetClasses>
        <targetTests>
            <param>com.twa.flights.api.catalog.*</param>
        </targetTests>
        <mutationEngine>descartes</mutationEngine>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>org.pitest</groupId>
            <artifactId>pitest-junit5-plugin</artifactId>
            <version>${pitest-junit5-plugin.version}</version>
        </dependency>
        <dependency>
            <groupId>eu.stamp-project</groupId>
            <artifactId>descartes</artifactId>
            <version>1.3.2</version>
        </dependency>
    </dependencies>
</plugin>

As a recommendation, check for the latest versions of these libraries regularly, as new releases appear at regular intervals.

What Are the Challenges and Costs?

Introducing mutation testing into an existing application is not free of challenges, as it requires understanding its limitations and trade-offs. With this in mind, it’s crucial to set realistic expectations for this type of testing. Some of the most relevant issues are:

Execution time: each mutant requires re-running the relevant tests, so the analysis is far slower than a normal test run.
Equivalent mutants: some mutants change the code without changing its observable behavior, so no test can ever kill them, and they add noise to the score (see the sketch after this list).
Analysis effort: every surviving mutant needs a developer to decide whether it reveals a weak test or can be safely ignored.
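A minimal sketch (with invented code) of an equivalent mutant, one that no assertion can distinguish from the original:

// Original loop
for (int i = 0; i < 10; i++) {
    process(i);
}

// Mutant: "<" becomes "!=". Since i starts below 10 and increases one by
// one, both versions run the body exactly ten times. No test can tell
// them apart, so this mutant survives forever and drags down the score.
for (int i = 0; i != 10; i++) {
    process(i);
}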

None of these issues invalidates the benefits of mutation testing, but it’s essential to develop a plan to mitigate or reduce their impact.

Which Strategies Exist for Adopting It?

Adopting a new technique or tool involves several considerations, especially in an existing application with many classes and tests. There is no magic formula for implementing mutation testing without pain, but there are different approaches to reduce the problems to a manageable level. Some of the most relevant strategies are:

Start small: enable mutation testing only for the most critical packages and widen the scope gradually (see the sketch after this list).
Run incrementally: use the withHistory option so that only changed code and tests are re-analyzed.
Move it off the critical path: run the full analysis in a nightly or scheduled build instead of on every commit.
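As a minimal sketch of the first strategy, narrowing the targetClasses filter from the configuration shown earlier limits mutations to a single package; the .service sub-package here is hypothetical:

<targetClasses>
    <!-- Start by mutating only the most critical package; widen later -->
    <param>com.twa.flights.api.catalog.service.*</param>
</targetClasses>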

It is possible to use one of these strategies or combine them to achieve better results, but in all cases, the choice depends on the size of the application and the number of unit tests.

What’s Next?

There are many resources on unit testing and mutation testing, from the official Pitest documentation to books, talks, and articles that cover these testing concepts in depth. Consider any single list just a small sample of what is available; if something is unclear, find another video or resource that explains it from a different angle.

Conclusion

Creating tests for an application does not guarantee that nothing will go wrong, and mutation testing is not a silver bullet that can detect every possible issue. However, it provides a valuable and objective way to evaluate how effective existing tests really are.

A practical approach is to adopt it gradually: start with a small number of packages or a limited mutation scope, measure the effect on the build and pipeline, and then expand its use as appropriate.

Used pragmatically, mutation testing can significantly improve test quality and increase confidence in the application’s behavior without overwhelming the development process.

Author: Andres Sacco

Andres Sacco has been a developer since 2007, working in different languages, including Java, PHP, Node.js, Scala, and Kotlin. His background is mostly in Java and the libraries and frameworks associated with this language. In most of the companies he has worked for, he researched new technologies to improve the performance, stability, and quality of each company’s applications. In 2017 he started to look for new ways to optimize the transfer of data between applications to reduce infrastructure costs, and he suggested several actions, some applicable to all of the microservices and others to just a few. All this work concluded with the creation of a series of theoretical-practical projects, which are available on Manning.com. Recently he published a book with Apress about the latest version of Scala, as well as a set of theoretical-practical projects about uncommon ways of testing, like architecture tests and chaos engineering. He has taught internal courses to different audiences, such as developers, business analysts, and commercial staff, and he participates as a technical reviewer for the publishers Manning, Apress, and Packt.