JVM Advent

The JVM Programming Advent Calendar

The best way to prevent Java Persistence and Hibernate performance issues

Introduction

Many years ago, I used to work as a team leader, and, one day, the General Manager asked me to take a look at a project that was in big trouble.

The application in question had been developed by a team of software developers for over 9 months, and the client had just tested in a production-like environment.

The client got very upset when realizing that the application was barely crawling. For instance, I was told that a query had been running for 10 hours without showing any signs of stopping.

After analyzing the project, I identified many areas that could have been improved, and that’s how my passion for high-performance data access was born.

JPA and Hibernate

It was 2004 when I first heard of Hibernate. Back then, I was working on a .NET project for my college thesis and wasn’t very happy with ADO.NET at the time. Therefore, I started reading about NHibernatem, which was still in Beta at the time. NHibernate was trying to adapt the Hibernate 2 project from Java to .NET, and even the Beta version at the time was a much better alternative to ADO.NET.

From that moment, Hibernate got really popular. In fact, the Java Persistence API, which emerged in 2006, is very much based on Hibernate.

Thanks to JPA, Hibernate’s popularity grew even larger as most Java EE or Spring projects used it either directly or indirectly. Even today, most Spring Boot projects use Hibernate too, via the Spring Data JPA module.

Logging SQL statements

When using a data access framework where all queries must be stated explicitly, it’s obvious what SQL queries will be executed by the application.

On the other hand, JPA and Hibernate execute SQL statements based on the entity state transitions operated by the data access layer code.

For this reason, it’s very important to always log the SQL statement generated by JPA and Hibernate.

The best way to log SQL statements is to use a JDBC DataSource or Driver proxy, as explained in this article.

Domain Model

Let’s consider you are mapping a post parent table and the post_comment child table. There is a one-to-many table relationship between the post and post_comment tables via the post_id Foreign Key column in the post_comment table.

You could map the post and post_comment tables as JPA entities in the following way:

@Entity(name = "Post")
@Table(name = "post")
public class Post {

    @Id
    private Long id;

    private String title;

    public Long getId() {
        return id;
    }

    public Post setId(Long id) {
        this.id = id;
        return this;
    }

    public String getTitle() {
        return title;
    }

    public Post setTitle(String title) {
        this.title = title;
        return this;
    }
}

@Entity(name = "PostComment")
@Table(name = "post_comment")
public class PostComment {

    @Id
    private Long id;

    @ManyToOne
    private Post post;

    private String review;

    public PostComment setId(Long id) {
        this.id = id;
        return this;
    }

    public Post getPost() {
        return post;
    }

    public PostComment setPost(Post post) {
        this.post = post;
        return this;
    }

    public String getReview() {
        return review;
    }

    public PostComment setReview(String review) {
        this.review = review;
        return this;
    }
}
Notice that the Post and PostComment use a fluent-style API. For more details about the advantages of using this strategy, check out this article.
Now, let’s assume we are adding three Post entities into our database, each Post containing three PostComment child entities:
doInJPA(entityManager -> {
    long pastId = 1;
    long commentId = 1;

    for (long i = 1; i <= 3; i++) {
        Post post = new Post()
            .setId(pastId++)
            .setTitle(
                String.format(
                    "High-Performance Java Persistence, part %d", 
                    i
                )
        );
        entityManager.persist(post);

        for (int j = 0; j < 3; j++) {
            entityManager.persist(
                new PostComment()
                .setId(commentId++)
                .setPost(post)
                .setReview(
                    String.format(
                        "The part %d was %s", 
                        i, 
                        reviews[j]
                    )
                )
            );
        }
    }
});

Fetching data

Let’s assume you want to load a PostComment from the database. For that, you can call the find JPA method as follows:
PostComment comment = entityManager.find(
    PostComment.class, 
    1L
);
When executing the find method, Hibernate generates the following SQL query:
SELECT 
    pc.id AS id1_1_0_,
    pc.post_id AS post_id3_1_0_,
    pc.review AS review2_1_0_,
    p.id AS id1_0_1_,
    p.title AS title2_0_1_
FROM 
    post_comment pc
LEFT OUTER JOIN 
    post p ON pc.post_id=p.id
WHERE 
    pc.id=1

Where did that LEFT OUTER JOIN come from?

Well, this is because the @ManyToOne association in PostComment uses the default fetching strategy, which is FetchType.EAGER.

So, Hibernate has to do the LEFT OUTER JOIN as the mapping says it should always initialize the post association when fetching the PostComment entity.

Now, look what happens when you execute a JPQL query to fetch the same PostComment entity:

PostComment comment = entityManager
.createQuery(
    "select pc " +
    "from PostComment pc " +
    "where pc.id = :id", PostComment.class)
.setParameter("id",1L)
.getSingleResult();
Instead of a LEFT OUTER JOIN, we have a secondary query now:
SELECT 
    pc.id AS id1_1_,
    pc.post_id AS post_id3_1_,
    pc.review AS review2_1_
FROM 
    post_comment pc
WHERE 
    pc.id = 1

SELECT 
    p.id AS id1_0_0_,
    p.title AS title2_0_0_
FROM 
    post p 
WHERE 
    p.id = 1
Now, there was a single extra query this time, but if we fetch all PostComment entities associated with a given Post title:
List comments = entityManager
.createQuery(
    "select pc " +
    "from PostComment pc " +
    "join pc.post p " +
    "where p.title like :titlePatttern", PostComment.class)
.setParameter(
    "titlePatttern", 
    "High-Performance Java Persistence%"
)
.getResultList();

assertEquals(9, comments.size());
Hibernate will issue 4 queries now:
SELECT 
    pc.id AS id1_1_,
    pc.post_id AS post_id3_1_,
    pc.review AS review2_1_
FROM 
    post_comment pc
JOIN 
    post p ON pc.post_id=p.id
WHERE 
    p.title LIKE 'High-Performance Java Persistence%'
    
SELECT 
    p.id AS id1_0_0_,
    p.title AS title2_0_0_
FROM 
    post p 
WHERE 
    p.id = 1
    
SELECT 
    p.id AS id1_0_0_,
    p.title AS title2_0_0_
FROM 
    post p 
WHERE 
    p.id = 2
    
SELECT 
    p.id AS id1_0_0_,
    p.title AS title2_0_0_
FROM 
    post p 
WHERE 
    p.id = 3

There are four SQL queries this time. The first one is for the actual JPQL query that filters the post_comment table records while the remaining three are for fetching the Post entity eagerly.

Reviewing and validating all these @ManyToOne associations and making sure they are always using FetchTYpe.LAZY will take time. More, you cannot guarantee that one day, someone else will come and change a given association from FetchTYpe.LAZY to FetchTYpe.EAGER.

Detecting performance problems automatically

A much better approach to solving this issue is to use Hypersistence Optimizer. After setting the Maven dependency:
<dependency>
    <groupId>io.hypersistence</groupId>
    <artifactId>hypersistence-optimizer</artifactId>
    <version>${hypersistence-optimizer.version}</version>
</dependency>
All you need to do is add the following code to any of your integration tests:
public void testNoPerformanceIssues() {
    ListEventHandler listEventHandler = new ListEventHandler();
    
    new HypersistenceOptimizer(
        new JpaConfig(entityManagerFactory())
            .addEventHandler(listEventHandler)
    ).init();

    assertTrue(listEventHandler.getEvents().isEmpty());
}
That’s it! Now, if you try to run the tests, the suite will fail with the following error:
ERROR [main]: Hypersistence Optimizer - CRITICAL 
- EagerFetchingEvent 
- The [post] attribute in the [io.hypersistence.optimizer.config.PostComment] entity 
  uses eager fetching. Consider using a lazy fetching which, 
  not only that is more efficient, but it is way more flexible 
  when it comes to fetching data. 

  For more info about this event, check out this User Guide link 
- https://vladmihalcea.com/hypersistence-optimizer/docs/user-guide/#EagerFetchingEvent

java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at io.hypersistence.optimizer.config.FailFastOnPerformanceIssuesTest.testNoPerformanceIssues(FailFastOnPerformanceIssuesTest.java:41)
Awesome, right? You cannot even build the project with performance issues like this one sneaking in your data access code.

Conclusion

Using JPA and Hibernate is very convenient, but you need to pay extra attention to the underlying SQL statements that get generated on your behalf.

With a tool like Hypersistence Optimizer, you can finally spend your time focusing on your application requirements instead of chasing JPA and Hibernate performance issues.

Author: Vlad Mihalcea

I work as a Hibernate Developer Advocate, I wrote the High-Performance Java Persistence book and video course series. I speak at conferences and I created several open-source projects, like FlexyPool, hibernate-types, and db-util.

Next Post

Previous Post

4 Comments

  1. Kevin December 13, 2019

    Nice but the author’s whole idea is not to use FetchTYpe.EAGER. There are reasons why there is such a type and there are cases FetchTYpe.EAGER is necessary.

    • Vlad Mihalcea December 14, 2019 — Post Author

      > Nice but the author’s whole idea is not to use FetchTYpe.EAGER.

      Yes, that’s right.

      > There are reasons why there is such a type and there are cases FetchTYpe.EAGER is necessary.

      Nope, [there are not](https://vladmihalcea.com/eager-fetching-is-a-code-smell/). FetchType.EAGER is a common cause of performance isues, and the worst way to define a fetching strategy is during mapping. The fetching policy is a business-specific, so it should be defined at query time, not at mapping time.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

© 2024 JVM Advent | Powered by steinhauer.software Logosteinhauer.software

Theme by Anders Norén