Flaky tests 2

in the previous part, I mentioned the test that sometimes failed because payment time was in future. It’s time to explain how it happened.

The next technique that we used is called “reading logs”. I mean, application logs. I know, some of you don’t have access to AUT logs. Some of you think that you are not qualified enough to read them. But believe me, it’s a crucial skill to investigate flaky tests. Without reading AUT logs, you just cannot resolve at least 10% of flaky tests.

So, go ahead! Get the access, learn to read, and here it is:

19:35:25,145 [1e8-453] INFO request Payments.success 127.0.0.1 ccd4be41-fed9-4637-aee6-57e6c50ced85 Desktop c-private paymentId=100006 -> RenderTemplate /app/views/Payments/success.html -3254 ms

It’s a typical request log. The good news is that you don’t need to understand all of it. You just need to search anomalies in it. You probably already see the anomaly, right?

Yes. The last number in the line.

This action lasted -3 seconds. How is that possible?

We could not figure it out for quite a long time. After endless tries to analyze the code, we wrote a simple bash script that just logged the current system time every second. When the test failed again after a few days, we read the log:

Fri Jun 23 23:58:34 MSK 2017
Fri Jun 23 23:58:35 MSK 2017
Fri Jun 23 23:58:36 MSK 2017
Fri Jun 23 23:58:37 MSK 2017
Fri Jun 23 23:58:35 MSK 2017
Fri Jun 23 23:58:39 MSK 2017
Fri Jun 23 23:58:40 MSK 2017

Bingo!

You see it? The current time was: 36 s, 37 s, then 35s and 39s! The system time is just broken! There cannot be other reason than Matrix has you. 🙂

Actually, it’s simple. All operating systems have service (so-called “NTP”) for synchronizing current time with some server. Our Jenkins server had two of them: “ntp” и “systemd-timesyncd”. Apparently, they connected to different servers which had a ~3 seconds gap.

It’s clearly written in Ubuntu documentation that you should never use both of them, but who read the documentation? 🙂

example 4: fantom account

Seven little tests cuttin ‘up their tricks,
One broke his cache and then there were six.

Once we had a test that expected 5 accounts on a screen:

public class AccountsTest ... {
  $$("#loans tbody tr").shouldHave(size(5));
}

But sometimes failed because there was 6 of them:

ListSizeMismatch : expected: = 5, actual: 6

We had no idea why sometimes there is an extra account.

We started investigating if some other test probably creates another account. Yes, there was such a test. But it deleted the account at the end:

@After
public void cleanup() {
 loans.delete(“where id = 6”);
}

The next guess: probably the accounts are cached somewhere?

Yes, here is an AccountService.java. It caches accounts for 5 minutes:

@CacheFor(“5mn”)
public List<Account> accounts(...) {
 return ...
}

But probably every test should clear the cache before starting?

Yes, here it is:

public abstract class UITest {
 @Before
 public void setUp() {
   api.clearCache();
 }
}

We had no more guesses. Everything seemed to be ok. It should work.

The next technique that I used here is called “put the breakpoint” (or “add logging” – it’s effectively the same). It showed that method setUp() mentioned above was not executed. What? Is matrix joking again?

You probably can guess. The answer is simple. Let’s look at the beginning of the failing AccountsTest:

public class AccountsTest extends UITest {
 @Before
 public void setUp() {
   open(“/accounts”);
 }
}

It overrides method setUp! Of course, it was not intentional. It’s very easy to make such a mistake in IDE: when you clock “Alt+Insert” or some other magic shortcut inside the test, it automatically suggests to create “setUp” or “tearDown” method. It’s so easy to miss the fact that there already exists method with the same name somewhere in super- or subclass. Especially if everything works well most of the time.

You may ask, why this test didn’t fail always, but only sometimes?

It depended on the order and speed of tests. As you remember, the accounts were cached for 5 minutes? Most of the times, between these 2 tests elapsed more than 5 minutes and accounts cached had expired. And test failed only when it took less than 5 minutes between those 2 tests.

The moral here is that clean code and automated code review is also important for tests, not only for production code. After that case, I always enable this checkbox (along with many others) in IDE:

So, let’s add to the TOP of causes:

application cache
data from the previous tests
time

Theory

top causes of flaky tests

Let me summarize. Here are most common causes of flaky tests:

Ajax requests: speed, order, timeouts
Speed of JavaScript (browser)
Browser window size
Cache
Time
UI effects
Parallel tests

But it’s not all. Often flaky tests are caused by actual bugs, but these bugs can be:

hard to reproduce
unrealistic
noncritical
Usability issues

All these things often have lower priority. This is what makes your life hard. You have a test that actually shows a problem, but nobody is going to fix that problem.

Prevention of flaky tests

What can you do to prevent flaky tests?

Here are my best practices:

Use testing pyramid
Use Selenide
Emulate external dependencies
Clean up the database before every test

Testing pyramid

This is the essential. Don’t underestimate The Right Proportion.

You should have a lot of unit tests and a few of UI tests (or any other integration tests). I insist! My principle is:

Only for the most important business scenarios, write UI tests.
For all other flaws, combinations, corners cases etc. write unit tests. Avoid the trap of writing integration tests for all cases!

The reasoning is simple: unit-tests are fast and stable. UI tests are slow and flaky. That’s it.

Emulate ’em all

Don’t read real email in tests.

Don’t send real SMS in tests.

Don’t call any other services that are
1) external
2) slow
3) unstable
4) not under your control

Instead, build emulators for them. It’s pretty easy nowadays with help of libraries like Jetty (embedded), GreenMail, WireMock etc.

Control the database

I often hear how people (in their tests) try to guess what data comes from the database. “How to check if advertisement popup came, and close it?” What’s wrong with you, automation engineers? You should never guess, you should exactly know. You prepare the data before your test. Don’t fall into trap of testing on a real data. It’s a bad idea. It’s exactly the thing that makes your tests much more complex, slow and unstable. Don’t mix functional and performance tests.

Tools

When the next flaky test fails, you should be ready. Be sure that you save automatically at least the following things:

Logs of previous builds
Screenshots
Video

Logs of previous builds

In all projects I came to, they saved only logs of the last build. It ruins debugging if flaky tests. If the last build is green, you can never know what exactly happened to the previous red build. You need to save logs for all builds.

In case of Jenkins pipeline, you just need to add the following block to the Jenkinsfile:

finally {
  stage("Reports") {
    junit 'build/test-results/**/*.xml'
    artifacts = 'build/reports/**/*,logs/**/*'
    archiveArtifacts artifacts: artifacts
  }
}

Screenshots

For screenshots, just use Selenide. That’s it. Selenide makes screenshots automatically on test failures.

Video

Sometimes a screencast is the only way to understand what happened with a test. I know at least 2 easy ways how to save video during a test.

The first one is a library “video recorder“. It allows to save the video with just one annotation @Video:

@Test @Video
public void flakyTest() {
 . . .
}

The second one is a library “TestContainers“. It runs a browser in a docker and saves a video. Again, one simple @Rule to rule them all:

@Rule
public BrowserWebDriverContainer chrome = 
   new BrowserWebDriverContainer()
       .withRecordingMode(RECORD_ALL, new File("build/reports"))
       .withDesiredCapabilities(chrome());

Conclusion

Be prepared. Setup your tools. And wait.

And when the next time a flaky test fails, exult! You start hunting.

Andrei Solntsev

http://asolntsev.github.io/ru/

selenide.org

Author: Andrei Solntsev

Software developer at Codeborne (Estonia).

Creator of selenide.org

Twitter Facebook Instagram LinkedIn Github Stack Overflow

2 Comments

Andrey December 19, 2017

What is “AUT logs” ?

- Andrei Solntsev December 19, 2017 — Post Author
  
  AUT means “application under test”. It’s just logs of the application.

1 Pingback

Flaky tests - JVM Advent 2017

example 4: fantom account

Theory

top causes of flaky tests

Prevention of flaky tests

Testing pyramid

Emulate ’em all

Control the database

Tools

Logs of previous builds

Screenshots

Video

Conclusion

Author: Andrei Solntsev

Like this:

Related

2 Comments

Andrey December 19, 2017

Andrei Solntsev December 19, 2017 — Post Author

1 Pingback

Leave a ReplyCancel reply

Flaky tests 2

example 4: fantom account

Theory

top causes of flaky tests

Prevention of flaky tests

Testing pyramid

Emulate ’em all

Control the database

Tools

Logs of previous builds

Screenshots

Video

Conclusion

Author: Andrei Solntsev

Share this:

Like this:

Related

2 Comments

Andrey December 19, 2017

Andrei Solntsev December 19, 2017 — Post Author

1 Pingback

Leave a ReplyCancel reply

Hidden Treasures of Eclipse Collections 2024 Edition

Java: The Brave Companion in the World of Enterprise Software Solutions!

Portable Serverless Functions with Java (and Quarkus)