How to Tackle the Pyramid of Quality in the Real World

Quality matters

As software engineers, we all agree that quality is an important part of the systems we build. After all, what is the point of the most interesting feature in the world if it doesn’t work 90% of the time?

So we all agree that quality is important, but we are usually not aligned on how to get there. Some people advocate TDD: write the test first and the code later. Others write the code first and add tests later, and some sit somewhere between these two approaches. It doesn’t help that universities, courses and trainings usually don’t spend enough time on this topic.

There is a chance that you have heard of the Pyramid of Quality. However, people don’t always spend enough time translating it to the real world and working out how to actually apply it. So let us try to change this together.

Assumption of real use case

Let us assume that we work at some company X and that we are building a simple REST API. We expose a few endpoints. We have a Service layer for more complex processing, and a Repository that connects to some database. Standard stuff that most of us encounter in our companies. The question at hand is how to implement the Pyramid of Quality in this example: which layers should be present, and which tools should be used for them.

The code for the REST API example and everything else shared in this blog post can be found at https://github.com/vladimir-dejanovic/test-pyramid-blog

Pyramid

The basic logic behind the Pyramid is that the first, bottom layer is the largest and needs to cover the whole application. Every layer after that is more complex and specialized, so each of its tests covers a bigger area, and not all areas need to be covered.

Unit Layer

Everyone will agree that the first layer is the Unit Test Layer.

Unit tests need to be small, execute fast, and test only a small part, a unit, of the application. They shouldn’t depend on anything else, make calls to other systems, or need special setup to run. The idea is to run them all the time during development, and again at every code review/merge request. If they take a long time to run, people will not run them often enough, and we lose their benefit.

When it comes to unit tests in the Java world, my recommendation is to use JUnit 5. If you can’t use it for some reason, JUnit 4 with some extra libraries will also do the trick.

In our use case, one simple Unit test might look something like this.

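import java.util.ArrayList;
import java.util.List;

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.InjectMocks;
import org.mockito.Mock;
import org.mockito.Mockito;
import org.mockito.junit.jupiter.MockitoExtension;

// Post, PostService and PostRepository come from the application itself.
// MockitoExtension initializes the mocks without starting a Spring context,
// which keeps the test small and fast.
@ExtendWith(MockitoExtension.class)
class PostServiceTest {

    @Mock
    PostRepository postRepository;

    // Mockito injects the mocked repository into this instance.
    @InjectMocks
    PostService postService = new PostService();

    @BeforeEach
    void setUp() {
        List<Post> list = new ArrayList<>();

        Post post = new Post();
        post.setTitle("title 1");
        list.add(post);

        // The mock returns canned data instead of querying a database.
        Mockito.when(postRepository.findAll()).thenReturn(list);
    }

    @Test
    void getAllPosts() {
        List<Post> list = postService.getAllPosts();

        Assertions.assertEquals(1, list.size());
        Assertions.assertEquals("title 1", list.get(0).getTitle());
    }
}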

As we see in this code example, we are mocking the Repository with Mockito. We are not using the real one, so no database calls are made.

The rule of thumb is to always mock, or use test doubles for, any dependency of the code for which we are writing the unit test. We do this to make sure that we don’t get false positives: cases where a bug or issue in a dependency makes our unit test fail even though our own code is bug free.

Additionally, by using mocks we short circuit execution, and in that way keep unit tests as small and as fast as needed.

Component Layer

The next layer, the Component Layer, is optional from my point of view. The idea behind component tests is to test bigger parts, components, of our system.

In practice, we can easily create component tests with JUnit by mocking the dependencies at the border of the component that we are testing. In this way, we can easily decide how big or small our component test will be. Since component tests are bigger, we can cover the whole system with a smaller number of tests.
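To make "the border of the component" more concrete, here is a minimal sketch. It assumes hypothetical, simplified versions of the classes from our use case (`Post`, `PostRepository`, `PostService`) plus an invented `PostController`; the real classes in the example repository may look different. The controller and the service are real and wired together, and only the repository at the edge is replaced by a hand-written fake.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-ins for the application's classes.
class Post {
    private final String title;

    Post(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }
}

interface PostRepository {
    List<Post> findAll();
}

class PostService {
    private final PostRepository repository;

    PostService(PostRepository repository) {
        this.repository = repository;
    }

    List<Post> getAllPosts() {
        return repository.findAll();
    }
}

class PostController {
    private final PostService service;

    PostController(PostService service) {
        this.service = service;
    }

    // Returns only the titles, as a stand-in for building the HTTP response.
    List<String> listTitles() {
        return service.getAllPosts().stream()
                .map(Post::getTitle)
                .collect(Collectors.toList());
    }
}

// Test double at the border of the component: the only part that is faked.
class FakePostRepository implements PostRepository {
    @Override
    public List<Post> findAll() {
        return List.of(new Post("title 1"), new Post("title 2"));
    }
}

class PostComponentCheck {
    // Wires the real controller and service on top of the fake repository.
    static List<String> run() {
        PostController controller =
                new PostController(new PostService(new FakePostRepository()));
        return controller.listTitles();
    }
}
```

Moving the fake one level up or down (for example, faking the service instead of the repository) is what makes the component under test smaller or bigger.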

Although I understand the logic and reasoning behind them, I have rarely encountered them in real life. For them to provide value, we need very complex systems with complex components. Even in those cases, there is a big question mark over the return on investment of component tests compared with unit tests and the tests in later layers.

Functional Layer

The next layer in my mind should be the Layer of Functional Tests. Here we need to cover all interactions with our system from the user perspective, and validate that in these cases everything performs as expected. Some people might argue that this should be called system tests.

My personal preference is to use the term functional because it is easier for people, including non-techies, to understand what is being tested. Additionally, over the years, I have seen people use the term system testing in a variety of contexts and mean many different things by it.

Functional tests need to meet a few requirements that differ from those for unit and component tests. They need to be able to run standalone, since the idea is to execute them against “working systems”: usually against a local version first, and then against versions of the application in different environments, like dev, test, staging, and maybe even production. It follows that there should be a way to supply a different base URL, or some other setting, for running them against those different environments.
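One way to supply the base URL from outside the tests is sketched below. The `TestEnvironment` class, the `baseUrl` system property and the `BASE_URL` environment variable are all hypothetical names chosen for illustration; they are not part of the example repository.

```java
// Hypothetical helper: picks the base URL of the environment under test.
final class TestEnvironment {

    private static final String DEFAULT_BASE_URL = "http://localhost:8080";

    private TestEnvironment() {
    }

    // Reads the real system property and environment variable.
    static String baseUrl() {
        return resolve(System.getProperty("baseUrl"), System.getenv("BASE_URL"));
    }

    // Kept as a pure function so the precedence rules are easy to check:
    // system property first, then environment variable, then localhost.
    static String resolve(String systemProperty, String envVariable) {
        String chosen = DEFAULT_BASE_URL;
        if (systemProperty != null && !systemProperty.isBlank()) {
            chosen = systemProperty;
        } else if (envVariable != null && !envVariable.isBlank()) {
            chosen = envVariable;
        }
        // Drop a trailing slash so baseUrl + "/" + endpoint never doubles it.
        return chosen.endsWith("/")
                ? chosen.substring(0, chosen.length() - 1)
                : chosen;
    }
}
```

With something like this in place, the same tests could run locally with no configuration at all, and against another environment by starting the test JVM with a `-DbaseUrl=...` argument that points at it.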

Here it is also critical that the different environments (dev, test, staging and prod) are as identical as possible. In this way, we can run the same functional tests in all environments and be sure that any errors they report are real bugs and not environment specific. Over the years I have seen setups where the environments were very different, and the effect was that, in reality, different tests were run in different environments. This led to lower test quality, which inevitably led to lower product quality.

Over time, I noticed that BDD (Behaviour-Driven Development) is the best approach to creating functional tests. The tool I use most often for it is Cucumber.

In our use case, one BDD functional test might look like this:

  Scenario: load data
    Given base url 'http://localhost:8080'
    When user hit end point 'posts'
    Then I expect list of data

And that would translate to code like this

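import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class StepDefinitions {

    private final HttpClient client;
    private String baseUrl;
    private HttpResponse<String> data;

    public StepDefinitions() {
        client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(20))
                .build();
    }

    // Sends a GET request and returns the response with the body as a String.
    private HttpResponse<String> hitURL(String urlPath) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(urlPath))
                .build();

        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    @Given("base url {string}")
    public void base_url(String url) {
        baseUrl = url;
    }

    // Exceptions are allowed to propagate so that a failed call fails the step
    // instead of being swallowed.
    @When("user hit end point {string}")
    public void user_hit_end_point(String endPoint) throws IOException, InterruptedException {
        data = hitURL(baseUrl + "/" + endPoint);
    }

    // Fails the scenario, rather than marking it as pending, when there is no
    // usable response.
    @Then("I expect list of data")
    public void i_expect_list_of_data() {
        if (data == null) {
            throw new AssertionError("No response received");
        }
        if (data.statusCode() != 200) {
            throw new AssertionError("Unexpected status code: " + data.statusCode());
        }
    }

}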

Since in our use case we are testing the interaction with our simple REST API, we are leveraging the Java HttpClient to make REST calls and validate the results with Cucumber. If our users were interacting with a more complex system, we might use Selenium instead of HttpClient.

Personally, I find that BDD fits very nicely in the ecosystem of functional tests because it is easy to read, and non-technical people can easily contribute to the tests and add new ones.

End-to-End Layer

The next layer that we need in our pyramid is End-to-End tests. The logic here is to really test the whole chain. Something or someone interacts with our system, which triggers our system to interact with some other system, and so on. At some point, a response starts to travel back through this chain. These responses need validation, so we know everything is working as expected. Writing tests like this is more difficult. They need to run in environments that are as close to production as possible. It should go without saying that stable environments are crucial for success. All stakeholders also need to buy into this type of test. Since these tests are complex, writing them takes time, and if they produce false positives because the environment they run in is unstable, people will stop running and writing them.

The good thing about them is that they need to cover only the parts of the system that interact with other systems.

The tools and libraries that we should use to write end-to-end tests are usually the same ones we use for functional tests.

End-to-end tests need to be standalone, for the same reason as functional tests. They shouldn’t be tied to any specific environment, and we should run them against multiple environments of our application. Running them in staging is a must, and running them in any earlier environment is a good bonus.

Performance Layer

The next layer is very regularly overlooked: the Performance Layer. In most cases, all previous layers test “one user, one click” situations, and as we all know, that is not how real users interact with our systems. That’s why we need to test how our system performs in real-world scenarios. This is where load tests, also known as performance tests, are helpful.

My weapon of choice for this is Gatling. It does the job perfectly, is easy to configure and very versatile. In our use case, the load test might look something like this.

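import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ChainBuilder;
import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

public class LoadTestSimulation extends Simulation {

    // One request to the posts endpoint, followed by a one-second pause.
    ChainBuilder query = exec(
            http("get posts").get("/posts"))
                    .pause(1);

    // Shared HTTP settings for every request in the simulation.
    HttpProtocolBuilder httpProtocol =
            http.baseUrl("http://localhost:8080")
                    .acceptHeader("application/json")
                    .acceptLanguageHeader("en-US,en;q=0.5")
                    .acceptEncodingHeader("gzip, deflate")
                    .userAgentHeader(
                            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0"
                    );

    ScenarioBuilder users = scenario("Users").exec(query);

    {
        // Ramp up to 10 users over 10 seconds.
        setUp(
                users.injectOpen(rampUsers(10).during(10))
        ).protocols(httpProtocol);
    }
}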

To run it, we just need to execute the following command:

$ mvn gatling:test

The rule of thumb is that performance tests must be run in staging before pushing our code to production. As we discussed while looking at end-to-end tests, the staging environment needs to be as close to production as possible and stable. Performance tests shouldn’t be tied to any environment.

Bonus Layer

The last layer in our pyramid is a bonus layer, for the simple reason that most people don’t have it or don’t consider it a test layer. In all fairness, it isn’t really about testing; it is more about protection in case things go wrong in production.

I strongly advise everyone to put every feature they develop behind a feature flag.

The idea of feature flags is simple: if the flag is “On”, the feature is active. When it is “Off”, users will not see the feature. In essence, a feature flag is a switch that we can flip to activate or deactivate a feature. So if we have an issue in production, we simply flip the feature flag to disable the faulty feature and get everything back to normal, instead of patching and pushing new code to production.
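As a rough illustration, a feature flag can be as small as the sketch below. The `FeatureFlags` and `PostPage` classes and the `new-post-layout` flag name are hypothetical and exist only for this example; real setups usually back the flags with configuration or a dedicated feature flag service, so that they can be flipped without a redeploy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory flag store, for illustration only.
final class FeatureFlags {
    private final Map<String, Boolean> flags = new ConcurrentHashMap<>();

    void enable(String name) {
        flags.put(name, true);
    }

    void disable(String name) {
        flags.put(name, false);
    }

    // Unknown flags count as "Off", so a new feature stays hidden by default.
    boolean isEnabled(String name) {
        return flags.getOrDefault(name, false);
    }
}

class PostPage {
    private final FeatureFlags flags;

    PostPage(FeatureFlags flags) {
        this.flags = flags;
    }

    // The new code path is only reachable while the flag is "On".
    String render() {
        if (flags.isEnabled("new-post-layout")) {
            return "new layout";
        }
        return "classic layout";
    }
}
```

Calling `disable("new-post-layout")` is the "flip of the switch" described above: the old code path is served again without any new deployment.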

Feature flags are a powerful concept and yet, they are very frequently overlooked. As I stated before, it isn’t really a testing layer, but a powerful layer of protection that all applications should have.

Resources

Author: Vladimir Dejanovic

Founder and leader of AmsterdamJUG. JavaOne Rock Star, CodeOne Star speaker, storyteller, software architect, team lead and IT consultant, working in the industry since 2006 and developing high-performance software in multiple programming languages and technologies, from desktop to mobile and web with high-load traffic. Enjoys developing software mostly in Java and JavaScript, but has also written a fair share of code in Scala, C++, C, PHP, Go, Objective-C, Python, R, Lisp and many others. Always interested in cool new stuff and in Free and Open Source software. Likes giving talks at conferences like JavaOne, Devoxx BE, Devoxx US, Devoxx PL, Devoxx MA, Java Day Istanbul, Java Day Minsk, Voxxed Days Bristol, Voxxed Days Bucharest, Voxxed Days Belgrade, Voxxed Days Cluj-Napoca and others.
