The importance of tuning your thread pools

Whether you know it or not, your Java web application is most likely using a thread pool to handle incoming requests. This is an implementation detail that many overlook, but sooner or later you will need to understand how the pool is used, and how to correctly tune it for your application. This article aims to explain the threaded model, what a thread pool is, and what you need to do to configure one correctly.

Single Threaded

Let us start with some basics, and progress through the evolution of the threaded model. No matter which application server or framework you use (Tomcat, Dropwizard, Jetty), they all use the same fundamental approach. Buried deep inside the web server is a socket. This socket listens for incoming TCP connections, and accepts them. Once accepted, data can be read from the newly established TCP connection, parsed, and turned into an HTTP request. This request is then handed off to the web application, to do with what it wants.

To provide an understanding of the role of threads, we won’t use an application server; instead we will build a simple server from scratch. This server mirrors what most application servers do under the hood. To start with, a single-threaded web server may look like this:

ServerSocket listener = new ServerSocket(8080);
try {
 while (true) {
   Socket socket = listener.accept();
   try {
     handleRequest(socket);
   } catch (IOException e) {
     e.printStackTrace();
   }
 }
} finally {
 listener.close();
}

This code creates a ServerSocket on port 8080, then in a tight loop checks it for new connections to accept. Once accepted, the socket is passed to the handleRequest method. That method would typically read the HTTP request, do whatever processing is needed, and write a response. In this simple example, handleRequest reads a single line, and returns a short HTTP response. It would be normal for handleRequest to do something more complex, such as reading from a database, or conducting some other kind of IO.

final static String response =
   "HTTP/1.0 200 OK\r\n" +
   "Content-type: text/plain\r\n" +
   "\r\n" +
   "Hello World\r\n";

public static void handleRequest(Socket socket) throws IOException {
 // Read the input stream, and return "200 OK"
 try {
   BufferedReader in = new BufferedReader(
     new InputStreamReader(socket.getInputStream()));
   log.info(in.readLine());

   OutputStream out = socket.getOutputStream();
   out.write(response.getBytes(StandardCharsets.UTF_8));
 } finally {
   socket.close();
 }
}

As there is only a single thread handling all accepted sockets, each request must be fully handled before accepting the next. In a real application it could be normal for the equivalent handleRequest method to take on the order of 100 milliseconds to return. If that were the case, the server would be limited to handling only 10 requests per second, one after the other.

Multi-threaded

Even though handleRequest may be blocked on IO, the CPU is free to handle more requests. With a single-threaded approach this is not possible. Thus this server can be improved to allow concurrent operations by creating multiple threads:

public static class HandleRequestRunnable implements Runnable {

 final Socket socket;

 public HandleRequestRunnable(Socket socket) {
   this.socket = socket;
 }

 public void run() {
   try {
     handleRequest(socket);
   } catch (IOException e) {
     e.printStackTrace();
   }
 }
}

ServerSocket listener = new ServerSocket(8080);
try {
 while (true) {
   Socket socket = listener.accept();
   new Thread(new HandleRequestRunnable(socket)).start();
 }
} finally {
 listener.close();
}

Here, accept() is still called in a tight loop within a single thread, but once a TCP connection is accepted and a socket is available, a new thread is spawned. This spawned thread executes a HandleRequestRunnable, which simply calls the same handleRequest method from above.

Creating the new thread now frees up the original accept() thread to handle more TCP connections, and allows the application to handle requests concurrently. This technique is referred to as “thread per request”, and is the most popular approach taken. It is worth noting there are other approaches, such as the event-driven asynchronous model that NGINX and Node.js deploy, but they don’t use thread pools, and thus are out of scope for this article.

In the thread-per-request approach, creating a new thread (and later destroying it) can be expensive, as both the JVM and the OS need to allocate resources. Additionally, in the above implementation the number of threads being created is unbounded. Being unbounded is very problematic, as it can quickly lead to resource exhaustion.

Resource exhaustion

Each thread requires a certain amount of memory for its stack. On recent 64-bit JVMs, the default stack size is 1024KB. If the server receives a flood of requests, or the handleRequest method becomes slow, the server may end up with a huge number of concurrent threads. Thus to manage 1000 concurrent requests, the 1000 threads would consume 1GB of the JVM’s RAM just for the threads’ stacks. In addition, the code executing in each thread will be creating objects on the heap needed to process the request. This very quickly adds up, and can exceed the heap space assigned to the JVM, putting pressure on the garbage collector, causing thrashing and eventually leading to OutOfMemoryErrors.

As well as consuming RAM, the threads may use other finite resources, such as file handles or database connections. Exceeding these may lead to other types of errors or crashes. Thus, to avoid exhausting resources, it is important to avoid unbounded data structures.

It is not a panacea, but the stack size issue can be somewhat mitigated by tuning the stack size with the -Xss flag. A smaller stack will reduce the per-thread overhead, but potentially leads to StackOverflowErrors. Your mileage will vary, but for many applications the default 1024KB is excessive, and smaller 256KB or 512KB values might be more appropriate. The smallest value Java will allow is 160KB.
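
For example, the server could be launched with a 256KB stack per thread (the jar name here is just illustrative):

java -Xss256k -jar application.jar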

Thread pool

To avoid continuously creating new threads, and to bound the maximum number, a simple thread pool can be used. Simply put, the pool keeps track of all threads, creating new ones when needed up to an upper bound, and where possible reusing idle threads.

ServerSocket listener = new ServerSocket(8080);
ExecutorService executor = Executors.newFixedThreadPool(4);
try {
 while (true) {
   Socket socket = listener.accept();
   executor.submit( new HandleRequestRunnable(socket) );
 }
} finally {
 listener.close();
}

Now, instead of directly creating threads, this code uses an ExecutorService, to which work (in the form of Runnables) is submitted to be executed across a pool of threads. In this example a fixed thread pool of four threads is used to handle all incoming requests. This bounds the number of “in-flight” requests, and thus places bounds on the resource usage.

In addition to newFixedThreadPool, the Executors utility class also provides a newCachedThreadPool method. This suffers from the same unbounded number of threads as before, but whenever possible makes use of previously created, now idle threads. Typically this type of pool is useful for short-lived requests that do not block on external resources.
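
For illustration, swapping to this pool type is a one-line change:

ExecutorService executor = Executors.newCachedThreadPool(); // unbounded, but reuses idle threads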

ThreadPoolExecutors can be constructed directly, allowing for its behaviour to be customised. For example, the min and max number of threads within the pool can be defined, as well as policies for when threads are created and destroyed. An example of this is shown shortly.

Work queue

In the fixed thread pool case, the observant reader may wonder what happens if all threads are busy, and a new request comes in. Well, the ThreadPoolExecutor may use a queue to hold pending requests before a thread becomes available. The pools created by Executors.newFixedThreadPool by default use an unbounded LinkedBlockingQueue. Again this leads to the resource exhaustion problem, albeit much more slowly, since each queued request is smaller than a full thread, and will typically not use as many resources. However, in our examples, each queued request is holding a socket which (depending on the OS) would be consuming a file handle. This is the kind of resource that the operating system will limit, so it may not be best to hold on to it unless needed. Therefore it also makes sense to bound the size of the work queue.

public static ExecutorService newBoundedFixedThreadPool(int nThreads, int capacity) {
 return new ThreadPoolExecutor(nThreads, nThreads,
     0L, TimeUnit.MILLISECONDS,
     new LinkedBlockingQueue<Runnable>(capacity),
     new ThreadPoolExecutor.DiscardPolicy());
}

public static void boundedThreadPoolServerSocket() throws IOException {
 ServerSocket listener = new ServerSocket(8080);
 ExecutorService executor = newBoundedFixedThreadPool(4, 16);
 try {
   while (true) {
     Socket socket = listener.accept();
     executor.submit( new HandleRequestRunnable(socket) );
   }
 } finally {
   listener.close();
 }
}

Again, we create a thread pool, but instead of using the Executors.newFixedThreadPool helper method, we create the ThreadPoolExecutor ourselves, passing a bounded LinkedBlockingQueue capped to 16 elements. Alternatively an ArrayBlockingQueue could have been used, which is an implementation of a bounded buffer.

If all threads are busy, and the queue fills up, what happens next is defined by the last argument to the ThreadPoolExecutor. In this example, a DiscardPolicy is used, which simply discards any work that would overflow the queue. There are other policies, such as the AbortPolicy which throws an exception, or the CallerRunsPolicy which executes the job on the caller’s thread. This CallerRunsPolicy provides a simple way to self-limit the rate at which jobs can be added; however, it could be harmful, blocking a thread that should stay unblocked.

A good default policy is to Discard or Abort, which both drop the work. In these cases it would be easy to return a simple error to the client, such as an HTTP 503 “Service Unavailable”. Some would argue that the queue size could just be increased, and then all work would eventually be run. However, users are unwilling to wait forever, and if the rate at which work comes in fundamentally exceeds the rate at which it can be executed, then the queue will grow indefinitely. Instead the queue should only be used to smooth out bursts of requests, or handle short stalls in processing. In normal operation the queue should be empty.
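
As a sketch of that idea (building on the HandleRequestRunnable from earlier, with error handling kept minimal), a custom RejectedExecutionHandler could write the 503 before closing the connection. It would be passed as the last constructor argument, in place of the DiscardPolicy:

public static class Http503RejectionHandler implements RejectedExecutionHandler {
 public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
   if (r instanceof HandleRequestRunnable) {
     Socket socket = ((HandleRequestRunnable) r).socket;
     try {
       // Tell the client we are overloaded, rather than silently dropping them
       socket.getOutputStream().write(
           "HTTP/1.0 503 Service Unavailable\r\n\r\n".getBytes(StandardCharsets.UTF_8));
     } catch (IOException e) {
       // Best effort only; the client may already have gone away
     } finally {
       try { socket.close(); } catch (IOException ignored) { }
     }
   }
 }
}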

How many threads?

Now that we understand how to create a thread pool, the hard question is: how many threads should be available? We have determined that the maximum number should be bounded so as not to cause resource exhaustion. This includes all types of resources: memory (stack and heap), open file handles, open TCP connections, the number of connections a remote database can handle, and any other finite resource. Conversely, if the threads are CPU bound instead of IO bound, then the number of physical cores should be considered finite, and perhaps no more than one thread per core should be created.

This all depends on the work the application is doing. A user should run load tests using various pool sizes and a realistic mix of requests, each time increasing the thread pool size until breaking point. This makes it possible to find the upper bound for when resources are exhausted. In some cases it may be prudent to increase the number of available resources, for example making more RAM available to the JVM, or tuning the OS to allow for more file handles. However, at some point the theoretical upper bound will be reached, and should be noted, but this is not the end of the story.

Little’s Law

[Figure: Little’s Law, L = λW]

Queuing theory, in particular Little’s Law, can be used to help understand the properties of the thread pool. In simple terms, Little’s Law describes the relationship between three variables: L, the number of requests in-flight; λ, the rate at which new requests arrive; and W, the average time to handle a request. For example, if there are 10 requests arriving per second, and each request takes one second to process, there is an average of 10 requests in-flight at any time. In our example, this maps to using 10 threads. If the time to process a single request is doubled, then the average number of in-flight requests also doubles to 20, and thus requires 20 threads.
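
As a sanity check, the example above is just this arithmetic (L = λ × W):

double lambda = 10.0;  // λ: requests arriving per second
double w = 1.0;        // W: average seconds to handle one request
double l = lambda * w; // L: 10 requests in-flight, i.e. 10 threads
// Doubling W to 2.0 doubles L to 20, and thus 20 threads are needed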

Understanding the impact that execution time has on in-flight requests is very important. It is common for some backend resource (such as a database) to stall, causing requests to take longer to process, quickly exhausting a thread pool. Therefore the theoretical upper bound may not be an appropriate limit for the pool size. Instead, a limit should be placed on execution time, and used in combination with the theoretical upper bound.

For example, let’s say the maximum in-flight requests that can be handled is 1000 before the JVM exceeds its memory allocation. If we budget for each request to take no longer than 30 seconds, we should expect in the worst case to handle no more than 33 ⅓ requests per second. However, if everything is working correctly, and requests take only 500ms to handle, the application can handle 2000 requests per second, on only 1000 threads. It may also be reasonable to specify that a queue can be used to smooth out short bursts of delay.

Why the hassle?

If the thread pool has too few threads, you run the risk of under-utilising your resources, and turning users away unnecessarily. However, if too many threads are allowed, resource exhaustion occurs, which can be more damaging.

Not only can local resources be exhausted, but it is possible to adversely impact others. Take, for example, multiple applications querying the same backend database. Databases typically have a hard limit on the number of concurrent connections. If one misbehaving, unbounded application consumes all these connections, it would block the others from accessing the database, causing a widespread outage.

Even worse, a cascading failure could occur. Imagine an environment with multiple instances of a single application behind a common load balancer. If one of the instances begins to run out of memory due to excessive in-flight requests, the JVM will spend more time garbage collecting, and less time handling requests. That slowdown will reduce the capacity of that one instance, forcing the other instances to handle a higher fraction of incoming requests. As they now handle more requests with their unbounded thread pools, the same problem occurs. They run out of memory, and again begin aggressively garbage collecting. This vicious cycle cascades across all instances, until there is a systemic failure.

Far too often I’ve observed that load testing is not conducted, and an arbitrarily high number of threads is allowed. In the common case the application can happily process requests at the incoming rate using a small number of threads. If, however, processing the requests depends on a remote service, and that service temporarily slows down, the impact of increasing W (the average processing time) can very quickly exhaust the pool. Because the application was never load tested at the maximum thread count, all the resource exhaustion issues outlined before are exhibited.

How many thread pools?

In microservice or service-oriented architectures (SOA), it is normal to access multiple remote backend services. This setup is particularly susceptible to failures, and thought should be given to dealing with them gracefully. If a remote service’s performance degrades, it can cause the thread pool to quickly hit its limit, and subsequent requests are dropped. However, not all requests may require this unhealthy backend, yet since the thread pool is full those requests are needlessly dropped.

The failure of each backend can be isolated by providing backend-specific thread pools. In this pattern, there is still a single request worker pool, but if the request needs to call a remote service, the work is transferred to that backend’s thread pool. This leaves the main request pool unburdened by a single slow backend. Then only requests needing that particular backend pool are impacted when it malfunctions.

A final benefit of multiple thread pools is that it helps avoid a form of deadlock. If every available thread becomes blocked on the result of a yet-to-be-processed request, then a deadlock occurs, and no thread is able to move forward. When using multiple pools, and having a good understanding of the work they execute, this issue can be somewhat mitigated.

Deadlines and other best practices

A common best practice is to ensure there is a deadline on all remote calls. That is, if the remote service does not respond within a reasonable time, the request is abandoned. The same technique can be used for work within the thread pool. Specifically, if a thread is processing one request for longer than a defined deadline, it should be terminated, making room for a new request and placing an upper bound on W. This may seem like a waste, but if the user (which might typically be a web browser) is waiting for a response, then after 30 seconds the browser might just give up anyway, or more likely the user becomes impatient and navigates away.
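
A rough sketch of that idea, reusing the executor from the earlier examples: a scheduled “reaper” cancels any Future that outlives its deadline. Note that cancellation relies on the task responding to interruption, which blocking socket IO may not do.

ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();

Future<?> future = executor.submit(new HandleRequestRunnable(socket));
// Abandon the request if it has not completed within the 30 second deadline
reaper.schedule(() -> future.cancel(true), 30, TimeUnit.SECONDS);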

Failing fast is another approach that can be taken when creating pools for backends. If the backend has failed, the thread pool will quickly fill up with requests waiting to connect to the unresponsive backend. Instead, the backend can be flagged as unhealthy, and all subsequent requests can fail instantly instead of needlessly waiting. Note however that a mechanism is needed to determine when the backend has become healthy again.
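
A minimal fail-fast sketch (the pool, request type and doCall method here are illustrative, and the health flag would be maintained elsewhere, for example by a background ping):

private final ExecutorService backendPool = Executors.newFixedThreadPool(4);
private final AtomicBoolean backendHealthy = new AtomicBoolean(true);

public Future<String> callBackend(String request) {
  // Fail fast: don't queue work against a backend known to be down
  if (!backendHealthy.get()) {
    throw new RejectedExecutionException("backend marked unhealthy");
  }
  return backendPool.submit(() -> doCall(request));
}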

Finally, if a request will need to call multiple backends independently, it should be possible to call them in parallel, instead of sequentially. This would reduce the wait time, at the cost of increased threads.
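
For example (the pool and method names here are illustrative):

// Submit both calls first, then wait: total latency is roughly the max
// of the two, rather than their sum
Future<String> userF = userServicePool.submit(() -> fetchUser(id));
Future<String> ordersF = orderServicePool.submit(() -> fetchOrders(id));
String user = userF.get();
String orders = ordersF.get();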

Luckily, there is a great library, Hystrix, which packages many of these best practices and exposes them in a simple and safe way.

Conclusion

Hopefully this article has improved your understanding of thread pools. By understanding the application’s needs, and using a combination of the maximum thread count and the average response time, an appropriate thread pool configuration can be determined. Not only will this avoid cascading failures, it will also help you plan and provision your service.

Even though your application may not explicitly use a thread pool, one is implicitly used by your application server or higher-level abstraction. Tomcat, JBoss, Undertow and Dropwizard all provide multiple tunables for their thread pools (the pool in which your servlet is executed).

Like what you read? Find more articles like this on bramp.net, or follow @TheBramp.

JIT Compiler, Inlining and Escape Analysis

Just-in-time (JIT)

The just-in-time (JIT) compiler is the brain of the Java Virtual Machine. Nothing in the JVM affects performance more than the JIT compiler.

For a moment let’s step back and look at examples of compiled and interpreted languages.

Languages like Go, C and C++ are called compiled languages because their programs are distributed as binary (compiled) code, which is targeted to a particular CPU.

On the other hand, languages like PHP and Perl are interpreted. The same program source code can be run on any CPU as long as the machine has the interpreter. The interpreter translates each line of the program into binary code as that line is executed.

Java attempts to find a middle ground here. Java applications are compiled, but instead of being compiled into a specific binary for a specific CPU, they are compiled into bytecode. This gives Java the platform independence of an interpreted language. But Java doesn’t stop there.

In a typical program, only small sections of the code are executed frequently, and the performance of an application depends primarily on how fast those sections of code are executed. These critical sections are known as the hot spots of the application.
The more times the JVM executes a particular code section, the more information it has about it. This allows the JVM to make smart, optimized decisions and compile small pieces of hot code into a CPU-specific binary. This process is called just-in-time (JIT) compilation.

Now let’s run a small program and observe JIT compilation.

public class App {
  public static void main(String[] args) {
    long sumOfEvens = 0;
    for(int i = 0; i < 100000; i++) {
      if(isEven(i)) {
        sumOfEvens += i;
      }
    }
    System.out.println(sumOfEvens);
  }

  public static boolean isEven(int number) {
    return number % 2 == 0;
  }
}


#### Run
javac App.java && \
java -server \
     -XX:-TieredCompilation \
     -XX:+PrintCompilation \
     -XX:CompileThreshold=100000 App


#### Output
87    1             App::isEven (16 bytes)
2499950000

The output tells us that the isEven method was compiled. I intentionally disabled TieredCompilation to get only the most frequently compiled code.

JIT-compiled code will give a great performance boost to your application. Want to check it? Write a simple benchmark.
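
For instance, a naive timing loop along these lines (a sketch only: for trustworthy numbers use a proper harness such as JMH, as warm-up and dead-code elimination easily distort naive measurements):

long start = System.nanoTime();
long sumOfEvens = 0;
for (int i = 0; i < 100000; i++) {
  if (isEven(i)) {
    sumOfEvens += i;
  }
}
long elapsedMicros = (System.nanoTime() - start) / 1000;
System.out.println(sumOfEvens + " (" + elapsedMicros + "µs)");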

Inlining

Inlining is one of the most important optimizations the JIT compiler makes. Inlining replaces a method call with the body of the method to avoid the overhead of method invocation.
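
Conceptually, once isEven is inlined, the hot loop in our App behaves as if it had been written like this:

for (int i = 0; i < 100000; i++) {
  if (i % 2 == 0) {      // body of isEven() substituted at the call site
    sumOfEvens += i;
  }
}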

Let’s run the same program again and this time observe inlining.

#### Run
javac App.java && \
java -server \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintInlining \
     -XX:-TieredCompilation App

#### Output
@ 12   App::isEven (16 bytes)   inline (hot)
2499950000

Inlining again will give a great performance boost to your application.

Escape Analysis

Escape analysis is a technique by which the JIT compiler can analyze the scope of a new object’s uses and decide whether to allocate it on the Java heap, or instead handle the object’s members directly (scalar replacement). It also eliminates locks for all non-globally-escaping objects.

Let’s run a small program and observe garbage collection.

public class App {
  public static void main(String[] args) {
    long sumOfArea = 0;
    for(int i = 0; i < 10000000; i++) {
      Rectangle rect = new Rectangle(i+5, i+10);
      sumOfArea += rect.getArea();
    }
    System.out.println(sumOfArea);
  }

  static class Rectangle {
    private int height;
    private int width;

    public Rectangle(int height, int width) {
      this.height = height;
      this.width = width;
    }

    public int getArea() {
      return height * width;
    }
  }
}

In this example Rectangle objects are created and available only within the loop, so they are characterised as NoEscape, and the JIT can handle the object members directly (scalar replacement) instead of allocating objects on the heap. Specifically, this means that no garbage collection will happen.

Let’s run the program without EscapeAnalysis.

#### Run
javac App.java && \
java -server \
     -verbose:gc \
     -XX:-DoEscapeAnalysis App

#### Output
[GC (Allocation Failure)  65536K->472K(251392K), 0.0007449 secs]
[GC (Allocation Failure)  66008K->440K(251392K), 0.0008727 secs]
[GC (Allocation Failure)  65976K->424K(251392K), 0.0005484 secs]
16818403770368

As you can see, GC kicked in. Allocation Failure means no more space is left in the young generation to allocate objects; it is the normal cause of a young-generation GC.

This time let’s run it with EscapeAnalysis.

#### Run
javac App.java && \
java -server \
    -verbose:gc \
    -XX:+DoEscapeAnalysis App

#### Output
16818403770368

No GC happened this time, which basically means creating short-lived, narrowly scoped objects does not necessarily introduce garbage.

The DoEscapeAnalysis option is enabled by default. Note that only the Java HotSpot Server VM supports this option.

As a consequence, we should all avoid premature optimization, focus on writing more readable and maintainable code, and let the JVM do its job.

Quick Web App Prototyping with Spring Boot & MongoDB

Back in one of my previous projects I was asked to produce a little contingency application. The schedule was tight and the scope simple. The in-house coding standard is PHP, so trying to get a classic Java EE stack in place would have been a real challenge. And, to be really honest, completely oversized. So, what then? I took the chance and gave Spring a try. I used it before, but in old versions, hidden away in the tech stack of the portal software I was plagued with at this time.

My goal was to have something the WebOps can simply put on a server with Java installed and run it. No fiddling with dozens of XML configurations and memory fine tuning. Just as easy as java -jar application.jar.
It was the perfect call for “Spring Boot”. This Spring project is all about making it easy to bring you, the developer, up to speed and take away the need of loads of configuration and boilerplate coding.

Another thing my project was crying for was a document-oriented data storage. I mean, the main purpose of the application was to offer a digital version of a real-world paper form. So why create a relational mess if we can represent the document as a document?! I used MongoDB in a couple of small projects before, so I decided to go with it.

What has this got to do with this article? Well, I will show you how quickly you can bring together all the bits and pieces needed for a web application. Spring Boot will make a lot of things fairly easy and will keep the code minimal. And at the end you will have a JAR file, which is executable and can be deployed by just dropping it onto a server. Your WebOps will love you for it.

Let’s imagine we are about to create the next big product administration web application. As it is the next big thing, it needs a big name: Productr (this is the reason I am a software engineer and not in sales or marketing…).
Productr will do amazing things and this article will show you its early stages, which are:

  • providing a simple REST interface to query all available products
  • loading these products from a MongoDB
  • providing a production-ready monitoring facility
  • displaying all products by using a JavaScript UI

All you need to start is:

  • Java 8
  • Maven
  • Your favourite IDE (IntelliJ, Eclipse, vi, edlin, a butterfly…)
  • A browser (ok, or Internet Explorer / MS Edge, but who would really want this?!)

And for the impatient, the code is also available on GitHub.

Let’s get started

Create a pom.xml with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.3.0.RELEASE</version>
    </parent>

    <modelVersion>4.0.0</modelVersion>
    <groupId>net.h0lg.tutorials.rapid</groupId>
    <artifactId>rapid-resting</artifactId>
    <version>1.0</version>


    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
    </dependencies>


    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

In these few lines a lot of stuff is already happening. Most important is the defined parent project. This will bring us a lot of useful and needed dependencies like logging, the Tomcat runtime and lots more. Thanks to Spring’s modularity, everything is re-configurable via pom.xml or dependency injection. For getting everything up quickly the defaults are absolutely fine. (Convention over configuration, anybody?)

Now, create the obligatory Maven folder structure:

mkdir -p src/main/java src/main/resources src/test/java src/test/resources

And we are settled.

Start the engines

Let’s get to work. We want to offer a REST interface to get access to our huge amount of products. So let’s start with creating a REST collection available under /api/products. To do so we have to do a few things:

  1. Our “data model” holding all information about our incredible products needs to be created
  2. We need a controller offering a method which does everything necessary to answer a GET request
  3. Create the main entry point for our application

The data model is pretty simple and done quickly. Just create a package called demo.model and a class called Product in it. The Product class is very straightforward:

package demo.model;

import java.io.Serializable;

/**
 * Our very important and sophisticated data model
 */
public class Product implements Serializable {

    String productId;
    String name;
    String vendor;

    public String getProductId() {
        return productId;
    }

    public void setProductId(String productId) {
        this.productId = productId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getVendor() {
        return vendor;
    }

    public void setVendor(String vendor) {
        this.vendor = vendor;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        Product product = (Product) o;

        if (getProductId() != null ? !getProductId().equals(product.getProductId()) : product.getProductId() != null)
            return false;
        if (getName() != null ? !getName().equals(product.getName()) : product.getName() != null) return false;
        return !(getVendor() != null ? !getVendor().equals(product.getVendor()) : product.getVendor() != null);

    }

    @Override
    public int hashCode() {
        int result = getProductId() != null ? getProductId().hashCode() : 0;
        result = 31 * result + (getName() != null ? getName().hashCode() : 0);
        result = 31 * result + (getVendor() != null ? getVendor().hashCode() : 0);
        return result;
    }
}

Our product has the incredible amount of 3 properties: an alphanumeric product ID, a name and a vendor (just the name, to keep things simple). It is serialisable and the getters, setters and the methods equals() & hashCode() are implemented by using my IDE’s code generation.

Alright, so creating a controller with a method to offer the GET listener it is now. Go back to your favourite IDE and create the package demo.controller and a class called ProductsController with the following content:

package demo.controller;

import demo.model.Product;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import java.util.ArrayList;
import java.util.List;

/**
 * This controller provides the REST methods
 */
@RestController
@RequestMapping("/api/products/")
public class ProductsController {

    @RequestMapping(value = "/", method = RequestMethod.GET)
    public List<Product> getProducts() {
        List<Product> products = new ArrayList<>();

        return products;
    }

}

This is really everything you need to provide a REST interface. Ok, at the moment, an empty list is returned, but it is that easy to define.

The last thing missing is an entry point for our application. Just create a class called ProductrApplication in the package demo and give it the following content:

package demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

/**
 * This is the entry point of our application
 */
@SpringBootApplication
public class ProductrApplication {

    public static void main (String... opts) {
        SpringApplication.run(ProductrApplication.class, opts);
    }

}

Spring Boot saves us a lot of keystrokes. @SpringBootApplication does a few things we would need for every web application anyway. This annotation is shorthand for the following ones:

  • @Configuration
  • @EnableAutoConfiguration
  • @ComponentScan

Now it is time to start our application for the first time. Thanks to Spring Boot’s maven plugin, which we configured in our pom.xml, starting the application is as easy as: mvn spring-boot:run. Just run this command in your project root directory. You prefer the lazy point-n-click way provided by your IDE? Alright, just instruct your favourite IDE to run ProductrApplication.

Once it is started, use a browser, a REST client (you should check out Postman, I love this tool) or a command line tool like curl. The address you are looking for is: http://localhost:8080/api/products/. So, with curl, the command looks like this:


curl http://localhost:8080/api/products/

Data please

Ok, returning an empty list isn’t that shiny, is it? So let’s bring in data.
In many projects a classic relational database is usually overkill (and painful if you have to use it AND scale out). This may be one reason for the hype around NoSQL databases. One (in my opinion good) example is MongoDB.

Getting MongoDB up and running is pretty easy. On Linux you can use your package manager to install it. For Debian / Ubuntu, for example, simply do: sudo apt-get install mongodb.

For Mac, the easiest way is homebrew: brew install mongodb and follow the instructions in the “Caveats” section.

Windows users should go with the MongoDB installer (and fingers crossed).

Alright, we just got our data store sorted. It is about time to use it.
There is one particular Spring project dealing with data – called Spring Data. And by sheer coincidence a sub-project called Spring Data MongoDB is just waiting for us. Even more, Spring Boot provides a dependency package to get up to speed instantly. No wonder that the following few lines in the pom.xml‘s <dependencies> section are enough to bring in everything we need:


  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
  </dependency>

Now, create a new package called demo.domain and put in a new interface called ProductRepository. Spring provides a pretty neat way to get rid of the code which is usually needed to interact with a data source. Most of the basic queries are generated by Spring Data; all you need to do is define an interface. A couple of query methods are available without even specifying method headers. One example is the findAll() method, which will return all entries in the collection.
But hey, let’s see it in action instead of talking about it. The aforementioned ProductRepository interface should look like this:

package demo.domain;

import demo.model.Product;
import org.springframework.data.mongodb.repository.MongoRepository;

/**
 * This interface lets Spring generate a whole Repository implementation for
 * Products.
 */
public interface ProductRepository extends MongoRepository<Product, String> {

}
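
As a side note (not needed for this tutorial): Spring Data can also derive queries from method names. Declaring something like the following hypothetical method in the interface would generate a vendor lookup without a single line of implementation code:

List<Product> findByVendor(String vendor);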

Next, create a class called ProductService in the same package. The purpose of this class is to provide some useful methods to query products. For now, the code is as easy as this:

package demo.domain;

import demo.model.Product;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;

/**
 * This is a little service class we will let Spring inject later.
 */
@Service
public class ProductService {

    @Autowired
    private ProductRepository repository;

    public List<Product> getProducts() {
        return repository.findAll();
    }

}

See how we can use repository.findAll() without even defining it in the interface? Pretty slick, isn’t it? Especially if you are in a hurry and need to get things up quickly.

Alright, so far we prepared the foundation for the data access. I think it is time to wire it together. To do so, simply head back to our class demo.controller.ProductsController and modify it slightly. All we have to do is to inject our shiny new ProductService service and call its getProducts() method. The class will look like this afterwards:

package demo.controller;

import demo.domain.ProductService;
import demo.model.Product;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

import java.util.List;

/**
 * This controller provides the REST methods
 */
@RestController
@RequestMapping("/api/products/")
public class ProductsController {

    // Let Spring DI inject the service for us
    @Autowired
    private ProductService productService;

    @RequestMapping(value = "/", method = RequestMethod.GET)
    public List<Product> getProducts() {
        // Ask the data store for a list of products
        return productService.getProducts();
    }

}

That’s it. Start MongoDB (if not already running), start our application again (remember the mvn spring-boot:run thingy?!) and start another GET request to http://localhost:8080/api/products/:


$ curl http://localhost:8080/api/products/
[]

Wait, still an empty list? Yes, or do you remember us putting anything into the database? Let’s change this by using the following command:


mongo localhost/test --eval "db.product.insert({productId: 'a1234', name: 'Our First Product', vendor: 'ACME'})"

This adds one product called “Our First Product” to our database. Ok, so what is our service returning now? This:

$ curl http://localhost:8080/api/products/
[{"productId":"5657654426ed9d921affc3c0","name":"Our First Product","vendor":"ACME"}]

Easy, wasn’t it?!

Looking for a little more data but no time to create it yourself? Alright, it’s nearly Christmas, so take my little test selection:

curl https://gist.githubusercontent.com/daincredibleholg/f8667a26ce2f17776903/raw/ed9b4c8ec6c9c455dc063e833af2418648928ba6/quick-web-app-product-example.json | mongoimport -d test -c product --jsonArray

Basic requirements at your fingertips

In today’s hectic days, and with the “microservice” culture spreading, it is getting harder and harder to keep an eye on what is really running on your servers or cloud environments. So in nearly all environments I have worked on over the last few years, monitoring was a big thing. One common pattern is to provide health check endpoints. One can find everything from simple ping endpoints to health metrics, returning a detailed overview of business-relevant metrics.
All of this is most of the time a copy-and-paste adventure and involves tackling a lot of boilerplate code. Here is what we have to do: simply add the following dependency to your pom.xml:


  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
  </dependency>

and restart the service. Let’s have a look at what happens if we query http://localhost:8080/health:


$ curl http://localhost:8080/health
{"status":"UP","diskSpace":{"status":"UP","total":499088621568,"free":83261571072,"threshold":10485760},"mongo":{"status":"UP","version":"3.0.7"}}

This should provide sufficient data for a basic health check. If you follow the startup log messages, you’ve probably spotted a number of other endpoints. Experiment a bit and check the Actuator documentation for more information.
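
For example, the metrics endpoint (exposed at the root by this Boot version, assuming no additional security restrictions are configured) returns counters and gauges about the running application:

curl http://localhost:8080/metrics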

Show it to me

Ok, we got ourselves a REST service and some data. But we want to show this data to our users. So let’s go on and provide a page with an overview of our awesome products.

Thank Santa that there is a really active web UI community working on loads of nice and easy-to-use frontend frameworks and libraries. One pretty popular example is Bootstrap. It is easy to use and all the needed bits and pieces are provided via open CDNs.

We want to have a short overview of our products, so a table view would be nice. Bootstrap Table will help us with that. It is built on top of Bootstrap and also available via CDNs. What a world we live in…

But wait, where to put our HTML file? Spring Boot makes it easy, again. Just create a folder called src/main/resources/static and create a new HTML file called index.html with the following content:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <title>Productr</title>

    <!-- Import Bootstrap CSS from CDNs -->
    <link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css">
    <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/bootstrap-table/1.9.1/bootstrap-table.min.css">
</head>
<body>
<nav class="navbar navbar-inverse">
    <div class="container">
        <div class="navbar-header">
            <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
                <span class="sr-only">Toggle navigation</span>
                <span class="icon-bar"></span>
                <span class="icon-bar"></span>
                <span class="icon-bar"></span>
            </button>
            <a class="navbar-brand" href="#">Productr</a>
        </div>
        <div id="navbar" class="collapse navbar-collapse">
            <ul class="nav navbar-nav">
                <li class="active"><a href="#">Home</a></li>
                <li><a href="#about">About</a></li>
                <li><a href="#contact">Contact</a></li>
            </ul>
        </div><!--/.nav-collapse -->
    </div>
</nav>
    <div class="container">
        <table data-toggle="table" data-url="/api/products/">
            <thead>
            <tr>
                <th data-field="productId">Product Reference</th>
                <th data-field="name">Name</th>
                <th data-field="vendor">Vendor</th>
            </tr>
            </thead>
        </table>
    </div>


<!-- Import Bootstrap, Bootstrap Table and JQuery JS from CDNs -->
    <script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
    <script src="//maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
    <script src="//cdnjs.cloudflare.com/ajax/libs/bootstrap-table/1.9.1/bootstrap-table.min.js"></script>
</body>
</html>

This file isn’t particularly complex. It is just an HTML file, which includes the minimised CSS files from the CDNs. If you see a reference like //maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css for the first time, it is not a mistake that the protocol (http or https) is missing. A resource referenced that way will be loaded via the same protocol the main page was loaded with. Say, if you use http://localhost:8080/, it will use http: to load the CSS files.

The <body> block contains a navigation bar (using the HTML5 <nav> tag) and a table. The interesting part of this table definition is the provided data-url attribute. It is interpreted by Bootstrap Table to load the data. Our definition points to our previously created REST endpoint.
Which part of our JSON objects is used in which column is defined via the data-field attributes on the <th> definitions. Can you spot the matching attribute names?

Last but not least we load the needed JavaScript libraries. All Bootstrap-related JavaScript functionality needs JQuery, so this is the first library to load. Followed straight by the main Bootstrap and the Bootstrap Table JavaScript files. Each of these library files is loaded in the minimised version, to keep download times at a minimum.

Where to go now

It is fair to say that we have a really simple web application now. Well, the main purpose of this article was to show you how to get up to speed with as little code as possible. You’ve seen that sometimes just a dependency in your POM file brings you a completely new feature, without the need for a single additional line of code.
Take a step back, look at what we’ve built so far and think about the next steps needed. And just start to take a look around in the Spring universe.

I think one of the most crucial next steps, besides adding the missing tests, is to bring in security. Check out Spring Security and its subproject Spring Security OAuth.
More interested in “classic” web pages? Check out Spring MVC and how easy it is to integrate quite sophisticated template engines (e.g. by following this guide).

Hopefully, you enjoyed this article as much as I enjoyed writing it. I wish you all a merry Christmas, and if you want to get in touch, you can find me e.g. on Twitter, G+ and LinkedIn.

Functional vs Imperative Programming. Fibonacci, Prime and Factorial in Java 8

There are multiple programming styles/paradigms, but two well-known ones are Imperative and Functional.

Imperative programming is the most dominant paradigm, as nearly all mainstream languages (C++, Java, C#) have been promoting it. But in the last few years functional programming has started to gain attention. One of the main driving factors is that nearly all new computers ship with 4, 8, 16 or more cores, and it is very difficult to write a parallel program in imperative style that utilizes all of them. The functional style moves this difficulty to the runtime level and frees developers from hard and error-prone work.

Wait! So what’s the difference between these two styles?

Imperative programming is a paradigm where you tell exactly how, and exactly which, statements the machine/runtime should execute to achieve the desired result.

Functional programming is a form of declarative programming, where you tell what you would like to achieve and the machine/runtime determines the best way to do it.

Functional style moves the how part to the runtime level and helps developers focus on the what part. By abstracting the how part we can write more maintainable and scalable software.

To handle the challenges introduced by multicore machines, and to remain attractive for developers, Java 8 introduced the functional paradigm next to the imperative one.

Enough theory. Let’s implement a few programming challenges in imperative and functional style using Java and see the difference.

Fibonacci Sequence Imperative vs Functional (The Fibonacci Sequence is the series of numbers: 1, 1, 2, 3, 5, 8, 13, 21, 34, … The next number is found by adding up the two numbers before it.)

Fibonacci Sequence in iterative and imperative style

public static int fibonacci(int number) {
  int fib1 = 1;
  int fib2 = 1;
  int fibonacci = fib1;
  for (int i = 2; i < number; i++) {
    fibonacci = fib1 + fib2;
    fib1 = fib2;
    fib2 = fibonacci;
  }
  return fibonacci;
}

for(int i = 1; i <= 10; i++) {
  System.out.print(fibonacci(i) +" ");
}
// Output: 1 1 2 3 5 8 13 21 34 55 

As you can see, here we are focusing a lot on how (iteration, state) rather than on what we want to achieve.

Fibonacci Sequence in iterative and functional style

IntStream fibonacciStream = Stream.iterate(
    new int[]{1, 1},
    fib -> new int[] {fib[1], fib[0] + fib[1]}
  ).mapToInt(fib -> fib[0]);

fibonacciStream.limit(10).forEach(fib ->  
    System.out.print(fib + " "));
// Output: 1 1 2 3 5 8 13 21 34 55 

In contrast, you can see here we are focusing on what we want to achieve.

Prime Numbers Imperative vs Functional (A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself.)

Prime Number in imperative style

public boolean isPrime(long number) {  
  for(long i = 2; i <= Math.sqrt(number); i++) {  
    if(number % i == 0) return false;  
  }  
  return number > 1;  
}
isPrime(9220000000000000039L) // Output: true

Again here we are focusing a lot on how (iteration, state).

Prime Number in functional style

public boolean isPrime(long number) {  
  return number > 1 &&  
    LongStream
     .rangeClosed(2, (long) Math.sqrt(number))  
     .noneMatch(index -> number % index == 0);
}
isPrime(9220000000000000039L) // Output: true

Here again we are focusing on what we want to achieve. The functional style helped us to abstract away the process of explicitly iterating over the range of numbers.

You might now think: hmmm, is this all we can have? Let’s see how we can use all our cores (gain parallelism) in functional style.

public boolean isPrime(long number) {  
  return number > 1 &&  
    LongStream
    .rangeClosed(2, (long) Math.sqrt(number))
    .parallel()  
    .noneMatch(index -> number % index == 0);
}
isPrime(9220000000000000039L) // Output: true

That’s it! We just added .parallel() to the stream. You can see how the library/runtime handles the complexity for us.

Factorial Imperative vs Functional (The factorial of n is the product of all positive integers less than or equal to n.)

Factorial in iterative and imperative style

public long factorial(int n) {
  long product = 1;
  for ( int i = 1; i <= n; i++ ) {
    product *= i;
  }
  return product;
}
factorial(5) // Output: 120

Factorial in iterative and functional style

public long factorial(int n) {
 return LongStream
   .rangeClosed(1, n)
   .reduce((a, b) -> a * b)
   .getAsLong();
}
factorial(5) // Output: 120

It’s worth repeating that by abstracting the how part we can write more maintainable and scalable software.

To see all the functional goodies introduced by Java 8, check out the Lambda Expressions, Method References and Streams guide.

Composing Multiple Async Results via an Applicative Builder in Java 8

A few months ago, I put out a publication where I explain in detail an abstraction I came up with named Outcome, which helped me A LOT to code without side-effects by enforcing the use of semantics. By following this simple (and yet powerful) convention, I ended up turning any kind of failure (a.k.a. Exception) into an explicit result from a function, making everything much easier to reason about. I don’t know about you, but I was tired of dealing with exceptions that tore everything down, so I did something about it, and to be honest, it worked really well. So before I keep going with my tales from the trenches, I really recommend going over that post. Now let’s solve some asynchronous issues by using eccentric applicative ideas, shall we?

Something wicked this way comes

Life was real good: our coding was fast-paced, cleaner and as composable as ever. But, out of the blue, we stumbled upon a “missing” feature (evil laughs please): we needed to combine several asynchronous Outcome instances in a non-blocking fashion….


Excited by the idea, I got down to work. I experimented for a fair amount of time, seeking a robust and yet simple way of expressing these kinds of situations; while the new CompletableFuture API turned out to be much nicer than I expected (though I still don’t understand why they decided to use names like thenApplyAsync or thenComposeAsync instead of map or flatMap), I always ended up with implementations too verbose and repetitive compared to some stuff I did with Scala. But after some long “Mate” sessions, I had my “Hey!” moment: why not use something similar to an applicative?

The problem

Suppose that we have these two asynchronous results

and a silly entity called Message

I need something that, given textf and numberf, will give me back something like

//After combining textf and numberf
CompletableFuture<Outcome<Message>> message = ....

So I wrote a letter to Santa Claus:

  1. I want to asynchronously format the string returned by textf using the number returned by numberf, only when both values are available, meaning that both futures completed successfully and neither of the outcomes failed. Of course, we need to be non-blocking.
  2. In case of failures, I want to collect all failures that took place during the execution of textf and/or numberf and return them to the caller, again, without blocking at all.
  3. I don’t want to be constrained by the number of values to be combined; it must be capable of handling a fair amount of asynchronous results. Did I say without blocking? There you go…
  4. Not die during the attempt.


Applicative builder to the rescue

If you think about it, one simple way to put what we’re trying to achieve is as follows:

// Given a String -> Given a number -> Format the message
f: String -> Integer -> Message

Checking the definition of f, it is saying something like: “Given a String, I will return a function that takes an Integer as parameter, that when applied, will return an instance of type Message”. This way, instead of waiting for all values to be available at once, we can partially apply one value at a time, getting an actual description of the construction process of a Message instance. That sounded great.

To achieve that, it would be really awesome if we could take the construction lambda Message::new and curry it. Boom! Done! But in Java that’s impossible (to do in a generic, beautiful and concise way), so for the sake of our example, I decided to go with our beloved Builder pattern, which kinda does the job:

And here’s the WannabeApplicative<T> definition

public interface WannabeApplicative<V>
{
    V apply();
}

Disclaimer: For those functional freaks out there, this is not an applicative per se. I’m aware of that, but I took some ideas from it and adapted them according to the tools that the language offered me out of the box. So, if you’re feeling curious, go check this post for a more formal example.

If you’re still with me, we could agree that we’ve done nothing too complicated so far. But now we need to express a building step, which, remember, needs to be non-blocking and capable of combining any previous failures that might have taken place in other executions with potentially new ones. So, in order to do that, I came up with something as follows:

First of all, we’ve got two functional interfaces: one is Partial<B>, which represents a lazy application of a value to a builder, and the second one, MergingStage<B,V>, represents the “how” to combine both the builder and the value. Then, we’ve got a method called value that, given an instance of type CompletableFuture<Outcome<V>>, will return an instance of type MergingStage<B,V>, and believe it or not, here’s where the magic takes place.

If you remember the MergingStage definition, you’ll see it’s a BiFunction, where the first parameter is of type Outcome<B> and the second one is of type Outcome<V>. Now, if you follow the types, you can tell that we’ve got two things: the partial state of the building process on one side (type parameter B) and a new value that needs to be applied to the current state of the builder (type parameter V), so that, when applied, it will generate a new builder instance with the “next state in the building sequence”, which is represented by Partial<B>.

Last but not least, we’ve got the stickedTo method, which basically is an (awful Java) hack to stick to a specific applicative type (builder) while defining a building step. For instance, having:

I can define partial value applications to any Builder instance as follows:

See that we haven’t built anything yet; we have just described what we want to do with each value when the time comes. We might want to perform some validations before using the new value (here’s where Outcome plays an important role), or just use it as it is; it’s really up to us, but the main point is that we haven’t applied anything yet. In order to do so, and to finally tie up all loose ends, I came up with another definition, which looks as follows:

Hope it’s not too overwhelming, but I’ll try to break it down as clearly as possible. In order to start specifying how you’re going to combine the whole thing together, you start by calling begin with an instance of type WannabeApplicative<V>, which, in our case, has type parameter V equal to Builder.

FutureCompositions<Message, Builder> ab = begin(Message.applicative())

See that, after you invoke begin, you will get a new instance of FutureCompositions with a lazily evaluated partial state inside of it, making it the one and only owner of the whole building process state. That was the ultimate goal of everything we’ve done so far: to fully gain control over when and how things will be combined. Next, we must specify the values that we want to combine, and that’s what the binding method is for:

ab.binding(textToApply)
  .binding(numberToApply);

This is how we supply our builder instance with all the values that need to be merged together, along with the specification of what’s supposed to happen with each one of them, by using our previously defined Partial instances. Also see that everything’s still lazily evaluated: nothing has happened yet, but we have stacked all the “steps” until we finally decide to materialize the result, which will happen when you call perform.

CompletableFuture<Outcome<Message>> message = ab.perform();

From that very moment everything unfolds: each building stage gets evaluated, and either failures are returned and collected within an Outcome instance, or the newly available values are supplied to the target builder instance. One way or the other, all steps are executed until there is nothing left to do. I will try to depict what just happened as follows:

[Figure: the applicative building process – declaration on one side, evaluation on the other]

If you pay attention to the left side of the picture, you can easily see how each step gets “defined” as I showed before, following the “declaration” arrow direction, meaning: how you actually described the building process. From the moment you call perform, each applicative instance (remember, Builder in our case) is lazily evaluated in the opposite direction: evaluation starts at the last specified stage in the stack and proceeds to the next one, and so forth, up to the point where we reach the “beginning” of the building definition. From there, evaluation unfolds and rolls back up through each step to the top, collecting everything it can by using the MergingStage specification.

And this is just the beginning….

I’m sure a lot could be done to improve this idea, for example:

  • The two consecutive calls to dependingOn at CompositionSources.values() are too verbose for my taste; I must do something about it.
  • I’m not quite sure about keeping passing Outcome instances to a MergingStage; it would look cleaner and easier to unwrap the values to be merged before invoking it and just return Either<Failure,V> instead – this would reduce complexity and increase flexibility in what’s supposed to happen behind the scenes.
  • Though using the Builder pattern did the job, it feels old-school. I would love to easily curry constructors, so it’s on my to-do list to check whether jOOλ or Javaslang has something to offer on that matter.
  • Better type inference, so that any unnecessary noise gets removed from the code – for example, the stickedTo method. It really is a code smell, something I hated from the start. I definitely need more time to figure out an alternative way to infer the applicative type from the definition itself.

You’re more than welcome to send me any suggestions and comments you might have. Cheers and remember…..

@gszeliga

Using Java 8 Lambdas, Streams, and Aggregates

Overview

In this post, we’ll take a look at filtering and manipulating objects in a Collection using Java 8 lambdas, streams, and aggregates.  All code in this post is available in BitBucket here.

For this example we’ll create a number of objects that represent servers in our IT infrastructure.  We’ll add these objects to a List and then we’ll use lambdas, streams, and aggregates to retrieve servers from the List based on certain criteria.

Objectives

  1. Introduce the concepts of lambdas, streams, and aggregate operations.
  2. Explain the relationship between streams and pipelines.
  3. Compare and contrast aggregate operations and iterators.
  4. Demonstrate the filter, collect, forEach, mapToLong, average, and getAsDouble aggregate operations.

Lambdas

Lambdas are a new Java language feature that allows us to pass functionality or behavior into methods as parameters.  One example that illustrates the usefulness of lambdas comes from UI coding. When a user clicks a button on a user interface, it usually causes some action to occur in the application. In this case, we really want to pass a behavior into the onClick(…) method so that the application will execute the given behavior when the button is clicked. In previous versions of Java, we accomplished this by passing an anonymous inner class (that implemented a known interface) into the method. Interfaces used in this kind of scenario usually contain only one method which defines the behavior we wish to pass into the onClick(…) method. Although this works, the syntax is unwieldy. Anonymous inner classes still work for this purpose, but the new lambda syntax is much cleaner.

Aggregate Operations

When we use Collections to store objects in our programs, we generally need to do more than simply put the objects in the collection — we need to store, retrieve, remove, and update these objects. Aggregate operations use lambdas to perform actions on the objects in a Collection. For example, you can use aggregate operations to:

  • Print the names of all the servers in inventory from a particular manufacturer
  • Return all of the servers in inventory older than a particular age
  • Calculate and return the average age of Servers in your inventory (provided the Server object has a purchase date field)

All of these tasks can be accomplished by using aggregate operations along with pipelines and streams.  We will see examples of these operations below.

Pipelines and Streams

A pipeline is simply a sequence of aggregate operations. A stream is a sequence of items, not a data structure, that carries items from the source through the pipeline. Pipelines are composed of the following:

  1. A data source. Most commonly, this is a Collection, but it could be an array, the return from a method call, or some sort of I/O channel.
  2. Zero or more intermediate operations. For example, a Filter operation. Intermediate operations produce a new stream. A filter operation takes in a stream and then produces another stream that contains only the items matching the criteria of the filter.
  3. A terminal operation. Terminal operations return a non-stream result. This result could be a primitive type (for example, an integer), a Collection, or no result at all (for example, the operation might just print the name of each item in the stream).

Some aggregate operations (e.g. forEach) look like iterators, but they have fundamental differences:

  1. Aggregate operations use internal iteration. Your application has no control over how or when the elements are processed (there is no next() method).
  2. Aggregate operations process items from a stream, not directly from a Collection.
  3. Aggregate operations support Lambda expressions as parameters.

Lambda Syntax

Now that we have discussed the concepts related to Lambda expressions, it is time to look at their syntax. You can think of Lambda expressions as anonymous methods because they have no name. Lambda syntax consists of the following:

  • A comma-separated list of formal parameters enclosed in parentheses. Data types of parameters can be omitted in Lambda expressions. The parentheses can be omitted if there is only one formal parameter.
  • The arrow token: ->
  • A body consisting of a single expression or code block.
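
For example, here are a few equivalent forms this syntax can take (this snippet is illustrative, not from the original post):

import java.util.Comparator;
import java.util.function.Consumer;

// Full form: typed, parenthesized parameters and a block body
Comparator<String> byLength = (String a, String b) -> {
    return Integer.compare(a.length(), b.length());
};

// Parameter types inferred, single-expression body
Comparator<String> byLengthShort = (a, b) -> Integer.compare(a.length(), b.length());

// Single parameter: the parentheses can be omitted
Consumer<String> printer = s -> System.out.println(s);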

Using Lambdas, Streams, and Aggregate Operations

As mentioned in the overview, we’ll demonstrate the use of lambdas, streams, and aggregates by filtering and retrieving Server objects from a List.  We’ll look at four examples:

  1. Finding and printing the names of all the servers from a particular manufacturer.
  2. Finding and printing the names of all of the servers older than a certain number of years.
  3. Finding and extracting into a new List all of the servers older than a certain number of years and then printing the names of the servers in the new list.
  4. Calculating and displaying the average age of the servers in the List.

Let’s get started…

The Server Class

First, we’ll look at our Server class.  The Server class will keep track of the following:

  1. Server name
  2. Server IP address
  3. Manufacturer
  4. Amount of RAM (GB)
  5. Number of processors
  6. Purchase date (LocalDate)

Notice that we’ve added the method getServerAge() that calculates the age of the server (in years) based on the purchase date – we’ll use this method when we calculate the average age of the Servers in our inventory.

[Screenshot: the Server class]
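
In code, a hedged sketch of what the Server class might look like (the field names and constructor are illustrative, not the original source):

import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class Server {

    private String name;
    private String ipAddress;
    private String manufacturer;
    private int gbRam;
    private int numberOfProcessors;
    private LocalDate purchaseDate;

    public Server(String name, String ipAddress, String manufacturer,
                  int gbRam, int numberOfProcessors, LocalDate purchaseDate) {
        this.name = name;
        this.ipAddress = ipAddress;
        this.manufacturer = manufacturer;
        this.gbRam = gbRam;
        this.numberOfProcessors = numberOfProcessors;
        this.purchaseDate = purchaseDate;
    }

    public String getName() { return name; }
    public String getManufacturer() { return manufacturer; }

    // Age of the server in years, derived from the purchase date
    public long getServerAge() {
        return ChronoUnit.YEARS.between(purchaseDate, LocalDate.now());
    }
}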

Creating and Loading the Servers

Now that we have a Server class, we’ll create a List and load several servers:

[Screenshot: creating and loading the servers]
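
A sketch of the loading code, with invented server names and values (the real list in the post was longer):

List<Server> servers = new ArrayList<>();
servers.add(new Server("web1", "10.0.0.1", "Dell", 16, 4, LocalDate.of(2014, 6, 1)));
servers.add(new Server("web2", "10.0.0.2", "HP", 32, 8, LocalDate.of(2011, 3, 15)));
servers.add(new Server("db1", "10.0.0.3", "Dell", 64, 16, LocalDate.of(2010, 1, 20)));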

Example 1: Print the Names of All the Dell Servers

For our first example, we’ll write some code to find all of the servers made by Dell and then print the server names to the console:

[Screenshot: Example 1 code]
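
Reconstructed from the lambdas quoted below, the code might look like this (the servers list and the manufacturer variable are assumed from the surrounding text):

String manufacturer = "Dell";
servers.stream()
       .filter(s -> s.getManufacturer().equalsIgnoreCase(manufacturer))
       .forEach(server -> System.out.println(server.getName()));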

Our first step is to get the stream from our list of servers.  Once we have the stream, we add the filter intermediate operation.  The filter operation takes a stream of servers as input and then produces another stream of servers containing only the servers that match the criteria specified in the filter’s lambda.  We select only the servers that are made by Dell using the following lambda:
s -> s.getManufacturer().equalsIgnoreCase(manufacturer)

The variable s represents each server that is processed from the stream (remember that we don’t have to declare the type).  The right hand side of the arrow operator represents the statement we want to evaluate for each server processed.  In this case, we’ll return true if the current server’s manufacturer is Dell and false otherwise.  The resulting output stream from the filter contains only those servers made by Dell.

Finally, we add the forEach terminal operation on line 78.  The forEach operation takes a stream of servers as input and then runs the given lambda on each server in the stream.   We print the names of the Dell servers to the console using the following lambda:
server -> System.out.println(server.getName())

Note that we used s as the variable name for each server in the stream in the first lambda and server as the variable name in the second – they don’t have to match from one lambda to the next.

The output of the above code is what we expect:

[Screenshot: console output]

Example 2: Print the Names of All the Servers Older Than 3 Years

Our second example is similar to the first except that we want to find the servers that are older than 3 years:

[Screenshot: Example 2 code]
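
A similar hedged sketch for this example (the age variable is assumed to hold 3):

long age = 3;
servers.stream()
       .filter(s -> s.getServerAge() > age)
       .forEach(server -> System.out.println(server.getName()));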

The only difference between this example and the first is that we changed the lambda expression in our filter operation to this:
s -> s.getServerAge() > age

The output stream from this filter contains only servers that are older than 3 years.

The output of the above code is:

[Screenshot: console output]

Example 3: Extract All Servers Older Than 3 Years Into a New List

Our third example is similar to the second in that we are looking for the servers that are older than three years.  The difference in this example is that we will create a new List containing only the servers that meet our criteria:

[Screenshot: Example 3 code]
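
A hedged sketch of this example (Collectors comes from java.util.stream):

List<Server> oldServers = servers.stream()
        .filter(s -> s.getServerAge() > age)
        .collect(Collectors.toList());

oldServers.stream()
        .forEach(server -> System.out.println(server.getName()));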

As in the previous example, we get the stream from the List and add the filter intermediate operation to create a stream containing only those servers older than 3 years.  Now, however, we use the collect terminal operation rather than the forEach terminal operation.  The collect terminal operation takes a stream of servers as input and then puts them in the data structure specified in the parameter.  In our case, we convert the stream into a list of servers.  The resulting list is referenced by the oldServers variable.

Finally, to demonstrate that we get the same set of servers in this example as the last, we print the names of all the servers in the oldServers list.  Note that, because we want all of the servers in the list, there is no intermediate filter operation.  We simply get the stream from oldServers and feed it to the forEach terminal operation.

The output is what we expect:

[Screenshot: console output]

Example 4: Calculate and Print the Average Age of the Servers

In our final example, we’ll calculate the average age of our servers:

[Screenshot: Example 4 code]
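
A hedged sketch of the final example (the printed message is illustrative):

double averageAge = servers.stream()
        .mapToLong(s -> s.getServerAge())   // or the equivalent: .mapToLong(Server::getServerAge)
        .average()
        .getAsDouble();
System.out.println("Average server age: " + averageAge);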

The first step is the same as our previous examples – we get the stream from our list of servers.  Next we add the mapToLong intermediate operation.  This aggregate operation takes a stream of servers as input and produces a stream of Longs as output.  The servers are mapped to Longs according to the specified lambda (the equivalent method reference syntax works as well).  In this case, we are grabbing the age of each incoming server and putting it into the resulting stream of Longs.

Next we add the average terminal operation.  Average does exactly what you would expect – it calculates the average of all of the values in the Stream.  Terminal operations like average that return one value by combining or operating on the contents of a stream are known as reduction operations.  Other examples of reduction operations include sum, min, max, and count.

Finally, we add the operation getAsDouble.  This is required because average returns the type OptionalDouble.  If the incoming stream is empty, average returns an empty instance of OptionalDouble.  If this happens, calling getAsDouble will throw a NoSuchElementException, otherwise it just returns the Double value in the OptionalDouble instance.

The output of this example is:

[Screenshot: console output]

Conclusion

We’ve only scratched the surface as to what you can do with lambdas, streams, and aggregates.  I encourage you to grab the source code, play with it, and start to explore all the possibilities of these new Java 8 features.

Project Jigsaw Hands-On Guide

Project Jigsaw will bring modularization to the Java platform and, according to the original plan, it was going to be feature complete on the 10th of December. So here we are, but where is Jigsaw?

Surely a lot happened in the last six months: The prototype came out, the looming removal of internal APIs caused quite a ruckus, the mailing list is full of critical discussions about the project’s design decisions, and JavaOne saw a series of great introductory talks by the Jigsaw team. And then Java 9 got delayed for half a year due to Jigsaw.

But let’s ignore all of that for now and just focus on the code. In this post we’ll take an existing demo application and modularize it with Java 9. If you want to follow along, head over to GitHub, where all of the code can be found. The setup instructions are important to get the scripts running with Java 9. For brevity, I removed the prefix org.codefx.demo from all package, module, and folder names in this article.

The Application Before Jigsaw

Even though I do my best to ignore the whole Christmas kerfuffle, it seemed prudent to have the demo uphold the spirit of the season. So it models an advent calendar:

  • There is a calendar, which has 24 calendar sheets.
  • Each sheet knows its day of the month and contains a surprise.
  • The death march towards Christmas is symbolized by printing the sheets (and thus the surprises) to the console.

Of course the calendar needs to be created first. It can do that by itself but it needs a way to create surprises. To this end it gets handed a list of surprise factories. This is what the main method looks like:

public static void main(String[] args) {
    List<SurpriseFactory> surpriseFactories = Arrays.asList(
            new ChocolateFactory(),
            new QuoteFactory()
    );
    Calendar calendar =
        Calendar.createWithSurprises(surpriseFactories);
    System.out.println(calendar.asText());
}

The initial state of the project is by no means the best of what is possible before Jigsaw. Quite the contrary, it is a simplistic starting point. It consists of a single module (in the abstract sense, not the Jigsaw interpretation) that contains all required types:

  • “Surprise API” – Surprise and SurpriseFactory (both are interfaces)
  • “Calendar API” – Calendar and CalendarSheet to create the calendar
  • Surprises – a couple of Surprise and SurpriseFactory implementations
  • Main – to wire up and run the whole thing.

Compiling and running is straightforward (commands for Java 8):

# compile
javac -d classes/advent ${source files}
# package
jar -cfm jars/advent.jar ${manifest and compiled class files}
# run
java -jar jars/advent.jar

Entering Jigsaw Land

The next step is small but important. It changes nothing about the code or its organization but moves it into a Jigsaw module.

Modules

So what’s a module? To quote the highly recommended State of the Module System:

A module is a named, self-describing collection of code and data. Its code is organized as a set of packages containing types, i.e., Java classes and interfaces; its data includes resources and other kinds of static information.

To control how its code refers to types in other modules, a module declares which other modules it requires in order to be compiled and run. To control how code in other modules refers to types in its packages, a module declares which of those packages it exports.

So compared to a JAR a module has a name that is recognized by the JVM, declares which other modules it depends on and defines which packages are part of its public API.

Name

A module’s name can be arbitrary. But to ensure uniqueness it is recommended to stick with the inverse-URL naming scheme of packages. So while this is not necessary it will often mean that the module name is a prefix of the packages it contains.

Dependencies

A module lists the other modules it depends on to compile and run. This is true for application and library modules but also for modules in the JDK itself, which was split up into about 80 of them (have a look at them with java -listmods).

Again from the design overview:

When one module depends directly upon another in the module graph then code in the first module will be able to refer to types in the second module. We therefore say that the first module reads the second or, equivalently, that the second module is readable by the first.

[…]

The module system ensures that every dependence is fulfilled by precisely one other module, that no two modules read each other, that every module reads at most one module defining a given package, and that modules defining identically-named packages do not interfere with each other.

When any of the properties is violated, the module system refuses to compile or launch the code. This is an immense improvement over the brittle classpath, where e.g. missing JARs would only be discovered at runtime, crashing the application.

It is also worth pointing out that a module is only able to access another’s types if it directly depends on it. So if A depends on B, which depends on C, then A is unable to access C unless it requires it explicitly.

Exports

A module lists the packages it exports. Only public types in these packages are accessible from outside the module.

This means that public is no longer really public. A public type in a non-exported package is as hidden from the outside world as a non-public type in an exported package. Which is even more hidden than package-private types are today, because the module system does not even allow reflective access to them. As Jigsaw is currently implemented, command line flags are the only way around this.

Implementation

To be able to create a module, the project needs a module-info.java in its root source directory:

module advent {
    // no imports or exports
}

Wait, didn’t I say that we have to declare dependencies on JDK modules as well? So why didn’t we mention anything here? All Java code requires Object and that class, as well as the few others the demo uses, are part of the module java.base. So literally every Java module depends on java.base, which led the Jigsaw team to the decision to automatically require it. So we do not have to mention it explicitly.

The biggest change is the script to compile and run (commands for Java 9):

# compile (include module-info.java)
javac -d classes/advent ${source files}
# package (add module-info.class and specify main class)
jar -c \
    --file=mods/advent.jar \
    --main-class=advent.Main \
    ${compiled class files}
# run (specify a module path and simply name the module to run)
java -mp mods -m advent

We can see that compilation is almost the same – we only need to include the new module-info.java in the list of classes.

The jar command will create a so-called modular JAR, i.e. a JAR that contains a module. Unlike before, we no longer need a manifest but can specify the main class directly. Note how the JAR is created in the directory mods.

Utterly different is the way the application is started. The idea is to tell Java where to find the application modules (with -mp mods, this is called the module path) and which module we would like to launch (with -m advent).

Splitting Into Modules

Now it’s time to really get to know Jigsaw and split that monolith up into separate modules.

Made-up Rationale

The “surprise API”, i.e. Surprise and SurpriseFactory, is a great success and we want to separate it from the monolith.

The factories that create the surprises turn out to be very dynamic. A lot of work is being done here, they change frequently and which factories are used differs from release to release. So we want to isolate them.

At the same time we plan to create a large Christmas application of which the calendar is only one part. So we’d like to have a separate module for that as well.

We end up with these modules:

  • surprise – Surprise and SurpriseFactory
  • calendar – the calendar, which uses the surprise API
  • factories – the SurpriseFactory implementations
  • main – the original application, now hollowed out to the class Main

Looking at their dependencies we see that surprise depends on no other module. Both calendar and factories make use of its types so they must depend on it. Finally, main uses the factories to create the calendar so it depends on both.

[Diagram: the module dependencies after splitting into modules]

Implementation

The first step is to reorganize the source code. We’ll stick with the directory structure as proposed by the official quick start guide and have all of our modules in their own folders below src:

src
  - advent.calendar: the "calendar" module
      - org ...
      module-info.java
  - advent.factories: the "factories" module
      - org ...
      module-info.java
  - advent.surprise: the "surprise" module
      - org ...
      module-info.java
  - advent: the "main" module
      - org ...
      module-info.java
.gitignore
compileAndRun.sh
LICENSE
README

To keep this readable I truncated the folders below org. What’s missing are the packages and, below those, the source files for each module. See it on GitHub in its full glory.

Let’s now see what those module infos have to contain and how we can compile and run the application.

surprise

There are no requires clauses, as surprise has no dependencies. (Except for java.base, which is always implicitly required.) It exports the package advent.surprise because that contains the two classes Surprise and SurpriseFactory.

So the module-info.java looks as follows:

module advent.surprise {
    // requires no other modules
    // publicly accessible packages
    exports advent.surprise;
}

Compiling and packaging is very similar to the previous section. It is in fact even easier, because surprise contains no main class:

# compile
javac -d classes/advent.surprise ${source files}
# package
jar -c --file=mods/advent.surprise.jar ${compiled class files}

calendar

The calendar uses types from the surprise API so the module must depend on surprise. Adding requires advent.surprise to the module achieves this.

The module’s API consists of the class Calendar. For it to be publicly accessible the containing package advent.calendar must be exported. Note that CalendarSheet, private to the same package, will not be visible outside the module.

But there is an additional twist: We just made Calendar.createWithSurprises(List<SurpriseFactory>) publicly available, which exposes types from the surprise module. So unless modules reading calendar also require surprise, Jigsaw will prevent them from accessing these types, which would lead to compile and runtime errors.

Marking the requires clause as public fixes this. With it any module that depends on calendar also reads surprise. This is called implied readability.

The final module-info looks as follows:

module advent.calendar {
    // required modules
    requires public advent.surprise;
    // publicly accessible packages
    exports advent.calendar;
}

Compilation is almost like before but the dependency on surprise must of course be reflected here. For that it suffices to point the compiler to the directory mods as it contains the required module:

# compile (point to folder with required modules)
javac -mp mods \
    -d classes/advent.calendar \
    ${source files}
# package
jar -c \
    --file=mods/advent.calendar.jar \
    ${compiled class files}

factories

The factories implement SurpriseFactory so this module must depend on surprise. And since they return instances of Surprise from published methods the same line of thought as above leads to a requires public clause.

The factories can be found in the package advent.factories so that must be exported. Note that the public class AbstractSurpriseFactory, which is found in another package, is not accessible outside this module.

So we get:

module advent.factories {
    // required modules
    requires public advent.surprise;
    // publicly accessible packages
    exports advent.factories;
}

Compilation and packaging is analogous to calendar.

main

Our application requires the two modules calendar and factories to compile and run. It has no API to export.

module advent {
    // required modules
    requires advent.calendar;
    requires advent.factories;
    // no exports
}

Compiling and packaging is like the last section’s single module, except that the compiler needs to know where to look for the required modules:

#compile
javac -mp mods \
    -d classes/advent \
    ${source files}
# package
jar -c \
    --file=mods/advent.jar \
    --main-class=advent.Main \
    ${compiled class files}
# run
java -mp mods -m advent

Services

Jigsaw enables loose coupling by implementing the service locator pattern, where the module system itself acts as the locator. Let’s see how that goes.

Made-up Rationale

Somebody recently read a blog post about how cool loose coupling is. Then she looked at our code from above and complained about the tight relationship between main and factories. Why would main even know factories?

Because…

public static void main(String[] args) {
    List<SurpriseFactory> surpriseFactories = Arrays.asList(
            new ChocolateFactory(),
            new QuoteFactory()
    );
    Calendar calendar =
        Calendar.createWithSurprises(surpriseFactories);
    System.out.println(calendar.asText());
}

Really? Just to instantiate some implementations of a perfectly fine abstraction (the SurpriseFactory)?

And we know she’s right. Having someone else provide us with the implementations would remove the direct dependency. Even better, if said middleman were able to find all implementations on the module path, the calendar’s surprises could easily be configured by adding or removing modules before launching.

This is indeed possible with Jigsaw. We can have a module specify that it provides implementations of an interface. Another module can express that it uses said interface and find all implementations with the ServiceLoader.

We use this opportunity to split factories into chocolate and quote and end up with these modules and dependencies:

  • surprise – Surprise and SurpriseFactory
  • calendar – the calendar, which uses the surprise API
  • chocolate – the ChocolateFactory as a service
  • quote – the QuoteFactory as a service
  • main – the application; no longer requires individual factories

[Diagram: the module dependencies with services]

Implementation

The first step is to reorganize the source code. The only change from before is that src/advent.factories is replaced by src/advent.factory.chocolate and src/advent.factory.quote.

Let’s look at the individual modules.

surprise and calendar

Both are unchanged.

chocolate and quote

Both modules are identical except for some names. Let’s look at chocolate because it’s more yummy.

As before with factories the module requires public the surprise module.

More interesting are its exports. It provides an implementation of SurpriseFactory, namely ChocolateFactory, which is specified as follows:

provides advent.surprise.SurpriseFactory
    with advent.factory.chocolate.ChocolateFactory;

Since this class is the entirety of its public API it does not need to export anything else. Hence no other export clause is necessary.

We end up with:

module advent.factory.chocolate {
    // list the required modules
    requires public advent.surprise;
    // specify which class provides which service
    provides advent.surprise.SurpriseFactory
        with advent.factory.chocolate.ChocolateFactory;
}

Compilation and packaging is straightforward:

javac -mp mods \
    -d classes/advent.factory.chocolate \
    ${source files}
jar -c \
    --file mods/advent.factory.chocolate.jar \
    ${compiled class files}

main

The most interesting part about main is how it uses the ServiceLoader to find implementations of SurpriseFactory. From its main method:

List<SurpriseFactory> surpriseFactories = new ArrayList<>();
ServiceLoader.load(SurpriseFactory.class)
    .forEach(surpriseFactories::add);

Our application now only requires calendar but must specify that it uses SurpriseFactory. It has no API to export.

module advent {
    // list the required modules
    requires advent.calendar;
    // list the used services
    uses advent.surprise.SurpriseFactory;
    // exports no functionality
}

Compilation and execution are like before.

And we can indeed change the surprises the calendar will eventually contain by simply removing one of the factory modules from the module path. Neat!

Summary

So that’s it. We have seen how to move a monolithic application into a single module and how we can split it up into several. We even used a service locator to decouple our application from concrete implementations of services. All of this is on GitHub so check it out to see more code!

But there is lots more to talk about! Jigsaw brings a couple of incompatibilities but also the means to solve many of them. And we haven’t talked about how reflection interacts with the module system and how to migrate external dependencies.

If these topics interest you, watch the Jigsaw tag on my blog as I will surely write about them over the coming months.

Reactive file system monitoring using Akka actors

In this article, we will discuss:

  1. File system monitoring using Java NIO.2
  2. Common pitfalls of the default Java library
  3. Design a simple thread-based file system monitor
  4. Use the above to design a reactive file system monitor using the actor model

Note: Although all the code samples here are in Scala, they can be rewritten in simple Java too. To quickly familiarize yourself with Scala syntax, here is a very short and nice Scala cheatsheet. For a more comprehensive guide to Scala for Java programmers, consult this (not needed to follow this article).

For the absolute shortest cheatsheet, the following Java code:


public void foo(int x, int y) {
  int z = x + y;
  if (z == 1) {
    System.out.println(x);
  } else {
    System.out.println(y);
  }
}

is equivalent to the following Scala code:


def foo(x: Int, y: Int): Unit = {
  val z: Int = x + y
  z match {
   case 1 => println(x)
   case _ => println(y)
  }
}

All the code presented here is available under MIT license as part of the better-files library on GitHub.


Let’s say you are tasked to build a cross-platform desktop file-search engine. You quickly realize that after the initial indexing of all the files, you need to also quickly reindex any new files (or directories) that get created or updated. A naive way would be to simply rescan the entire file system every few minutes; but that would be incredibly inefficient, since most operating systems expose file system notification APIs that allow the application programmer to register callbacks for changes, e.g. inotify on Linux, FSEvents on Mac and FindFirstChangeNotification on Windows.

But now you are stuck dealing with OS-specific APIs! Thankfully, beginning with Java SE 7, we have a platform-independent abstraction for watching file system changes via the WatchService API. The WatchService API was developed as part of Java NIO.2, under JSR-203, and here is a “hello world” example of using it to watch a given Path:


import java.nio.file._
import java.nio.file.StandardWatchEventKinds._
import scala.collection.JavaConversions._

def watch(directory: Path): Unit = {
  // First create the service
  val service: WatchService = directory.getFileSystem.newWatchService()

  // Register the service to the path and also specify which events we want to be notified about
  directory.register(service,  ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)

  while (true) {
    val key: WatchKey = service.take()  // Wait for this key to be signalled
    for {event <- key.pollEvents()} {
      // event.context() is the path to the file that got changed  
      event.kind() match {
        case ENTRY_CREATE => println(s"${event.context()} got created")
        case ENTRY_MODIFY => println(s"${event.context()} got modified")
        case ENTRY_DELETE => println(s"${event.context()} got deleted")        
        case _ => 
          // This can happen when OS discards or loses an event. 
          // See: http://docs.oracle.com/javase/8/docs/api/java/nio/file/StandardWatchEventKinds.html#OVERFLOW
          println(s"Unknown event $event happened at ${event.context()}")
      }
    }
    key.reset()  // Do not forget to do this!! See: http://stackoverflow.com/questions/20180547/
  }
}

Although the above is a good first attempt, it falls short in several respects:

  1. Bad Design: The above code looks unnatural and you probably had to look it up on StackOverflow to get it right. Can we do better?
  2. Bad Design: The code does not do a very good job of handling errors. What happens when we encounter a file we could not open?
  3. Gotcha: The Java API only allows us to watch the directory for changes to its direct children; it does not recursively watch a directory for you.
  4. Gotcha: The Java API does not allow us to watch a single file, only a directory.
  5. Gotcha: Even if we resolve the aforementioned issues, the Java API does not automatically start watching a new child file or directory created under the root.
  6. Bad Design: The code as implemented above, exposes a blocking/polling, thread-based model. Can we use a better concurrency abstraction?

Let’s work through each of the above concerns.

  • A better interface: Here is what my ideal interface would look like:

abstract class FileMonitor(root: Path) {
  def start(): Unit
  def onCreate(path: Path): Unit
  def onModify(path: Path): Unit
  def onDelete(path: Path): Unit  
  def stop(): Unit
}

That way, I can simply write the example code as:


val watcher = new FileMonitor(myFile) {
  override def onCreate(path: Path) = println(s"$path got created")  
  override def onModify(path: Path) = println(s"$path got modified")    
  override def onDelete(path: Path) = println(s"$path got deleted")  
}
watcher.start()

Ok, let’s try to adapt the first example using a Java Thread so that we can expose “my ideal interface”:


trait FileMonitor {                               // My ideal interface
  val root: Path                                  // starting file  
  def start(): Unit                               // start the monitor 
  def onCreate(path: Path) = {}                   // on-create callback 
  def onModify(path: Path) = {}                   // on-modify callback 
  def onDelete(path: Path) = {}                   // on-delete callback 
  def onUnknownEvent(event: WatchEvent[_]) = {}   // handle lost/discarded events
  def onException(e: Throwable) = {}              // handle errors e.g. a read error
  def stop(): Unit                                // stop the monitor
}

And here is a very basic thread-based implementation:


class ThreadFileMonitor(val root: Path) extends Thread with FileMonitor {
  setDaemon(true)        // daemonize this thread
  setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
    override def uncaughtException(thread: Thread, exception: Throwable) = onException(exception)    
  })

  val service = root.getFileSystem.newWatchService()

  override def run() = Iterator.continually(service.take()).foreach(process)

  override def interrupt() = {
    service.close()
    super.interrupt()
  }

  override def start() = {
    watch(root)
    super.start()
  }

  protected[this] def watch(file: Path): Unit = {
    file.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
  }

  protected[this] def process(key: WatchKey) = {
    key.pollEvents() foreach {
      case event: WatchEvent[Path] => dispatch(event.kind(), event.context())      
      case event => onUnknownEvent(event)
    }
    key.reset()
  }

  def dispatch(eventType: WatchEvent.Kind[Path], file: Path): Unit = {
    eventType match {
      case ENTRY_CREATE => onCreate(file)
      case ENTRY_MODIFY => onModify(file)
      case ENTRY_DELETE => onDelete(file)
    }
  }
}

The above looks much cleaner! Now we can watch files to our heart’s content without poring over the details of the JavaDocs, simply by implementing onCreate(path), onModify(path), onDelete(path) etc.

  • Exception handling: This is already done above. onException gets called whenever we encounter an exception and the invoker can decide what to do next by implementing it.

  • Recursive watching: The Java API does not allow recursive watching of directories. We need to modify the watch(file) to recursively attach the watcher:


def watch(file: Path, recursive: Boolean = true): Unit = {
  if (Files.isDirectory(file)) {
    file.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
    // recursively call watch on children of this file
    if (recursive) {
      Files.list(file).iterator() foreach {f => watch(f, recursive)}
    }
  }
}
  • Watching regular files: As mentioned before, the Java API can only watch directories. One hack we can use to watch single files is to set a watcher on the file’s parent directory and only react if the event was triggered on the file itself.

override def start() = {
  if (Files.isDirectory(root)) {
    watch(root, recursive = true) 
  } else {
    watch(root.getParent, recursive = false)
  }
  super.start()
}

And now, in process(key), we make sure we react either to a directory or to that file only:


def reactTo(target: Path) = Files.isDirectory(root) || (root == target)

And we check before dispatching now:


case event: WatchEvent[Path] =>
  val target = event.context()
  if (reactTo(target)) {
    dispatch(event.kind(), target)
  }
  • Auto-watching new items: The Java API does not auto-watch any new sub-files. We can address this by attaching the watcher ourselves in process(key) when an ENTRY_CREATE event is fired:

if (reactTo(target)) {
  if (Files.isDirectory(root) && event.kind() == ENTRY_CREATE) {
    watch(root.resolve(target))
  }
  dispatch(event.kind(), target)
}

Putting it all together, we have our final FileMonitor.scala:


class ThreadFileMonitor(val root: Path) extends Thread with FileMonitor {
  setDaemon(true) // daemonize this thread
  setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
    override def uncaughtException(thread: Thread, exception: Throwable) = onException(exception)    
  })

  val service = root.getFileSystem.newWatchService()

  override def run() = Iterator.continually(service.take()).foreach(process)

  override def interrupt() = {
    service.close()
    super.interrupt()
  }

  override def start() = {
    if (Files.isDirectory(root)) {
      watch(root, recursive = true) 
    } else {
      watch(root.getParent, recursive = false)
    }
    super.start()
  }

  protected[this] def watch(file: Path, recursive: Boolean = true): Unit = {
    if (Files.isDirectory(file)) {
      file.register(service, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY)
      if (recursive) {
        Files.list(file).iterator() foreach {f => watch(f, recursive)}
      }  
    }
  }

  private[this] def reactTo(target: Path) = Files.isDirectory(root) || (root == target)

  protected[this] def process(key: WatchKey) = {
    key.pollEvents() foreach {
      case event: WatchEvent[Path] =>
        val target = event.context()
        if (reactTo(target)) {
          if (Files.isDirectory(root) && event.kind() == ENTRY_CREATE) {
            watch(root.resolve(target))
          }
          dispatch(event.kind(), target)
        }
      case event => onUnknownEvent(event)
    }
    key.reset()
  }

  def dispatch(eventType: WatchEvent.Kind[Path], file: Path): Unit = {
    eventType match {
      case ENTRY_CREATE => onCreate(file)
      case ENTRY_MODIFY => onModify(file)
      case ENTRY_DELETE => onDelete(file)
    }
  }
}

Now that we have addressed all the gotchas and distanced ourselves from the intricacies of the WatchService API, we are still tightly coupled to the thread-based API.
We will use the above class to expose a different concurrency model, namely the actor model, to design a reactive, dynamic and resilient file-system watcher using Akka. Although the construction of Akka actors is beyond the scope of this article, we will present a very simple actor that uses the ThreadFileMonitor:


import java.nio.file.{Path, WatchEvent}

import akka.actor._

class FileWatcher(file: Path) extends ThreadFileMonitor(file) with Actor {
  import FileWatcher._

  // MultiMap from Events to registered callbacks
  protected[this] val callbacks = newMultiMap[Event, Callback]  

  // Override the dispatcher from ThreadFileMonitor to inform the actor of a new event
  override def dispatch(event: Event, file: Path) = self ! Message.NewEvent(event, file)  

  // Override the onException from the ThreadFileMonitor
  override def onException(exception: Throwable) = self ! Status.Failure(exception)

  // when actor starts, start the ThreadFileMonitor
  override def preStart() = super.start()   
  
  // before actor stops, stop the ThreadFileMonitor
  override def postStop() = super.interrupt()

  override def receive = {
    case Message.NewEvent(event, target) if callbacks contains event => 
       callbacks(event) foreach {f => f(event -> target)}

    case Message.RegisterCallback(events, callback) => 
       events foreach {event => callbacks.addBinding(event, callback)}

    case Message.RemoveCallback(event, callback) => 
       callbacks.removeBinding(event, callback)
  }
}

object FileWatcher {
  type Event = WatchEvent.Kind[Path]
  type Callback = PartialFunction[(Event, Path), Unit]

  sealed trait Message
  object Message {
    case class NewEvent(event: Event, file: Path) extends Message
    case class RegisterCallback(events: Seq[Event], callback: Callback) extends Message
    case class RemoveCallback(event: Event, callback: Callback) extends Message
  }
}

This allows us to dynamically register and remove callbacks to react to file system events:


// initialize the actor instance
val system = ActorSystem("mySystem") 
val watcher: ActorRef = system.actorOf(Props(new FileWatcher(Paths.get("/home/pathikrit"))))

// util to create a RegisterCallback message for the actor
def when(events: Event*)(callback: Callback): Message = {
  Message.RegisterCallback(events.distinct, callback)
}

// send the register callback message for create/modify events
watcher ! when(events = ENTRY_CREATE, ENTRY_MODIFY) {   
  case (ENTRY_CREATE, file) => println(s"$file got created")
  case (ENTRY_MODIFY, file) => println(s"$file got modified")
}

Full source: FileWatcher.scala


An introduction to Spark, your next REST Framework for Java

I hope you’re having a great Java Advent this year! Today we’re going to look into a refreshing, simple, nice and pragmatic framework for writing REST applications in Java. It will be so simple, it won’t even seem like Java at all.

We’re going to look into the Spark web framework. No, it’s not related to Apache Spark. Yes, it’s unfortunate that they share the same name.

I think the best way to understand this framework is to build a simple application, so we’ll build a simple service to perform mathematical operations.

We could use it like this:

[Screenshot: calling the service in a browser]

Note that the service is running on localhost at port 4567 and the resource requested is “/10/add/8”.
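
For instance, assuming the service is running locally, hitting the endpoint from the command line might look like this:

$ curl http://localhost:4567/10/add/8
18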

Set up the Project Using Gradle (what’s Gradle?)

apply plugin: "java"
apply plugin: "idea"

sourceCompatibility = 1.8

repositories {
    mavenCentral()
    maven { url "https://oss.sonatype.org/content/repositories/snapshots/" }
    maven { url "https://oss.sonatype.org/content/repositories/releases/" }     
}

dependencies {
    compile "com.javaslang:javaslang:2.0.0-RC1"
    compile "com.sparkjava:spark-core:2.3"
    compile "com.google.guava:guava:19.0-rc2"
    compile "org.projectlombok:lombok:1.16.6"
    testCompile group: 'junit', name: 'junit', version: '4.+'
}

task launch(type:JavaExec) {
    main = "me.tomassetti.javaadvent.SparkService"
    classpath = sourceSets.main.runtimeClasspath
}

Now we can run:

  • ./gradlew idea to generate an IntelliJ IDEA project
  • ./gradlew test to run tests
  • ./gradlew assemble to build the project
  • ./gradlew launch to start our service

Great. Now, Let’s Meet Spark

Do you think we can write a fully functional web service that performs basic mathematical operations in less than 25 lines of Java code? No way? Well, think again:

// imports omitted

class Calculator implements Route {

    private Map<String, Function2<Long, Long, Long>> functions = ImmutableMap.of(
            "add", (a, b) -> a + b,
            "mul", (a, b) -> a * b,
            "div", (a, b) -> a / b,
            "sub", (a, b) -> a - b);

    @Override
    public Object handle(Request request, Response response) throws Exception {
        long left = Long.parseLong(request.params(":left"));
        String operatorName = request.params(":operator");
        long right = Long.parseLong(request.params(":right"));
        return functions.get(operatorName).apply(left, right);
    }
}

public class SparkService {
    public static void main(String[] args) {
        get("/:left/:operator/:right", new Calculator());
    }
}

In our main method we just say that when we get a request which contains three parts (separated by slashes) we should use the Calculator route, which is our only route. A route in Spark is the unit which takes a request, processes it, and produces a response.

Our calculator is where the magic happens. It looks in the request for the parameters “left”, “operator” and “right”. Left and right are parsed as long values, while the operator is used to find the operation. For each operation we have a function (Function2<Long, Long, Long>) which we then apply to our values (left and right). Cool, eh?

Function2 is an interface which comes from the Javaslang project.

You can now start the service (./gradlew launch, remember?) and play around.

The last time I checked Java was more verbose, redundant, slow… well, it is healing now.

Ok, but what about tests?

So Java can actually be quite concise, and as a Software Engineer I celebrate that for a minute or two, but shortly after I start to feel uneasy… this stuff has no tests! Worse than that, it doesn’t look testable at all. The logic is in our calculator class, but it takes a Request and produces a Response. I don’t want to instantiate a Request just to check if my Calculator works as intended. Let’s refactor a little:

class TestableCalculator implements Route {

    private Map<String, Function2<Long, Long, Long>> functions = ImmutableMap.of(
            "add", (a, b) -> a + b,
            "mul", (a, b) -> a * b,
            "div", (a, b) -> a / b,
            "sub", (a, b) -> a - b);

    public long calculate(String operatorName, long left, long right) {
        return functions.get(operatorName).apply(left, right);
    }

    @Override
    public Object handle(Request request, Response response) throws Exception {
        long left = Long.parseLong(request.params(":left"));
        String operatorName = request.params(":operator");
        long right = Long.parseLong(request.params(":right"));
        return calculate(operatorName, left, right);
    }
}

We just separate the plumbing (taking the values out of the request) from the logic and put it in its own method: calculate. Now we can test calculate.

public class TestableLogicCalculatorTest {

    @Test
    public void testLogic() {
        assertEquals(10, new TestableCalculator().calculate("add", 3, 7));
        assertEquals(-6, new TestableCalculator().calculate("sub", 7, 13));
        assertEquals(3, new TestableCalculator().calculate("mul", 3, 1));
        assertEquals(0, new TestableCalculator().calculate("div", 0, 7));
    }

    @Test(expected = ArithmeticException.class)
    public void testInvalidInputs() {
        assertEquals(0, new TestableCalculator().calculate("div", 0, 0));
    }

}

I feel better now: our tests prove that this stuff works. Sure, it will throw an exception if we try to divide by zero, but that’s how it is.

What does that mean for the user, though?

[Screenshot: the browser showing a 500 error]

It means this: a 500. And what happens if the user tries to use an operation which does not exist?

[Screenshot: the browser response for an unknown operation]

What if the values are not proper numbers?

[Screenshot: the browser response for non-numeric values]

Ok, this doesn’t seem very professional. Let’s fix it.

Error handling, functional style

To fix two of the cases we just have to use one feature of Spark: we can match specific exceptions to specific routes. Our routes will produce a meaningful HTTP status code and a proper message.

public class SparkService {
    public static void main(String[] args) {
        exception(NumberFormatException.class, (e, req, res) -> res.status(404));
        exception(ArithmeticException.class, (e, req, res) -> {
            res.status(400);
            res.body("This does not seem like a good idea");
        });
        get("/:left/:operator/:right", new ReallyTestableCalculator());
    }
}

We still have to handle the case of a non-existent operation, and this is something we are going to do in ReallyTestableCalculator.

To do so we’ll use a typical functional pattern: we’ll return an Either. An Either is a container which holds either a left or a right value. The left typically represents some sort of information about an error, like an error code or an error message. If nothing goes wrong the Either will contain a right value, which could be all sorts of things. In our case we will return an Error (a class we defined) if the operation cannot be executed, otherwise we will return the result of the operation in a Long. So we will return an Either<Error, Long>.

package me.tomassetti.javaadvent.calculators;

import javaslang.Function2;
import javaslang.Tuple2;
import javaslang.collection.Map;
import javaslang.collection.HashMap;
import javaslang.control.Either;
import spark.Request;
import spark.Response;
import spark.Route;

public class ReallyTestableCalculator implements Route {
    
    private static final int NOT_FOUND = 404;

    private Map<String, Function2<Long, Long, Long>> functions = HashMap.ofAll(
            new Tuple2<>("add", (a, b) -> a + b),
            new Tuple2<>("mul", (a, b) -> a * b),
            new Tuple2<>("div", (a, b) -> a / b),
            new Tuple2<>("sub", (a, b) -> a - b));

    public Either<Error, Long> calculate(String operatorName, long left, long right) {
        Either<Error, Long> unknownOp = Either.<Error, Long>left(new Error(NOT_FOUND, "Unknown math operation"));
        return functions.get(operatorName).map(f -> Either.<Error, Long>right(f.apply(left, right)))
                .orElse(unknownOp);
    }

    @Override
    public Object handle(Request request, Response response) throws Exception {
        long left = Long.parseLong(request.params(":left"));
        String operatorName = request.params(":operator");
        long right = Long.parseLong(request.params(":right"));
        Either<Error, Long> res =  calculate(operatorName, left, right);
        if (res.isRight()) {
            return res.get();
        } else {
            response.status(res.left().get().getHttpCode());
            return null;
        }
    }
}

Let’s test this:

package me.tomassetti.javaadvent;

import javaslang.control.Either;
import me.tomassetti.javaadvent.calculators.ReallyTestableCalculator;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class ReallyTestableLogicCalculatorTest {

    @Test
    public void testLogic() {
        assertEquals(Either.right(10L), new ReallyTestableCalculator().calculate("add", 3, 7));
        assertEquals(Either.right(-6L), new ReallyTestableCalculator().calculate("sub", 7, 13));
        assertEquals(Either.right(3L), new ReallyTestableCalculator().calculate("mul", 3, 1));
        assertEquals(Either.right(0L), new ReallyTestableCalculator().calculate("div", 0, 7));
    }

    @Test(expected = ArithmeticException.class)
    public void testInvalidOperation() {
        Either<me.tomassetti.javaadvent.calculators.Error, Long> res = new ReallyTestableCalculator().calculate("div", 0, 0);
        assertEquals(true, res.isLeft());
        assertEquals(400, res.left().get().getHttpCode());
    }

    @Test
    public void testUnknownOperation() {
        Either<me.tomassetti.javaadvent.calculators.Error, Long> res = new ReallyTestableCalculator().calculate("foo", 0, 0);
        assertEquals(true, res.isLeft());
        assertEquals(404, res.left().get().getHttpCode());
    }

}

The result

We got a service that can be easily tested. It performs mathematical operations. It supports the four basic operations, but it could be easily extended to support more. Errors are handled and the appropriate HTTP codes are used: 400 for bad inputs and 404 for unknown operations or values.

Conclusions

When I first saw Java 8 I was happy about the new features, but not very excited. However, after a few months I am seeing new frameworks come up which are based on these new features and have the potential to really change how we program in Java. Stuff like Spark and Javaslang is making the difference. I think that now Java can remain simple and solid while becoming much more agile and productive.

You can find many more tutorials either on the Spark tutorials website or on my blog, tomassetti.me.

Migrating Spring App to MicroServices App on AWS

The company I am working for has recently gone through a migration of refactoring our code base from a monolithic application (a Java Spring WAR) into a microservices application hosted on the Amazon PaaS (specifically Beanstalk and CloudFront). As part of this blog post I have provided a small and simple Sales Demo application and will discuss the steps required to refactor the application so that it can run within the Beanstalk/S3/CloudFront environment.

For the purposes of this blog, I will be using a SalesTax demo application, and the code can be found here (https://github.com/shannonlal/salesdemo). This site will provide users with a list of products and give them the ability to create an order and apply sales tax. I have created a more detailed guide, which includes steps for creating the different services in AWS. The guide can be found at this location (https://github.com/shannonlal/salesdemo/AWS-MigrationGuide.pdf). The following is a diagram of the Spring architecture:

[Diagram: the monolithic Spring architecture]

The above architecture is a pretty standard Spring architecture for most monolithic web applications. In our migration, we broke up our code and separated the backend services from the front-end content: JSPs (now HTML), CSS, and JS. The following is a diagram illustrating our model of how we controlled access:

[Diagram: the AWS access model]

Amazon Web Services

I am going to start by explaining at a high level what these different components in AWS are and how we integrate them together.

Route 53

Route 53 is a Domain Name Service (https://aws.amazon.com/route53/) which allows you to route traffic to different internal AWS services. In our model we used Route 53 to host our DNS servers (for example www.mycompany.com).

S3

Amazon S3 (https://aws.amazon.com/s3/) is a simple storage service which allows you to store content (HTML, CSS, JS files) in buckets in the cloud. In this demo we will be using Amazon S3 to host the static content (HTML, CSS, and JS).

Beanstalk

Beanstalk (https://aws.amazon.com/elasticbeanstalk/) is an application stack which will be used to host our individual services. Beanstalk has access to multiple stacks (Tomcat, PHP, Node, Ruby, Go, .NET). In this demo we will be using Beanstalk to host our different web services (as Spring WARs running on Tomcat).

 

RDS

Amazon Relational Database Service (RDS, https://aws.amazon.com/rds/) will be used to host our database. We will create an RDS database, and our web services will connect to it.

 

CloudFront

Amazon CloudFront is the glue that ties all your different services together under one common URL. We will define an origin for each backend service, and a distribution that corresponds to our URL (defined in Route 53, e.g. www.mycompany.com). When the user hits this URL, Route 53 routes the traffic to CloudFront. CloudFront caches content and pushes it to edge locations around the world. In CloudFront you are able to redirect traffic based on URL patterns. For example, anyone coming to the default pattern (/*) can be redirected to a bucket in S3 which hosts your static content (i.e. HTML, CSS, images), while requests to an API URL (/api/products) can be routed to a Beanstalk service in the backend.
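
Put concretely, the routing we are working toward looks roughly like this (an illustrative summary, not an exact console screenshot; CloudFront matches the more specific pattern first):

Path Pattern   Origin
/api/*         Beanstalk application (Spring WAR on Tomcat)
Default (*)    S3 bucket (static HTML, CSS, JS)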

Infrastructure Security

In our production systems we have all our web services hidden behind different VPCs and have implemented network rules to restrict access to our backend services. I will not have space to address this in this blog, but will try to cover it in my next post.

 

Application Security

One major component I have not included in the sales demo is Spring Security. In our application, we removed Spring Security and replaced it with access control at an API Gateway. I will discuss this concept briefly at the end of this blog.

 

NOTE: AWS is a very sophisticated and complex ecosystem that provides multiple ways to integrate these different services. The model I will be discussing is similar to the one we implemented at our company.

 

SalesTax Application Overview

 

The SalesTax demo application looks like a traditional Spring application with one exception: the JSP pages do not follow the traditional Spring MVC model, where data is passed from the controller and the JSP renders the view. Instead we are using Angular, which makes REST calls to the backend controllers and renders the content in the browser. We do this so that we can migrate our static content (HTML, CSS, JS files) to S3 buckets and run our backend services in Beanstalk.
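
As an illustration, a backend controller in this model is an ordinary Spring MVC controller that returns JSON for Angular to consume. The sketch below is not the exact code from the demo repo; ProductController, ProductService, and Product are assumed names:

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/products")
public class ProductController {

  @Autowired
  private ProductService productService; // hypothetical service

  // Returns the product list as JSON; Angular fetches this with a REST
  // call and renders it in the browser, so no JSP rendering is involved.
  @RequestMapping(method = RequestMethod.GET)
  public List<Product> getProducts() {
    return productService.findAllProducts(); // hypothetical method
  }
}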

 

I have created a guide which provides step-by-step instructions, with pictures, on how to set up your environment in AWS. The rest of this post summarizes the process with references to the guide. If you would like to try this on your own AWS setup, I recommend the detailed guide here (https://github.com/shannonlal/salesdemo/AWS-MigrationGuide.pdf).

 

Migration Process

 

The following section provides a high-level overview of the migration process. Again, if you would like to try this out for yourself, I recommend using the detailed guide.

 

Deploy Application to Beanstalk

 

The first step is to build the application and deploy it into a Beanstalk instance. To check out the code, run the following command:

git clone -b step0 https://github.com/shannonlal/salesdemo

 

You can import the project into your IDE (Eclipse, NetBeans, STS, etc.) or build it from the command line. To build the project, run the following command:

 

mvn clean install

 

Once the WAR has been built, log into the AWS Administration console and deploy it in a new Beanstalk instance. For detailed instructions, see the install guide.

 

Configure CloudFront to point to your Beanstalk Instance

 

Log into the Amazon Console and click on the CloudFront link. At this point you have two options:

- Use your own domain name (www.example.com)

- Use the default provided by CloudFront (this will look something like https://xxxxxxxxxx.cloudfront.net)

If you already have your own domain name, you can add it to Route 53. The following link provides detailed instructions on how to do this (http://docs.aws.amazon.com/gettingstarted/latest/swh/website-hosting-intro.html). If you do not have your own, you can just create a CloudFront distribution and it will give you a URL.

 

The goal of this step is to use CloudFront to map your URL (either your own www.example.com or the generated https://xxxxxxxxxx.cloudfront.net) to your hosted application in Beanstalk. In CloudFront you will define a Web Distribution, and for that distribution you will define an Origin. Origins in CloudFront represent backend services (i.e. S3 buckets that host static content, or Beanstalk applications that host your Spring apps). Finally, you will create a Behavior that instructs CloudFront to map all requests matching a certain URL pattern to a specific Beanstalk instance. For this first step we will map all requests (/*) to the Beanstalk instance. In future steps we will map all requests of the format (/api/*) to your Beanstalk instance, and the rest (the default pattern) will go to your S3 bucket. Below is an image of what the screen for creating a Behavior looks like.

[ThirdPicture: creating a Behavior in CloudFront]

Create RDS Postgres instance and connect to Beanstalk

 

In this step we create a publicly accessible RDS instance and then connect to it from pgAdmin to create the database. The SQL script and updated code can be found by pulling down the step1 branch as follows:

 

git clone -b step1 https://github.com/shannonlal/salesdemo

 

The SQL create script can be found at the following location:

src/resources/sql/createSalesTax-DB-Postgres.sql

 

Once your database is created, you can rebuild your project with Maven using the following command:

mvn clean install

 

Log back into your Amazon console and redeploy your latest WAR file. You will also need to add environment properties to your Beanstalk instance so it knows where to find your database. This can be done by clicking on Configuration, then Software Configuration, and adding them under Environment Properties.

[Third-A: adding Environment Properties in the Beanstalk Software Configuration screen]

If you reload your application, you will see that it is now pulling the products from the database instance in AWS.
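
For example, the datasource can be built from those environment properties instead of a hard-coded URL. This is a minimal sketch; the property names (JDBC_CONNECTION_STRING, DB_USER, DB_PASSWORD) are illustrative rather than the exact ones used in the demo. On the Tomcat platform, Beanstalk exposes Environment Properties to the application as JVM system properties:

import javax.sql.DataSource;

import org.apache.commons.dbcp2.BasicDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DataSourceConfig {

  // Reads the settings added under Configuration -> Software Configuration
  // -> Environment Properties in the Beanstalk console.
  @Bean
  public DataSource dataSource() {
    BasicDataSource ds = new BasicDataSource();
    ds.setDriverClassName("org.postgresql.Driver");
    ds.setUrl(System.getProperty("JDBC_CONNECTION_STRING")); // illustrative name
    ds.setUsername(System.getProperty("DB_USER"));           // illustrative name
    ds.setPassword(System.getProperty("DB_PASSWORD"));       // illustrative name
    return ds;
  }
}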

 

Create an S3 Bucket and deploy Static Content to it

 

In this step we are going to create an S3 bucket and move our static content (HTML, CSS, images, etc.) to it. To get the latest code, pull down the step2 branch:

git clone -b step2 https://github.com/shannonlal/salesdemo

 

Log back into the Amazon Console and click on S3. Click on Create Bucket to create a new bucket.

 

[ForthPicture: creating an S3 bucket]

Once your bucket is created, click on Properties (upper right corner) and then on Static Website Hosting to enable hosting of content. Once the bucket is ready, you can transfer the static content of the project to S3. The content to transfer is in the following directory:

web/build/prod/
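
You can copy these files through the S3 console, or script the transfer. Below is a minimal sketch using the AWS SDK for Java's TransferManager; the bucket name is a placeholder, and the aws s3 sync CLI command is an equally valid alternative:

import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class StaticContentUploader {

  public static void main(String[] args) throws InterruptedException {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(s3)
        .build();
    // Recursively upload everything under web/build/prod/ to the bucket root
    MultipleFileUpload upload = tm.uploadDirectory(
        "my-salesdemo-bucket",      // placeholder bucket name
        "",                         // key prefix: upload to the bucket root
        new File("web/build/prod"),
        true);                      // include subdirectories
    upload.waitForCompletion();
    tm.shutdownNow();
  }
}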

Update CloudFront to reflect the new origins

We will need to update CloudFront to redirect requests to their appropriate origins. The first step is to log into CloudFront and create an Origin for your newly created bucket. Once your Origin has been created, you will need to modify the Behaviors so that your default Behavior (*) now points to your static content in S3 and your API requests (/api/*) are redirected to your Elastic Beanstalk instance. The following is a diagram of the proposed changes to CloudFront.

[FifthPicture: updated CloudFront origins and behaviors]

Redeploy Application

Once CloudFront has been updated and the status has changed to Deployed, your static content, which is hosted in S3, will be accessible via your CloudFront URL. The only thing left to do is rebuild the sales demo application and redeploy it into Beanstalk. At this stage, all the front-end code (HTML, JS, CSS) has been moved to the web directory and the backend functionality is in the services directory. To rebuild your application, run the Maven command in the services directory:

 

mvn clean install

 

Log back into the Amazon Console and redeploy your Beanstalk application with the new WAR.

The above architecture is a good starting point for anyone who is looking at migrating their Spring application to a cloud-based microservices architecture. As part of your migration I would suggest looking at incorporating an API Gateway. There are a number of open source and commercially available API Gateways (Amazon released their API Gateway in July 2015; membrane-soa.org is one open source option). The API Gateway will sit between CloudFront and your backend services, handle authentication and access control, and redirect your requests to the appropriate Beanstalk instance. I have included a picture of the API Gateway below.

[SixthPicture: architecture with an API Gateway between CloudFront and the backend services]