Why Loom?

In 1998, it was amazing that the Sun Java Web Server (the precursor of Tomcat) ran each request in a separate thread, and not an OS process. It was able to serve thousands of concurrent requests this way! Nowadays, that’s not so amazing. Each thread takes up a significant amount of memory, and you can’t have millions of threads on a typical server.

That’s why the modern mantra of server-side programming is: “Never block!” Instead, you specify what should happen once the data is available.

This asynchronous programming style is great for servers, allowing them to handily support millions of concurrent requests. It isn’t so great for programmers.

Here is an asynchronous request with the HttpClient API:

HttpClient.newBuilder()
   .build()
   .sendAsync(request, HttpResponse.BodyHandlers.ofString())
   .thenAccept(response -> . . .);
   .thenApply(. . .);
   .exceptionally(. . .);

What we would normally achieve with statements is now encoded as method calls. If we loved this style of programming, we would not have statements in our programming language and merrily code in Lisp.

In JavaScript, code tagged as “async” is rewritten into method calls like the ones that you’ve just seen. But that means you can only call async methods from other async methods, and your API splits into a sync and an async part, forcing you to duplicate functionality.

Virtual Threads

Project Loom takes its guidance from languages such as Erlang and Go, attacking the root of the problem by making blocking very cheap. You run tasks in “virtual threads”, a nearly unlimited resource that is mapped into actual “carrier” threads. When a virtual thread blocks, it is “parked” and another virtual thread runs on the carrier thread. The name is supposed to remind you of virtual memory that is mapped to actual RAM.

Before you complain about the name, remember that naming is hard. The Loom team previously tried “fiber”, but it is already used elsewhere with a slightly different meaning. And “lightweight” or “new” thread might look foolish when something lighter-weight or newer comes along.

After experimenting with separate classes for OS threads and virtual threads, they ended up deciding to use a single class for both—the familiar java.lang.Thread—in order to ease migration.

Of course, good old java.lang.Thread, which has been around for 25 years, ever since Java 1.0, has its share of cruft. Awkward cancellation, thread groups, priorities, depreceated methods stop,suspend, resume. The Loom team felt that these liabilities were minor since most programmers never explicitly use the Thread API but launch tasks with an ExecutorService. (Of course, the same argument would support coming up with a cleaner virtual thread API instead.)

If you have been around for a very long time, you may remember that early versions of Java had “green threads” that were mapped to an OS thread. However, there is a crucial difference. When a green thread blocked, its carrier thread was also blocked, preventing all other green threads from making progress.

Kicking the Tires

You can download binaries of Project Loom at http://jdk.java.net/loom/. They are updated regularly.

As already mentioned, a virtual thread is an object of the Thread class. Here are three ways of producing fibers. First, there is a new factory method that constructs and starts a virtual thread:

Thread thread = Thread.startVirtualThread(runnable);
  // Note that the thread is already started

This is good for demos, tutorials or quick-and-dirty experiments in JShell.

For more customization, there is a builder API:

Thread thread = Thread.builder()
   .virtual()
   .name(taskname)
   .task(runnable)
   .build();

However, as you have been told for many years now, it is better to use an executor service than to manually construct Thread instances. The static method Executors.newVirtualThreadExecutor() provides one. (The existing executor services are not useful for virtual threads. It would be counterproductive to pool them!)

For example,

ExecutorService exec = Executors.newVirtualThreadExecutor();
exec.submit(runnable1);
exec.submit(runnable2);

As with the other factory methods in the Executors class, you can optionally supply a ThreadFactory. And the new Thread.Builder class has an easy way to provide a factory, instead of a single instance:

ThreadFactory factory = Thread.builder()
   .virtual()
   .name(taskname)
   .task(runnable)
   .factory();

ExecutorService exec = Executors.newThreadExecutor(factory);

Let’s try this out. As a first test, we just sleep in each task.

import java.util.concurrent.*;

public class Test {
   public static int DELAY = 10_000;
   public static int NTASKS = 1_000_000;

   public static void run(Object obj) {
      try {
         Thread.sleep((int) (DELAY * Math.random()));
      } catch (InterruptedException ex) {
         ex.printStackTrace();
      }
      System.out.println(obj);
   }

   public static void main(String[] args) {
      ExecutorService exec = Executors.newVirtualThreadExecutor();
      for (int i = 1; i <= NTASKS; i++) {
         String taskname = "task-" + i;
         exec.submit(() -> run(taskname));
      }
      exec.close();
   }
}

Run the program and it just works. Then try using OS threads—change to Executors.newCachedThreadPool() or Executors.newFixedThreadPool(NTASKS). The program will run out of memory; on my laptop, after about 25,000 threads.

Ok, but in practice, you don’t want to sleep, but do useful work. Consider a program adapted from one of Heinz Kabutz‘ puzzlers, The program fetches a daily image, from Dilbert or Wikimedia. It consists of classes ImageProcessor and ImageInfo. The code is an impenetrable maze of twisty passages, all alike (i.e. helper functions yielding completable futures).

With virtual threads, simply read web contents synchronously. It blocks, but we don’t care. All the complexity goes away. The control flow is simple and comprehensible.

exec.submit(() -> {
    String pageURL = info.getUrlForDate(date);
    String page = getSync(pageURL, HttpResponse.BodyHandlers.ofString());
    String imageURL = info.findImage(page).getImagePath();
    byte[] image = getSync(imageURL, HttpResponse.BodyHandlers.ofByteArray());
    info.setImageData(image);
    process(info);
    return null;
});

Here is the simplified ImageProcessor code.

Pro tip: The statement return null; makes the lambda into a Callable instead of a Runnable, so that you don’t have to catch checked exceptions 😜

Try this out with something you care about. Call web services and make database connections, without worrying about callbacks. When blocking is cheap, a whole lot of accidental complexity goes away. Of course, to use this in a web app framework, you’ll have to wait for your framework provider to run your code in virtual threads.

Structured Concurrency

In this highly recommended article (from which the images below are taken), Nathaniel Smith proposes structured forms of concurrency. Here is his central argument. Launching a task in a new thread is really no better than programming with GOTO, i.e. harmful:

new Thread(runnable).start();

.svg

When multiple threads run without coordination, it’s spaghetti code all over again. In the 1960s, structured programming replaced goto with branches, loops, and functions:

.svg

When you look at a line of code, you know how the program got there.

Structured concurrency does the same for concurrent tasks. We should know, from reading the program text, when they all finish.

.svg

That way we can control the resources that the tasks use, and we know when it is time to move on.

In Loom, the ExecutorService implements this basic construct. ExecutorService has a close method that blocks until all of its virtual threads have completed. (I used this method in the first sample program to keep main alive until all virtual threads are done. In the past, you had to call the awaitTermination method instead.)

Conveniently, ExecutorService implements the AutoCloseable interface, so that you can just use a try-with-resources statement:

try (ExecutorService exec = Executor.newVirtualThreadExecutor()) {
   for (int i = 0; i < NTASKS; i++) {
      exec.schedule(() -> run(i));
   }
} // Blocks until all threads completed

I wrote a simple web crawler as a demonstration of virtual threads—here is the Crawler class. In my first attempt, I fired off a new virtual thread for each URL in a page. If I had wanted to become Google, I could have let my crawler run forever. But I wanted to go no more than 3 hops from the starting point. With “fire and forget”, there is no way of knowing when the recursion is done.

Instead, for each page, I make a new executor service and wait for completion. That way, the whole program completes when all pages have been crawled.

This seems a lot of blocking. But in Loom, blocking is cheap, so we shouldn’t worry about that.

We are used to having one executor service as thread pool for all our tasks. But in Loom, you are encouraged to use a separate executor service for each task set.

Deadlines

When crawling the web, you are likely to encounter dead links. Reading from one should time out eventually, but it can take surprisingly long.

The standard remedy is, of course, to provide a timeout. Loom prefers deadlines to timeouts, so you specify

ExecutorService exec = Executors.newVirtualThreadExecutor().withDeadline(
   Instant.now().plus(30, ChronoUnit.SECONDS))

Why deadlines? In general, timeouts compose poorly. Suppose you have to accomplish two sequential tasks with an overall timeout of 10 seconds. You don’t want to give each of the tasks a timeout of 5 seconds. After all, if one takes 6 seconds and the other 3 seconds, you still come in under the finish line. To get the timeout of the second task, you’d have to measure the duration of the first task and subtract that from the overall timeout. With deadlines, it is much simpler. Each task gets the same deadline.

The call exec.close() blocks until all virtual threads have completed or the deadline has expired. Then it cancels any remaining virtual threads.

Structured Cancellation

In Java, you cancel a thread by invoking its interrupt method. This only sets the “interrupted” flag of the thread. Cancelation is cooperative. The thread must periodically poll the “interrupted” flag. Except, when calling a blocking operation, the thread can’t do any polling. Instead, blocking operations throw an InterruptedException if they detect that the thread was interrupted. So, the thread must also catch those exceptions. In both pathways, the thread must clean up resources and terminate. This can be tedious and error-prone, but it is well-understood. Obviously, one could imagine better ways, but Loom isn’t going there right now.

There is one important improvement. Cancellation of virtual threads is structurally sound. Suppose a virtual parent thread has spawned some child threads, and then it is canceled. Clearly the intent is also to cancel the child thread. And this is exactly what happens.

Moreover, in virtual threads, canceling a parked operation can be much faster than canceling an operation that blocks natively. In particular, canceling a virtual thread waiting on a socket is instantaneous.

Structured cancellation is not new. Consider ExecutorService.invokeAny, which has been around since Java 1.5. The method runs tasks concurrently, until one completes successfully, and then cancels all others. Contrast that with CompletableFuture.anyOf, which lets all tasks run to completion, even though all but the first result will be ignored.

Thread Locals

Thread locals are a pain point for the Project Loom implementors. A quick refresher: You use a ThreadLocal object to store a value that is associated with the current thread. When you call get, set, or remove, only the instance for the current thread is affected.

public static ThreadLocal currentUser = ...;
...
currentUser.set(userID);
// Much later, several function calls deep
int userID = currentUser.get()
// Gets the value for the current thread

A commonly cited reason for thread locals is the use of a non-threadsafe class such as SimpleDateFormat. Instead of constructing new formatter objects, you want to share one. You can’t share a global instance, but one instance per thread is safe.

This use case does not carry over well for virtual threads. If you have a million virtual threads, do you really want that many SimpleDateFormat instances? You could just switch to a threadsafe alternative such as java.time.formatter.DateTimeFormatter.

Thread locals are sometimes used for task-global state, such as an HTTP request, database connection, or transaction context. It is tedious to pass along these objects as method parameters, so you want them globally accessible. Each task has its own, so you can’t have a single global variable. Having one per thread seems a convenient solution.

Except, it really isn’t when more than one task is executed on a single thread, which is exactly what happens with a thread pool. The web is replete with dire warnings to make sure that you remove thread locals so that they don’t leak into other tasks.

What you really want is task-local variables. With Loom, that’s more promising because one task corresponds to a virtual thread.

What should happen when a task spawns other tasks? It depends. Sometimes, the child should inherit the parent locals. That’s what inheritable thread locals are for.

Except, the current implementation of inheritable thread locals is costly. All inheritable thread locals are copied to each child thread so that the child can update or remove them without affecting the parent.

What is really needed is a better mechanism for these task-inheritable variables. You can find a sketch of a possible solution here. So far, no decisions have been made.

Virtual threads allow thread locals but, by default, disallow thread local inheritance. You can override either default with the Thread.Builder methods:

disallowThreadLocals()
inheritThreadLocals()

For now, you can keep using thread locals, but you need to configure your thread factory if you use inheritable thread locals. Be aware of the memory impact if you launch vast numbers of virtual threads.

Whither CompletableFuture?

The submit methods of the ExecutorService class yield a Future. For example, you can get the value of a Callable as

Callable<Integer> callable = () -> { Thread.sleep(1000); return 42; };
Future<Integer> future = exec.submit(callable);
int result = future.get(); // Blocks until task completed

As an aside, Loom adds a join method to the Future interface tht works just like get, but throws an unchecked CancellationException or CompletionException instead of the checked InterruptedException/ExecutionException if the task was canceled or terminated with an exception.

Loom adds a submitTask method to ExecutorService that is just like submit, but it yields a CompletableFuture:

CompletableFuture<Integer> future = exec.submitTask(callable);

What’s better about a CompletableFuture? It implements the Future interface and adds these methods:

isCompletedExceptionally, to find out if the task terminated with an exception (but sadly no method for obtaining the exception object)
A very large number of methods from the CompletionStage interface for asynchronous programming
Methods to effect completion with a value or exception

I don’t know if there is much value in using the CompletionStage methods with Loom, when the whole point of Loom is to avoid the pain of chaining callbacks. And clearly we don’t care about completing a future by hand. My guess is that CompletableFuture was picked because we need the exception information, and nobody wants to introduce yet another future/promise-like class into an already messy API.

In addition to submitTask, there is a submitTasks method that is similar to the existing invokeAll method. However, it returns a list of CompletableFuture, where invokeAll returned a list of Future.

Set<Callable<Integer>> tasks = Set.of(task1, task2, task3, task4, task5);
List<CompletableFuture<Integer>> futures = exec.submitTasks(tasks);

Now you have a list of futures. How do you harvest the results? You can loop over the list and call get on each future:

var results = new ArrayList<Integer>();
for (var f : futures) results.add(f.get());

Those calls block, of course, and you could get unlucky that the first call takes the longest. But who cares—then all the other calls are instantaneous.

But what if you don’t want all results? Then you can use the new completed method from the CompletableFuture class for picking off results as they arrive. This is similar to the ExecutorCompletionService, but it uses a stream instead of a queue. For example, here we harvest the first two results:

List<Integer> results = CompletableFuture.completed(futures)
   .limit(2)
   .map(CompletableFuture::join)
   .collect(Collectors.toList());
futures.forEach(f -> f.cancel(true));

Note that we need to call join, not get, because get throws checked exceptions.

The stream pipeline blocks until the first two results have been obtained. The last line cancels the remaining tasks.

I am not sure that reusing the existing ExecutorService and CompletableFuture APIs is a winner here. I found the multitude of similar-but-different methods pretty confusing. Of course, if others feel the same way, the details of the API may well change.

Summary

Project Loom reimagines threads by giving us an essentially unlimited number of them. Since we don’t have to pool threads, each concurrent task can have its own virtual thread. And since blocking is cheap, we can express our logic with plain control flow—branches, loops, method calls, exceptions. There is no need for cumbersome callbacks or async flavors.

Because the API is based on what we know (threads, executors), the learning curve is low. On the flip side, future generations may curse us for our laziness when they have to learn the cruft of the past.

There are still important limitations. In particular, blocking on monitors (java.lang.Object locks) and local file I/O is not yet retrofitted, and it will block the carrier thread.

Keep in mind that Project Loom does not solve all concurrency woes. It does nothing for you if you have computationally intensive tasks and want to keep all processor cores busy. It doesn’t help you with user interfaces that use a single event thread. The sweet spot for Project Loom is when you have lots of tasks that spend much of their time blocking.

Take Project Loom out for a spin and see how it works with your applications and frameworks, and provide feedback on the API and performance!

Project Loom and Structured Concurrency

Why Loom?

Virtual Threads

Kicking the Tires

Structured Concurrency

Deadlines

Structured Cancellation

Thread Locals

Whither CompletableFuture?

Summary

Author: Cay Horstmann

Like this:

Related

1 Comment

Lin Pengcheng December 4, 2020

2 Pingbacks

Leave a ReplyCancel reply

Project Loom and Structured Concurrency

Why Loom?

Virtual Threads

Kicking the Tires

Structured Concurrency

Deadlines

Structured Cancellation

Thread Locals

Whither CompletableFuture?

Summary

Author: Cay Horstmann

Share this:

Like this:

Related

1 Comment

Lin Pengcheng December 4, 2020

2 Pingbacks

Leave a ReplyCancel reply

Java, migrations argh #@! and now Large Language Models

5 cool applications you can build with Java and GraalVM

Bridging the Gap: Full-Stack Development Without the Headaches