What the Heck Is Project Loom for Java?
Java has had good multi-threading and concurrency capabilities from early on in its evolution and can effectively utilize multi-threaded and multi-core CPUs. Java Development Kit (JDK) 1.1 had basic support for platform threads (or Operating System (OS) threads), and JDK 1.5 added more utilities and updates to improve concurrency and multi-threading, most notably the java.util.concurrent package. JDK 8 brought asynchronous programming support with CompletableFuture, along with more concurrency improvements. While things have continued to improve over multiple versions, there has been nothing groundbreaking in Java's concurrency for nearly three decades beyond concurrency and multi-threading built on OS threads.
Though the concurrency model in Java is powerful and flexible as a feature, it is not the easiest to use, and the developer experience hasn't been great. This is primarily due to the shared-state concurrency model used by default. One has to resort to synchronizing threads to avoid issues like data races and thread blocking. I wrote more about Java concurrency in my Concurrency in modern programming languages: Java post.
What is Project Loom?
Project Loom aims to drastically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications that make the best use of available hardware.
— Ron Pressler (Tech lead, Project Loom)
OS threads are at the core of Java’s concurrency model and have a very mature ecosystem around them, but they also come with some drawbacks and are expensive computationally. Let’s look at the two most common use cases for concurrency and the drawbacks of the current Java concurrency model in these cases.
One of the most common concurrency use cases is serving requests over the wire using a server. For this, the preferred approach is the thread-per-request model, where a separate thread handles each request. Throughput of such systems can be explained using Little’s law, which states that in a stable system, the average concurrency (number of requests concurrently processed by the server), L, is equal to the throughput (average rate of requests), λ, times the latency (average duration of processing each request), W. With this, you can derive that throughput equals average concurrency divided by latency (λ = L/W).
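To see what the formula implies, here is a quick worked example with made-up numbers (not measurements from this post). Suppose each request takes W = 50 ms and the server can have at most L = 200 requests in flight, for example because it has 200 request-handling threads. The second expression repeats the calculation for L = 10,000:

$$
\lambda = \frac{L}{W} = \frac{200}{0.05\ \mathrm{s}} = 4{,}000\ \text{requests/s}
\qquad
\lambda' = \frac{10{,}000}{0.05\ \mathrm{s}} = 200{,}000\ \text{requests/s}
$$

With latency fixed, the only way to raise throughput is to raise concurrency, which in a thread-per-request model means more threads.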
So in a thread-per-request model, the throughput will be limited by the number of OS threads available. OS threads are a scarce resource: each one reserves its own stack memory, the OS limits how many can exist, and only as many as there are cores/hardware threads can run at the same time. To work around this, you have to use shared thread pools or asynchronous concurrency, both of which have their drawbacks. Thread pools have many limitations, like thread leaking, deadlocks, resource thrashing, etc. Asynchronous concurrency means you must adapt to a more complex programming style and handle data races carefully. There are also chances for memory leaks, thread locking, etc.
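For comparison, here is a minimal sketch of the asynchronous style using CompletableFuture (in the JDK since version 8). fetchUser and fetchOrders are hypothetical stand-ins for remote calls. The blocking version reads top to bottom but holds its thread for the whole request; the asynchronous version expresses the same logic as a chain of callbacks, so the caller is not blocked, and errors flow through the chain instead of being thrown where they happen.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncStyleExample {
    // Hypothetical stand-ins for remote calls; a real service would do I/O here.
    static String fetchUser(int id) {
        return "user-" + id;
    }

    static List<String> fetchOrders(String user) {
        return List.of(user + ":order-1", user + ":order-2");
    }

    // Blocking, thread-per-request style: easy to read, but the calling thread
    // is held for the whole duration of both calls.
    static int countOrdersBlocking(int id) {
        String user = fetchUser(id);
        return fetchOrders(user).size();
    }

    // Asynchronous style: each stage is scheduled to run after the previous one
    // completes, so the caller is not blocked while the work is in flight.
    static CompletableFuture<Integer> countOrdersAsync(int id, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> fetchUser(id), pool)
                .thenApplyAsync(AsyncStyleExample::fetchOrders, pool)
                .thenApply(List::size)
                .exceptionally(error -> {
                    // If any stage throws, the remaining stages are skipped and the failure lands here.
                    System.err.println("Lookup failed: " + error);
                    return 0;
                });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(countOrdersBlocking(42));
            System.out.println(countOrdersAsync(42, pool).join());
        } finally {
            pool.shutdown();
        }
    }
}
```

Even in this tiny case, the asynchronous version needs a thread pool, a different way of handling errors, and a join() at the edge; that is the extra complexity the paragraph above refers to.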
Another common use case is parallel processing or multi-threading, where you might split a task into subtasks across multiple threads. Here you have to write solutions to avoid data corruption and data races. In some cases, you must also ensure thread synchronization when executing a parallel task distributed over multiple threads. The implementation becomes even more fragile and puts a lot more responsibility on the developer to ensure there are no issues like thread leaks and cancellation delays.
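Here is a minimal sketch of splitting a task into subtasks across a fixed pool of platform threads: summing the numbers in a range, chunk by chunk. Each subtask accumulates into its own local variable and returns a partial result through a Future, which sidesteps shared mutable state. The commented-out line shows the kind of unsynchronized shared update (sharedTotal is hypothetical) that would introduce a data race. The chunking scheme and names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSumExample {
    // Sums the integers in [0, n) by splitting the range into `parts` chunks.
    static long parallelSum(long n, int parts) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            long chunk = (n + parts - 1) / parts; // ceiling division so every number is covered
            for (int p = 0; p < parts; p++) {
                long start = p * chunk;
                long end = Math.min(n, start + chunk);
                partials.add(pool.submit(() -> {
                    long local = 0; // confined to this subtask, so no synchronization is needed
                    for (long i = start; i < end; i++) {
                        local += i;
                        // sharedTotal += i; // updating a shared field here without synchronization would race
                    }
                    return local;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // waits for that subtask; throws ExecutionException if it failed
            }
            return total;
        } finally {
            pool.shutdownNow(); // don't leak pool threads, even if a subtask failed
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1_000_000, 4)); // 499999500000
    }
}
```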
Project Loom aims to fix these issues in the current concurrency model by introducing two new features: virtual threads and structured concurrency.
Virtual threads
Java 19 is scheduled to be released in September 2022, and virtual threads will be a preview feature. Yayyy!
Virtual threads are lightweight threads that are not tied to OS threads but are managed by the JVM. They are suitable for thread-per-request programming styles without having the limitations of OS threads. You can create millions of virtual threads without affecting throughput. This is quite similar to coroutines, like goroutines, made famous by the Go programming language (Golang).
The new virtual threads in Java 19 will be pretty easy to use. Compare the below with Golang’s goroutines or Kotlin’s coroutines.
Virtual thread
```java
Thread.startVirtualThread(() -> {
    System.out.println("Hello, Project Loom!");
});
```
Goroutine
```go
go func() {
    println("Hello, Goroutines!")
}()
```
Kotlin coroutine
```kotlin
runBlocking {
    launch {
        println("Hello, Kotlin coroutines!")
    }
}
```
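Fun fact: early versions of Java (JDK 1.1 on Solaris, for example) used green threads, which were scheduled by the JVM instead of the OS, somewhat like virtual threads. Later versions dropped them in favor of native OS threads, as that implementation was not any better than platform threads.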
The new implementation of virtual threads is done in the JVM, where it maps multiple virtual threads to one or more OS threads, and the developer can use virtual threads or platform threads as per their needs. A few other important aspects of this implementation of virtual threads:
- It is a Thread in code, runtime, debugger, and profiler
- It's a Java entity and not a wrapper around a native thread
- Creating and blocking them are cheap operations
- They should not be pooled
- Virtual threads use a work-stealing ForkJoinPool scheduler
- Pluggable schedulers can be used for asynchronous programming
- A virtual thread will have its own stack memory
- The virtual threads API is very similar to platform threads and hence easier to adopt/migrate
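The points about the API can be sketched with the Thread builders, assuming JDK 21 or later, where virtual threads are a final feature (on JDK 19 and 20 the same calls need --enable-preview). This shows that platform and virtual threads are created the same way, that a virtual thread is an ordinary java.lang.Thread that can report isVirtual(), and that virtual threads are always daemon threads, so main() should join them before exiting. The thread names are arbitrary.

```java
class ThreadBuilderExample {
    // Starts one platform and one virtual thread through the same builder API and
    // records, from inside each thread, whether it is running as a virtual thread.
    static boolean[] startBoth() throws InterruptedException {
        boolean[] seenVirtual = new boolean[2];
        Thread platform = Thread.ofPlatform().name("platform-worker")
                .start(() -> seenVirtual[0] = Thread.currentThread().isVirtual());
        Thread virtual = Thread.ofVirtual().name("virtual-worker")
                .start(() -> seenVirtual[1] = Thread.currentThread().isVirtual());
        // join() waits for both threads and makes their writes visible here.
        platform.join();
        virtual.join();
        return seenVirtual;
    }

    // Virtual threads are always daemon threads, so the JVM does not wait for them on exit.
    static boolean virtualIsDaemon() {
        return Thread.ofVirtual().unstarted(() -> { }).isDaemon();
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] seen = startBoth();
        System.out.println("platform-worker virtual? " + seen[0]); // false
        System.out.println("virtual-worker virtual? " + seen[1]);  // true
        System.out.println("virtual threads are daemons? " + virtualIsDaemon()); // true
    }
}
```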
Let’s look at some examples that show the power of virtual threads.
Total number of threads
First, let's see how many platform threads vs. virtual threads we can create on a machine. My machine has an Intel Core i9-11900H with 8 cores and 16 threads, plus 64 GB of RAM, and runs Fedora 36.
Platform threads
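```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Warning: this loop intentionally keeps creating threads until the JVM or OS gives up.
var counter = new AtomicInteger();
while (true) {
    new Thread(() -> {
        int count = counter.incrementAndGet();
        System.out.println("Thread count = " + count);
        LockSupport.park(); // park forever so the thread is never released
    }).start();
}
```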
On my machine, the code crashed after 32_539 platform threads.
Virtual threads
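```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Warning: this loop intentionally keeps creating virtual threads until memory runs out.
var counter = new AtomicInteger();
while (true) {
    Thread.startVirtualThread(() -> {
        int count = counter.incrementAndGet();
        System.out.println("Thread count = " + count);
        LockSupport.park(); // park forever; a parked virtual thread does not hold an OS thread
    });
}
```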
On my machine, the process hung after 14_625_956 virtual threads but didn't crash, and as memory became available, it kept going slowly. You may be wondering why this happens. It's due to the parked virtual threads being garbage collected, which lets the JVM create more virtual threads and assign them to the underlying platform threads.
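If you would rather not push your machine until it stalls, a bounded variant of the experiment is a reasonable sanity check. This sketch (assuming JDK 21 or later) starts a fixed number of virtual threads that each block in Thread.sleep, then waits for all of them; the count of 10,000 and the 100 ms sleep are arbitrary choices, not values from the runs above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ManyVirtualThreads {
    // Starts `count` virtual threads that each block briefly, waits for all of
    // them, and returns how many ran to completion.
    static int startAndJoin(int count, Duration sleep) throws InterruptedException {
        AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    Thread.sleep(sleep); // a sleeping virtual thread releases its carrier thread
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : threads) {
            t.join();
        }
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        int done = startAndJoin(10_000, Duration.ofMillis(100));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " virtual threads finished in " + elapsedMs + " ms");
    }
}
```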
Task throughput
Let’s try to run 100,000 tasks using platform threads.
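```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// One new platform thread per task; leaving the try block calls close(), which waits for all tasks.
try (var executor = Executors.newThreadPerTaskExecutor(Executors.defaultThreadFactory())) {
    IntStream.range(0, 100_000).forEach(i -> executor.submit(() -> {
        Thread.sleep(Duration.ofSeconds(1));
        System.out.println(i);
        return i;
    }));
}
```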
This uses newThreadPerTaskExecutor with the default thread factory, so every task gets a new platform thread, and all of those threads belong to the same thread group. When I ran this code and timed it, I got the numbers below. I got better performance when I used a thread pool created with Executors.newCachedThreadPool().
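The cached-pool run only swaps the executor. Here is a sketch of that variant, assuming JDK 19 or later (so that ExecutorService is AutoCloseable); the class and method names are hypothetical, the println is dropped, and main scales the workload down to 10,000 tasks. newCachedThreadPool reuses idle platform threads and creates a new one only when no idle thread is available.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class CachedPoolTasks {
    // Same kind of workload as above, but on a cached pool of platform threads.
    static int run(int taskCount, Duration sleep) {
        AtomicInteger completed = new AtomicInteger();
        try (var executor = Executors.newCachedThreadPool()) {
            IntStream.range(0, taskCount).forEach(i -> executor.submit(() -> {
                Thread.sleep(sleep);
                completed.incrementAndGet();
                return i;
            }));
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        // Scaled down from the 100,000 tasks used above.
        System.out.println(run(10_000, Duration.ofSeconds(1)) + " tasks completed");
    }
}
```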
```
# 'newThreadPerTaskExecutor' with 'defaultThreadFactory'
0:18.77 real, 18.15 s user, 7.19 s sys, 135% 3891pu, 0 amem, 743584 mmem
# 'newCachedThreadPool' with 'defaultThreadFactory'
0:11.52 real, 13.21 s user, 4.91 s sys, 157% 6019pu, 0 amem, 2215972 mmem
```
Not so bad. Now, let’s do the same using virtual threads.
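```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// One new virtual thread per task; leaving the try block calls close(), which waits for all tasks.
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 100_000).forEach(i -> executor.submit(() -> {
        Thread.sleep(Duration.ofSeconds(1));
        System.out.println(i);
        return i;
    }));
}
```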
If I run and time it, I get the following numbers.
```
0:02.62 real, 6.83 s user, 1.46 s sys, 316% 14840pu, 0 amem, 350268 mmem
```
This is far more performant than using platform threads with thread pools. Of course, these are simple use cases; both thread pools and virtual thread implementations can be further optimized for better performance, but that’s not the point of this post.
Running Java Microbenchmark Harness (JMH) with the same code gives the following results, and you can see that virtual threads outperform platform threads by a huge margin.
```
# Throughput
Benchmark                             Mode  Cnt  Score   Error  Units
LoomBenchmark.platformThreadPerTask  thrpt    5  0.362 ± 0.079  ops/s
LoomBenchmark.platformThreadPool     thrpt    5  0.528 ± 0.067  ops/s
LoomBenchmark.virtualThreadPerTask   thrpt    5  1.843 ± 0.093  ops/s

# Average time
Benchmark                             Mode  Cnt  Score   Error  Units
LoomBenchmark.platformThreadPerTask   avgt    5  5.600 ± 0.768   s/op
LoomBenchmark.platformThreadPool      avgt    5  3.887 ± 0.717   s/op
LoomBenchmark.virtualThreadPerTask    avgt    5  1.098 ± 0.020   s/op
```
You can find the benchmark source code on GitHub. Here are some other meaningful benchmarks for virtual threads:
- An interesting benchmark using ApacheBench on GitHub by Elliot Barlas
- A benchmark using Akka actors on Medium by Alexander Zakusylo
- JMH benchmarks for I/O and non-I/O tasks on GitHub by Colin Cachia
Structured concurrency
Structured concurrency will be an incubator feature in Java 19.
Structured concurrency aims to simplify multi-threaded and parallel programming. It treats multiple tasks running in different threads as a single unit of work, streamlining error handling and cancellation while improving reliability and observability. This helps to avoid issues like thread leaking and cancellation delays. Being an incubator feature, this might go through further changes during stabilization.
Consider the following example using java.util.concurrent.ExecutorService.
```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledThreadPoolExecutor;

void handleOrder() throws ExecutionException, InterruptedException {
    try (var esvc = new ScheduledThreadPoolExecutor(8)) {
        Future<Integer> inventory = esvc.submit(() -> updateInventory());
        Future<Integer> order = esvc.submit(() -> updateOrder());
        int theInventory = inventory.get(); // Join updateInventory
        int theOrder = order.get(); // Join updateOrder
        System.out.println("Inventory " + theInventory + " updated for order " + theOrder);
    }
}
```
We want the updateInventory() and updateOrder() subtasks to be executed concurrently. Each of them can succeed or fail independently. Ideally, the handleOrder() method should fail if any subtask fails. However, if a failure occurs in one subtask, things get messy.

- Imagine that updateInventory() fails and throws an exception. Then the handleOrder() method throws an exception when calling inventory.get(). So far this is fine, but what about updateOrder()? Since it runs on its own thread, it can complete successfully. But now we have a mismatch between inventory and order. If updateOrder() is an expensive operation, we are wasting resources for nothing, and we will have to write some sort of guard logic to revert the updates made to the order, as our overall operation has failed.
- Imagine that updateInventory() is an expensive, long-running operation and updateOrder() throws an error. The handleOrder() task will be blocked on inventory.get() even though updateOrder() threw an error. Ideally, we would like the handleOrder() task to cancel updateInventory() when a failure occurs in updateOrder() so that we are not wasting time.
- If the thread executing handleOrder() is interrupted, the interruption is not propagated to the subtasks. In this case, updateInventory() and updateOrder() will leak and continue to run in the background.
For these situations, we would have to carefully write workarounds and failsafes, putting all the burden on the developer.
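To make that burden concrete, here is a sketch of one possible hand-written workaround on a plain ExecutorService (not code from this post): wait for whichever subtask finishes first using ExecutorCompletionService and, if it failed, cancel the other one in a finally block. The subtasks are passed in as Callables, and the updateInventory and updateOrder stubs are hypothetical: the inventory update is slow and the order update fails on purpose. Even with all this code, nothing rolls back a subtask that already committed its changes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ManualShortCircuit {
    // Hypothetical stubs: the inventory update is slow, the order update fails fast.
    static Integer updateInventory() throws InterruptedException {
        Thread.sleep(5_000);
        return 42;
    }

    static Integer updateOrder() {
        throw new IllegalStateException("order service unavailable");
    }

    static String handleOrder(Callable<Integer> inventoryTask, Callable<Integer> orderTask)
            throws ExecutionException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        var completion = new ExecutorCompletionService<Integer>(pool);
        Future<Integer> inventory = completion.submit(inventoryTask);
        Future<Integer> order = completion.submit(orderTask);
        try {
            // Wait for whichever subtask finishes first. If it failed, get() throws
            // ExecutionException and we skip waiting for the second one.
            completion.take().get();
            completion.take().get();
            return "Inventory " + inventory.get() + " updated for order " + order.get();
        } finally {
            // Hand-written cleanup: on failure or interruption, cancel whatever is
            // still running so no subtask keeps running in the background.
            inventory.cancel(true);
            order.cancel(true);
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        try {
            System.out.println(handleOrder(ManualShortCircuit::updateInventory, ManualShortCircuit::updateOrder));
        } catch (ExecutionException e) {
            System.out.println("handleOrder failed fast: " + e.getCause().getMessage());
        }
    }
}
```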
We can achieve the same functionality with structured concurrency using the code below.
```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import jdk.incubator.concurrent.StructuredTaskScope; // JDK 19 incubator: needs --add-modules jdk.incubator.concurrent

void handleOrder() throws ExecutionException, InterruptedException {
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        Future<Integer> inventory = scope.fork(() -> updateInventory());
        Future<Integer> order = scope.fork(() -> updateOrder());
        scope.join(); // Join both forks
        scope.throwIfFailed(); // ... and propagate errors
        // Here, both forks have succeeded, so compose their results
        System.out.println("Inventory " + inventory.resultNow() + " updated for order " + order.resultNow());
    }
}
```
Unlike the previous sample using ExecutorService, we can now use StructuredTaskScope to achieve the same result while confining the lifetimes of the subtasks to the lexical scope, in this case, the body of the try-with-resources statement. The code is much more readable, and the intent is also clear. StructuredTaskScope also ensures the following behavior automatically.
- Error handling with short-circuiting: If either updateInventory() or updateOrder() fails, the other is canceled unless it has already completed. This is managed by the cancellation policy implemented by ShutdownOnFailure; other policies are possible.
- Cancellation propagation: If the thread running handleOrder() is interrupted before or during the call to join(), both forks are canceled automatically when the thread exits the scope.
- Observability: A thread dump would clearly display the task hierarchy, with the threads running updateInventory() and updateOrder() shown as children of the scope.
State of Project Loom
The Loom project started in 2017 and has undergone many changes and proposals. Virtual threads were initially called fibers, but they were later renamed to avoid confusion. Today, with Java 19 getting closer to release, the project has delivered the two features discussed above, one as a preview and the other as an incubator feature. Hence, the path to stabilizing these features should now be clearer.
What does this mean to regular Java developers?
When these features are production ready, it should not affect regular Java developers much, as these developers may be using libraries for concurrency use cases. But it can be a big deal in those rare scenarios where you are doing a lot of multi-threading without using libraries. Virtual threads could be a no-brainer replacement for all use cases where you use thread pools today. This will increase performance and scalability in most cases based on the benchmarks out there. Structured concurrency can help simplify the multi-threading or parallel processing use cases and make them less fragile and more maintainable.
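As a rough sketch of what such a migration can look like (assuming JDK 21 or later): instead of sizing a fixed pool so that it also limits how many requests hit a downstream service, give every task its own virtual thread and bound access to the scarce resource with a Semaphore, since virtual threads should not be pooled. callDownstream, the limit of 10 permits, and the 20 ms delay are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class PoolToVirtualMigration {
    // Before: ExecutorService executor = Executors.newFixedThreadPool(10);
    // The pool size doubled as the concurrency limit for the downstream service.

    private static final Semaphore DOWNSTREAM_PERMITS = new Semaphore(10);
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical blocking call to a service that tolerates at most 10 concurrent requests.
    static int callDownstream(int request) throws InterruptedException {
        DOWNSTREAM_PERMITS.acquire();
        try {
            int now = inFlight.incrementAndGet();
            maxInFlight.accumulateAndGet(now, Math::max); // record the peak, for illustration only
            Thread.sleep(Duration.ofMillis(20));
            return request * 2;
        } finally {
            inFlight.decrementAndGet();
            DOWNSTREAM_PERMITS.release();
        }
    }

    // After: one virtual thread per task; the semaphore, not a pool size, limits concurrency.
    static int processAll(int requests) {
        AtomicInteger sum = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, requests).forEach(i -> executor.submit(() -> {
                sum.addAndGet(callDownstream(i));
                return null;
            }));
        } // close() waits for every task to finish
        return sum.get();
    }

    public static void main(String[] args) {
        System.out.println("sum = " + processAll(1_000) + ", peak concurrency = " + maxInFlight.get());
    }
}
```

The design choice here is that the thing being protected (the downstream service) owns the limit, so adding more tasks never requires resizing a pool.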
What does this mean to Java library developers?
When these features are production ready, it will be a big deal for libraries and frameworks that use threads or parallelism. Library authors will see huge performance and scalability improvements while simplifying the codebase and making it more maintainable. Most Java projects using thread pools and platform threads will benefit from switching to virtual threads. Candidates include Java server software like Tomcat, Undertow, and Netty; and web frameworks like Spring and Micronaut. I expect most Java web technologies to migrate to virtual threads from thread pools. Java web technologies and trendy reactive programming libraries like RxJava and Akka could also use structured concurrency effectively. This doesn’t mean that virtual threads will be the one solution for all; there will still be use cases and benefits for asynchronous and reactive programming.
The cover image was created using a photo by Peter Herrmann on Unsplash
This post was originally published on the Okta Developer Blog.