A while back, my machine was so messed up that I could barely ssh into it. 3,200% CPU utilization: all 32 cores on the host were fully utilized! Compare that to my last bug, which pegged only one core at 100%.
Fortunately, it was running on a Java 17 runtime, whose thread dumps include CPU time! Sorting by CPU time surfaced a bunch of threads that looked similar to this:
"Thread-0" #22 [14700] prio=5 os_prio=0 cpu=10359.38ms elapsed=11.49s tid=0x000001cdc35aaf60 nid=14700 runnable [0x00000047cfffe000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.put(java.base@17.0.1/TreeMap.java:826)
at java.util.TreeMap.put(java.base@17.0.1/TreeMap.java:534)
at BusinessLogic.someFunction(BusinessLogic.java:29)
...
I jumped to a quick conclusion and followed the stack trace to BusinessLogic.java line 29, which looked like this:
public void someFunction(SomeType relatedObject, List<SomeOtherType> unrelatedObjects) {
...
for (SomeOtherType unrelatedObject : unrelatedObjects) {
treeMap.put(relatedObject.a(), relatedObject.b()); // line 29
}
...
}
That code is odd. Notice that we iterate over unrelatedObjects, but the loop body only uses relatedObject. The code can be reduced to simply:
public void someFunction(SomeType relatedObject, List<SomeOtherType> unrelatedObjects) {
...
treeMap.put(relatedObject.a(), relatedObject.b());
...
// unrelatedObjects is used later on in the function so the parameter cannot be removed
}
There must have been some refactoring in the area and unrelatedObject wasn’t used anymore.
I prepared my unit test and ran the test before and after the fix.
I tried treeMap and unrelatedObjects with 1,000,000 entries each, well exceeding anything the application ever saw, and was not able to reproduce the problem.
Assuming unrelatedObjects is size N and treeMap is size M, the complexity is O(N lg(M)).
So it makes sense that I was unable to reproduce the problem.
You wouldn't really see a minute of execution time until somewhere between 100 million and 1 billion entries. It must be something else. I was certain treeMap and unrelatedObjects never exceeded 1,000 entries, nowhere near enough to cause trouble for an O(N lg(M)) algorithm.
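To make that concrete, here's a back-of-envelope sketch (my own rough cost model, not a benchmark) of how few comparisons these sizes actually require:

```java
public class ComplexityEstimate {
    // Rough cost model: N puts into a tree of size M is about N * log2(M) comparisons.
    static long ops(long n, long m) {
        return (long) (n * (Math.log(m) / Math.log(2)));
    }

    public static void main(String[] args) {
        // What the application actually saw: trivially cheap.
        System.out.println("production (N=M=1,000):  ~" + ops(1_000, 1_000) + " comparisons");
        // What I tested: still well under a second of work.
        System.out.println("my test (N=M=1,000,000): ~" + ops(1_000_000, 1_000_000) + " comparisons");
    }
}
```

At roughly 10^8 comparisons per second, even the million-entry test amounts to a fraction of a second, which is why pure algorithmic cost can't explain minutes of pegged CPU.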
I wasn't very familiar with the class, so I started to look around, focusing on treeMap and unrelatedObjects to check my assumption that they never exceed 1,000 entries. Could they be in the millions or even billions?
Then I noticed the definition of the TreeMap:
// The field wasn't actually named treeMap.
private final Map<K,V> treeMap = new TreeMap<>();
That’s scary. There were multiple threads accessing the TreeMap, and it was unguarded. At this point, I had an aha moment. Could an unguarded TreeMap cause 3,200% utilization?
Experiment
I put together a simple experiment to try to reproduce the problem. I created a bunch of threads randomly updating a shared TreeMap.
for (int i = 0; i < numThreads; i++) {
    threads.add(new Thread(() -> {
        Random random = new Random();
        for (int j = 0; j < numUpdates; j++) {
            try {
                treeMap.put(random.nextInt(1000), random.nextInt(1000));
            } catch (NullPointerException e) {
                // let it keep going so we can reproduce the issue.
            }
        }
    }));
}
That try/catch was key. Without it, I wasn't able to reproduce the problem; some threads would crash with the following output:
Exception in thread "Thread-0" java.lang.NullPointerException: Cannot read field "right" because "l" is null
at java.base/java.util.TreeMap.rotateRight(TreeMap.java:2562)
at java.base/java.util.TreeMap.fixAfterInsertion(TreeMap.java:2594)
at java.base/java.util.TreeMap.addEntry(TreeMap.java:770)
at java.base/java.util.TreeMap.put(TreeMap.java:828)
at java.base/java.util.TreeMap.put(TreeMap.java:534)
at SimpleRepro.lambda$main$0(SimpleRepro.java:28)
at java.base/java.lang.Thread.run(Thread.java:833)
and the program would still halt.
Probably only languages that allow you to catch NPEs will encounter this issue; other languages will just crash. After a few runs, I saw 500% CPU utilization. That's crazy! I always thought of race conditions as corrupting data or deadlocking. I never thought they could cause performance issues. But it makes sense: you can corrupt the data in a way that creates an infinite loop.
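To see why a cycle means a hang rather than just wrong answers, here's a minimal sketch of my own (a toy node structure, not TreeMap itself): once child pointers form a cycle, a plain search never terminates, while tracking visited nodes turns the hang into something detectable.

```java
import java.util.IdentityHashMap;

public class CycleSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Walks right-child pointers the way a search for a too-large key would,
    // flagging a cycle the moment a node is seen twice.
    static boolean hasCycle(Node root) {
        IdentityHashMap<Node, Boolean> visited = new IdentityHashMap<>();
        Node p = root;
        while (p != null) {
            if (visited.put(p, true) != null) return true; // already seen: cycle
            p = p.right;
        }
        return false;
    }

    public static void main(String[] args) {
        // Build a tiny "corrupted" tree: b's right child points back at a,
        // the kind of shape interleaved rotations could leave behind.
        Node a = new Node(1);
        Node b = new Node(2);
        a.right = b;
        b.right = a; // cycle!

        // A plain search for a missing key would loop forever here.
        System.out.println(hasCycle(a)); // true
    }
}
```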
You can check out the Java experiment, SimpleRepro.java, in my java-by-experiments GitHub repo.
To make extra sure, I created another experiment that uses reflection against the TreeMap to dump the nodes and their colours. (In Java, a TreeMap is implemented using a red-black tree.) The cycle in the TreeMap is highlighted in blue.
So if you’re searching or putting something not already in the map, it could get stuck in that infinite loop.
To reproduce the above graph, you need reflection to access the root field of the TreeMap and the left and right fields of TreeMap.Entry. You recursively traverse the TreeMap.Entrys starting from root, keeping track of the entries you have visited. If you visit an entry twice, you know you've hit a loop; print all the visited nodes to show the cycle.
private final TreeMap<Integer,Integer> treeMap;
private static final Field treeMapRootField;
private static final Field treeMapEntryLeft;
private static final Field treeMapEntryRight;
private static final Field treeMapEntryKey;
private static final Field treeMapEntryColor;
...

public void print() throws Exception {
    print(treeMapRootField.get(treeMap), "", new IdentityHashMap<>());
}

private void print(
        Object treeMapEntry, String tabs, IdentityHashMap<Object, Object> visited
) throws Exception {
    if (treeMapEntry != null && !visited.containsKey(treeMapEntry)) {
        // in-order traversal
        visited.put(treeMapEntry, treeMapEntry);
        print(treeMapEntryLeft.get(treeMapEntry), tabs + " ", visited);
        System.out.println(tabs + treeMapEntryKey.get(treeMapEntry) + ":"
                + (treeMapEntryColor.getBoolean(treeMapEntry) ? "BLACK" : "RED"));
        print(treeMapEntryRight.get(treeMapEntry), tabs + " ", visited);
    } else if (treeMapEntry != null && visited.containsKey(treeMapEntry)) {
        // revisiting a node means we found the cycle
        System.out.println(tabs + treeMapEntryKey.get(treeMapEntry) + ":"
                + (treeMapEntryColor.getBoolean(treeMapEntry) ? "BLACK" : "RED")
                + " CYCLE");
    }
}
Working code is available in my java-by-experiments GitHub project ExploringTreeMap.java and TreeMapExplorer.java
Related Work
I’m not the first person to write about this issue. While preparing this article I found:
- Ivo Anjo wrote: writing to a java treemap concurrently can lead to an infinite loop during reads
- Red Hat has a troubleshooting article: High CPU due to multiple Java threads accessing TreeMap simultaneously
- Originally reported as far back as June 20, 2011, based on the web archive
- There is even a Stack Overflow question about it: Java Process consumes more than 100% CPU
However, this article provides a few new perspectives not previously discussed:
- Practical code that reproduces the issue.
- Proposing that only a particular class of languages can encounter this issue and exploring that hypothesis
- A survey of languages to see if all can reproduce the issue
- A new approach to fixing TreeMap and red black tree to prevent this issue
If any of these interest you, please keep reading.
That code is not realistic
Some might claim that my experimental code is unrealistic. Who ignores NPEs? There are two practical ways this happens: uncaught exceptions in executors, and services backed by a thread pool.
Uncaught Exceptions in Executor
We can reproduce this problem in conjunction with another common mistake, one that suppresses uncaught exceptions.
The below code also reproduces the issue, but with a thread pool.
final ExecutorService pool = Executors.newFixedThreadPool(numThreads);
final TreeMap<Integer,Integer> treeMap = new TreeMap<>();
Random random = new Random();
for (int i = 0; i < numThreads * numUpdatesPerThread; i++) {
    pool.submit(() -> {
        treeMap.put(random.nextInt(10000), random.nextInt(10000));
    });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS);
snippet from ExecutorUncaughtRepro.java
When you run it, you’ll see it hang. Take a thread dump and see the same symptoms:
"pool-1-thread-1" #22 [15356] prio=5 os_prio=0 cpu=17734.38ms elapsed=21.39s tid=0x0000023c45dd3e90 nid=15356 runnable [0x000000780b4fe000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.put(java.base@19.0.1/TreeMap.java:826)
at java.util.TreeMap.put(java.base@19.0.1/TreeMap.java:534)
at ExecutorUncaughtRepro.lambda$main$0(ExecutorUncaughtRepro.java:33)
at ExecutorUncaughtRepro$$Lambda$14/0x00000008010031f0.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@19.0.1/Executors.java:577)
at java.util.concurrent.FutureTask.run(java.base@19.0.1/FutureTask.java:317)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@19.0.1/ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@19.0.1/ThreadPoolExecutor.java:642)
at java.lang.Thread.run(java.base@19.0.1/Thread.java:1589)
It's stuck in TreeMap.put with really high CPU utilization. However, standard out shows nothing! The thread pool swallowed all the NPEs, giving no indication of a problem. Unfortunately, this was exactly my situation.
When managing your own thread pools, you need to make sure you:
- Register an uncaught exception handler on the threads through a thread factory
- Operate on the futures the pool returns. If you call future.get(), you will get an ExecutionException wrapping the NPE!
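For that second point, here's a minimal sketch (with a hypothetical failing task) of surfacing a swallowed NPE through the future:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureCheck {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        Future<?> future = pool.submit(() -> {
            // this NPE would be silently swallowed if the future is never inspected
            throw new NullPointerException("boom");
        });
        try {
            future.get(); // surfaces the failure as an ExecutionException
        } catch (ExecutionException e) {
            // e.getCause() is the original NPE; log it instead of losing it
            System.out.println("caught: " + e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling get() on every future is the only way to observe failures of submitted tasks from the submitting side.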
Again with gRPC
Another way is to have some sort of service backed by a thread pool, like a gRPC service. Here I recreate a realistic scenario in gRPC that reproduces the problem. The service is powered by an unguarded TreeMap.
@Override
public void addReceipt(
        ReceiptProcessorServiceOuterClass.AddReceiptRequest req,
        StreamObserver<ReceiptProcessorServiceOuterClass.AddReceiptResponse> responseObserver
) {
    int timestamp = req.getTimestamp();
    int totalPrice = req.getTotalPrice();
    receipts.put(timestamp, totalPrice);
    ReceiptProcessorServiceOuterClass.AddReceiptResponse response =
            ReceiptProcessorServiceOuterClass.AddReceiptResponse.newBuilder().build();
    responseObserver.onNext(response);
    responseObserver.onCompleted();
}
snippet from GrpcRepro.java
Dumping the threads gives us:
"grpc-default-executor-23" #54 [8796] daemon prio=5 os_prio=0 cpu=18671.88ms elapsed=175.50s tid=0x00000168b6c707c0 nid=8796 runnable [0x000000059fbfe000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.put(java.base@19.0.1/TreeMap.java:826)
at java.util.TreeMap.put(java.base@19.0.1/TreeMap.java:534)
at ReceiptProcessorServiceImpl.addReceipt(GrpcRepro.java:59)
at ReceiptProcessorServiceGrpc$MethodHandlers.invoke(ReceiptProcessorServiceGrpc.java:185)
at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:346)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:860)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@19.0.1/ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@19.0.1/ThreadPoolExecutor.java:642)
at java.lang.Thread.run(java.base@19.0.1/Thread.java:1589)
showing that TreeMap.put() got stuck in an infinite loop.
How did it happen? Rotations
My suspicion is that two threads independently rotate the tree in opposite directions, resulting in a cycle. Or perhaps two overlapping rotations interleave their writes in a way that produces the cycle. However, I do not have a proven interleaving between threads that shows this.
I also suspect that the NPE is necessary. The problem will not reproduce without hitting an NPE as I saw with my Java experiments.
Other languages
I explore this idea that only languages with catchable NPEs can exhibit the problem by trying to reproduce it in as many languages as I've programmed in. If this claim is true, I expect it to reproduce only in Java, Kotlin, C#, Ruby, TypeScript, and Python, and not in Go, C++, Rust, and Elixir.
Language | Affected | Explanation | Code |
---|---|---|---|
Java | yes | this whole article is based on it | source |
C# | yes | SortedDictionary uses a red-black tree | source |
Ruby | no | Used the red-black tree from kanwei/algorithms but was unable to reproduce the issue. I believe the Global Interpreter Lock (GIL) prevents it by limiting when threads can context switch, ruling out the interleavings that cause an infinite loop. | source |
C++ | yes | used std::map's red-black tree. I expected it to always segfault first, preventing it from ever hitting the infinite loop. | source |
Go | yes | used a popular data-structures library. As with C++, I expected it to only segfault and never reproduce the infinite loop. I was wrong. | source |
Rust | no | the compiler prevented me, and I don't know enough about writing unsafe code to reproduce the problem | source |
Kotlin | yes | uses Java's TreeMap, so the same issue is expected | |
PHP | no | no red-black tree in the standard library, and I did not find any popular libraries providing one | |
JavaScript | no | the multithreading model cannot share references | |
TypeScript | no | same argument as JavaScript | |
Python | no | no red-black tree in the standard library, and popular libraries like Sorted Containers do not use one. Even if I had run the experiment, I would expect it not to reproduce, for the same reason as Ruby: the GIL. | |
Elixir | no | There are many third-party red-black tree implementations in Elixir (and Erlang), but the programming model makes the bug impossible: data structures are immutable, and threads interact only through message passing. | |
From the table, the most interesting result is C++, because my hypothesis from the last section predicted it would not reproduce the problem. As a result, I include the details of that experiment here. Go was similarly unexpected but, to trim the article, I left it out. You can see the Go code here.
The code is essentially the same as the Java version, only using the C++ standard library. I also added extra printing: I was so shocked that the problem reproduced that I wanted to make extra sure.
// The C++ equivalent of TreeMap is std::map,
// which uses the same red-black tree data structure.
std::map<int, int> sortedMap;

for (int i = 0; i < numThreads; i++) {
    threads.emplace_back([&]() {
        // Take an id from the atomic counter instead of capturing the
        // loop variable i by reference (it may have already changed).
        int threadId = threadIdGenerator.fetch_add(1);
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<> dis(0, 999);
        int progressThreshold = numUpdates / progressRatio;
        std::cout << "Thread " << threadId << " started.";
        for (int j = 0; j < numUpdates; j++) {
            try {
                int key = dis(gen);
                int value = dis(gen);
                sortedMap[key] = value;
            } catch (const std::exception& e) {
                std::cerr << "Caught exception on thread " << threadId << ": " << e.what() << std::endl;
            }
            if ((j + 1) % progressThreshold == 0) {
                std::cout << "Thread " << threadId << ": "
                          << ((j + 1) * 100 / numUpdates) << "% complete ("
                          << (j + 1) << "/" << numUpdates << " updates)\n";
            }
        }
        std::cout << "Thread " << threadId << " completed.";
    });
}
Occasionally, the program segfaults and crashes like before. This is what I expected the worst failure to look like:
./SimpleRepro
zsh: segmentation fault ./SimpleRepro
However, very rarely, the output would look like the run below, indicating that the threads got stuck. After waiting 10 minutes, the threads never completed. I did not expect this to happen, because I thought it required a thrown and swallowed exception, and that is not happening here.
./SimpleRepro
Thread 0 started.Thread Thread 0: 10% complete (Thread 10/10012Thread started.3
started.Thread 4 started.Thread 4: 10% complete (10/100 updates)
updates)
Thread 0: 20% complete (20/100 updates)
Thread 0: 30% complete (30/100 updates)
Thread 0: 40% complete (40/100 updates)
Thread 1: 10% complete (10/100 updates)
started.Thread 2: 10% complete (10/100 updates)
Thread 1: 20% complete (20/100 updates)
<nothing output for 5 minutes>
The tool top confirms the problem reproduced: CPU utilization is far above normal.
top
PID COMMAND %CPU TIME ...
59815 SimpleRepro 170.8 08:49.61 ...
This shows my claim is false: I was able to reproduce the problem in Go and C++. Although they sometimes segfaulted, there were runs where they got stuck in the infinite loop. In a language like C++, dereferencing a null pointer typically segfaults, so there must be interleavings of the threads that never hit a null! I have no idea why; I was expecting them to crash, given how often I saw NPEs in the Java experiment.
With this knowledge, I went back to my SimpleRepro Java experiment and stopped catching the NPE. After re-running it 12 times, I was still able to reproduce the problem despite not catching the NPE:
"Thread-4" #17 prio=5 os_prio=31 cpu=18597.02ms elapsed=23.79s tid=0x00007f81051ac200 nid=0x6a03 runnable [0x000070000bcb5000]
java.lang.Thread.State: RUNNABLE
at java.util.TreeMap.put(java.base@17.0.3/TreeMap.java:826)
at java.util.TreeMap.put(java.base@17.0.3/TreeMap.java:534)
at SimpleRepro.lambda$main$0(SimpleRepro.java:28)
at SimpleRepro$$Lambda$1/0x0000000800c00a08.run(Unknown Source)
at java.lang.Thread.run(java.base@17.0.3/Thread.java:833)
The other thing that surprised me was that languages like Ruby, despite letting you catch nil errors, did not reproduce the problem. The Global Interpreter Lock (GIL) somehow prevents all the execution interleavings that would reproduce it.
Easy Fix: Guard against concurrent modification
The easiest way to fix this was to wrap the TreeMap with Collections.synchronizedMap, or to switch to a ConcurrentHashMap and sort on demand. However, those fixes are boring, and they do nothing to stop someone from making the same mistake again and taking down another host.
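A minimal sketch of the boring fix (my own illustration, with made-up sizes; note that iterating over a synchronized map still needs an explicit synchronized block):

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GuardedMaps {
    // Hammer the map from 8 threads; returns the final size.
    static int fillConcurrently(Map<Integer, Integer> map) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10_000; i++) {
            final int k = i % 1000;
            pool.submit(() -> map.put(k, k));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return map.size();
    }

    public static void main(String[] args) throws Exception {
        // Option 1: serialize every call through a synchronized wrapper.
        int a = fillConcurrently(Collections.synchronizedMap(new TreeMap<>()));
        // Option 2: a concurrent map; sort the keys on demand when ordering is needed.
        int b = fillConcurrently(new ConcurrentHashMap<>());
        System.out.println(a + " " + b); // both end at exactly 1000 entries
    }
}
```

If you need sorted access without a lock, the JDK also ships ConcurrentSkipListMap, a concurrent sorted map, though that goes beyond the fix described here.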
Controversial Fix: Track visited nodes
What if you could make the mistake, but safely, without creating a cycle? I modified the red-black tree to record the nodes it has already visited. This only requires an additional O(lg(n)) memory, since the traversal is bounded by the height of the tree, and the height of a red-black tree is guaranteed to be O(lg(n)). If we encounter a node we have already visited, we throw a ConcurrentModificationException. This does not prevent the data corruption, but it does prevent 100% CPU utilization from the infinite loop if someone makes the mistake again in the future. Standard libraries would have to adopt this technique, though, which I don't think is likely given the extra memory needed.
diff --git a/tree_map_corruption/java/src/main/java/ProtectedTreeMap.java b/tree_map_corruption/java/src/main/java/ProtectedTreeMap.java
index 53c15bb..2713d5a 100644
--- a/tree_map_corruption/java/src/main/java/ProtectedTreeMap.java
+++ b/tree_map_corruption/java/src/main/java/ProtectedTreeMap.java
@@ -345,7 +345,9 @@ public class ProtectedTreeMap<K,V>
@SuppressWarnings("unchecked")
Comparable<? super K> k = (Comparable<? super K>) key;
Entry<K,V> p = root;
+ IdentityHashMap<Entry<?,?>, Boolean> visited = new IdentityHashMap<>();
while (p != null) {
+ visited.put(p, true);
int cmp = k.compareTo(p.key);
if (cmp < 0)
p = p.left;
@@ -353,6 +355,10 @@ public class ProtectedTreeMap<K,V>
p = p.right;
else
return p;
+
+ if (visited.containsKey(p)) {
+ throw new ConcurrentModificationException("TreeMap corrupted. Loop detected");
+ }
}
return null;
}
@@ -779,6 +785,7 @@ public class ProtectedTreeMap<K,V>
}
private V put(K key, V value, boolean replaceOld) {
+ IdentityHashMap<Entry<?,?>, Boolean> visited = new IdentityHashMap<>();
Entry<K,V> t = root;
if (t == null) {
addEntryToEmptyMap(key, value);
@@ -790,6 +797,7 @@ public class ProtectedTreeMap<K,V>
Comparator<? super K> cpr = comparator;
if (cpr != null) {
do {
+ visited.put(t, true);
parent = t;
cmp = cpr.compare(key, t.key);
if (cmp < 0)
@@ -803,12 +811,17 @@ public class ProtectedTreeMap<K,V>
}
return oldValue;
}
+
+ if (visited.containsKey(t)) {
+ throw new ConcurrentModificationException("TreeMap corrupted. Loop detected");
+ }
} while (t != null);
} else {
Objects.requireNonNull(key);
@SuppressWarnings("unchecked")
Comparable<? super K> k = (Comparable<? super K>) key;
do {
+ visited.put(t, true);
parent = t;
cmp = k.compareTo(t.key);
if (cmp < 0)
@@ -822,6 +835,10 @@ public class ProtectedTreeMap<K,V>
}
return oldValue;
}
+
+ if (visited.containsKey(t)) {
+ throw new ConcurrentModificationException("TreeMap corrupted. Loop detected");
+ }
} while (t != null);
}
addEntry(key, value, parent, cmp < 0);
diff from my ProtectedTreeMap.java
In the code above, I modified both the get and put methods to maintain the set of visited nodes and throw a ConcurrentModificationException when a node shows up in the set a second time.
Mistakes Happen: Make Swiss Cheese
Mistakes happen, so it's important to have multiple layers in place to detect issues. Imagine slices of Swiss cheese, each taken from a different block: the holes never line up, so stacking enough slices means nothing can pass through.
Alerts on NPEs
We lacked an alarm on individual NPEs; we only had error-rate alarms. This NPE occurs just once per API-handler worker thread before the thread dies, which was not enough to trigger our error-rate alarms. On top of that, none of the NPEs were logged, because our Executor swallowed the unhandled exceptions.
Alerts on utilization anomalies
We monitor CPU utilization with a naive threshold-based alarm: when utilization exceeds the threshold, the alarm fires, indicating abnormal behaviour. This is how we discovered the issue.
Uncaught exceptions on Executors
If you submit work to an executor, you must ensure it has an uncaught exception handler.
AtomicInteger threadNumber = new AtomicInteger(1);
ThreadFactory customThreadFactory = new ThreadFactory() {
    @Override
    public Thread newThread(Runnable r) {
        Thread thread = new Thread(r);
        thread.setName("my-thread-pool-" + threadNumber.getAndIncrement());
        thread.setUncaughtExceptionHandler(
            (dyingThread, throwable) -> {
                logger.error("uncaught exception!", throwable);
            }
        );
        return thread;
    }
};
ExecutorService executor = Executors.newFixedThreadPool(2, customThreadFactory);
This code is kind of ugly, so you're better off using Apache Commons' BasicThreadFactory:
new BasicThreadFactory.Builder()
    .namingPattern("my-thread-pool-%d")
    .uncaughtExceptionHandler(
        (dyingThread, throwable) -> {
            logger.error("uncaught exception!", throwable);
        }
    )
    .build();
or guava’s ThreadFactoryBuilder:
new ThreadFactoryBuilder()
    .setNameFormat("my-thread-pool-%d")
    .setUncaughtExceptionHandler(
        (dyingThread, throwable) -> {
            logger.error("uncaught exception!", throwable);
        }
    )
    .build()
Without a handler, the exception will be swallowed and none of your log monitoring will be able to detect the issue.
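One subtlety worth calling out (general JDK behaviour, not something specific to this incident): the uncaught exception handler only fires for tasks run via execute(). Tasks passed to submit() are wrapped in a FutureTask, which captures the exception for the Future, so the worker thread never dies and the handler never runs. A sketch illustrating the difference:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class SubmitVsExecute {
    // Returns true if the thread's uncaught exception handler fired.
    static boolean handlerFires(boolean useSubmit) throws Exception {
        CountDownLatch fired = new CountDownLatch(1);
        ThreadFactory factory = r -> {
            Thread t = new Thread(r);
            t.setUncaughtExceptionHandler((dying, ex) -> fired.countDown());
            return t;
        };
        ExecutorService pool = Executors.newFixedThreadPool(1, factory);
        Runnable boom = () -> { throw new RuntimeException("boom"); };
        if (useSubmit) {
            pool.submit(boom);   // exception captured in the Future; handler never fires
        } else {
            pool.execute(boom);  // exception kills the worker; handler fires
        }
        boolean sawIt = fired.await(1, TimeUnit.SECONDS);
        pool.shutdown();
        return sawIt;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("execute(): handler fired = " + handlerFires(false)); // true
        System.out.println("submit():  handler fired = " + handlerFires(true));  // false
    }
}
```

This is exactly why the earlier advice has two parts: the handler covers execute()-style tasks, and checking the futures covers submit()-style tasks.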
Code reviews
Sometimes a detail-oriented developer will notice the combination of threads and a TreeMap, or even suggest avoiding a TreeMap when ordered elements are not needed. Unfortunately, that didn't happen in this case.
Static code analysis
Static code analysis tools like SpotBugs (formerly FindBugs), JLint, and Chord can detect these kinds of issues at build time. See the paper How Good is Static Analysis at Finding Concurrency Bugs?. Unfortunately, we didn't start using them until after this bug.
Tests
Ensure your tests handle the multithreaded case! In my case, we did not have multithreaded tests for that part of the code.
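Here's a hedged sketch of what such a test could look like (hypothetical names and sizes; the key idea is that a bounded awaitTermination turns a potential infinite loop into a test failure instead of a hung build):

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class ConcurrencySmokeTest {
    // Hammers the map from 8 writer threads.
    // Returns true only if all writers finished before the deadline.
    static boolean hammer(Map<Integer, Integer> mapUnderTest) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            pool.submit(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int i = 0; i < 20_000; i++) {
                    mapUnderTest.put(rnd.nextInt(1000), rnd.nextInt(1000));
                }
            });
        }
        pool.shutdown();
        // A map corrupted into a cycle shows up here as a timeout.
        return pool.awaitTermination(30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        // The fixed code path: a guarded map should always finish.
        System.out.println(hammer(Collections.synchronizedMap(new TreeMap<>()))
                ? "completed" : "TIMED OUT: possible infinite loop");
    }
}
```

Running the same harness against the unguarded TreeMap is what would have caught this bug before it shipped.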
Conclusion
Mistakes happen. Watch out: data corruption from unguarded concurrent modification can present itself as an infinite loop! We made multiple mistakes, and they hid the problem at several layers, but not at all of them. As long as you adopt a layered approach, you will be able to respond quickly, or prevent the issue entirely.