Apache PDFBox is a popular open-source library that facilitates java applications to work with PDF documents. Recently we encountered a Deadlock that surfaced in this library. In this post we have shared how we troubleshooted and identified the root cause of the problem.
What is Deadlock?
First lets try to understand what Deadlock means. Several technical definitions arent clear. Deadlock definition is one among them :-). Deadlocks definition goes like this: Deadlock is a situation where a set of processes are blocked because each process is holding a resource and waiting for another resource acquired by some other process.
How to troubleshoot Deadlock in an Apache opensource library
https://youtu.be/Jke2hzya4Do
Its always easier to learn something new through examples and pictures. Lets look at the below practical example, which may help you to understand Deadlock better.
Fig1: Trains starting in the same track
Fig2: Trains experiencing Deadlock
Lets say there is only one train track, and this train track has six parts(part-1, part-2, part-3, part-4, part-5, part-6). Train-A starts at part-1 and Train-B starts at Part-6 on the same train track at the same time. Both trains travel at the same speed. Under this circumstance, Train-A and Train-B will reach a Deadlock state when they reach part-3 and part-4 of the train track. Because when Train-A is in part-3 of the train track, it will be stuck waiting for part-4 of the track, which Train-B holds. On the other hand, when Train-B is in part-4, it will be stuck waiting for part-3, which Train-A holds. Thus, both the trains cant move forward. This is a classic Deadlock situation. Similarly, once a Deadlock happens in the application, it cannot be recovered. The only way to recover from Deadlock is to restart the application. To learn more about Deadlock basics & troubleshooting, you may refer to this blog post
Troubleshooting Deadlock
Now lets discuss about the Deadlock problem we faced in the application. From the above explanation you can understand that Deadlock is caused due to threads. Thus, to analyze Deadlock, you need to capture thread dump from the application. Thread dump is basically a snapshot of all threads that are running in your application. It contains information such as: stack trace, thread state, thread priority, You can capture thread dump using one of the approaches given here
Note: Most of the time, you will not know whether the actual problem in your application is deadlock or not. What you will notice is unresponsiveness from the application. Thus, its safe to capture all the prominent artifacts that are essential for troubleshooting such as: Garbage Collection log, thread dump, heap dump, netstat, iostat, you may use yCrash open source script, which would capture 360-degree data (GC log, 3 snapshots of thread dump, heap dump, netstat, iostat, vmstat, top, top -H, ) from your application stack within a minute and generate a bundle zip file.
We uploaded the captured thread dump to fastThread a thread dump analysis tool. Tool immediately pointed out that the two threads caused the deadlock. Below is the excerpt from the fastThread report.
Fig: Deadlock pointed by fastThread
Tool pointed out the stack trace of the two threads that were in deadlock. Below is the stack trace of those two threads:
"APP_Thread_50_WorkerTask_pool-5-thread-6" #1899 prio=5 os_prio=0 tid=0x00007f894c405555 nid=0x44d1 waiting for monitor entry [0x00007f88e9c44000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.fontbox.ttf.TrueTypeFont.getTable(TrueT ypeFont.java:147) - waiting to lock <0x00000002d216fec8> (a org.apache.fontbox.ttf.TrueTypeFont) at org.apache.fontbox.ttf.TrueTypeFont.getHorizontalM etrics(TrueTypeFont.java:229) at org.apache.fontbox.ttf.GlyphTable.getGlyphData(Gly phTable.java:210) at org.apache.fontbox.ttf.GlyphTable.getGlyph(GlyphTa ble.java:191) - locked <0x00000002d218ca28> (a org.apache.fontbox.ttf.RAFDataStream) at org.apache.fontbox.ttf.TrueTypeFont.getPath(TrueTy peFont.java:676) at org.apache.pdfbox.pdmodel.font.PDType1Font.getPath (PDType1Font.java:638) at org.apache.pdfbox.rendering.Type1Glyph2D.getPathFo rCharacterCode(Type1Glyph2D.java:83) at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D (PageDrawer.java:495) at org.apache.pdfbox.rendering.PageDrawer.showFontGly ph(PageDrawer.java:476) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owGlyph(PDFStreamEngine.java:787) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owGlyph(PDFStreamEngine.java:805) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owText(PDFStreamEngine.java:743) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owTextString(PDFStreamEngine.java:606) at org.apache.pdfbox.contentstream.operator.text.Show Text.process(ShowText.java:56) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessOperator(PDFStreamEngine.java:933) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessStreamOperators(PDFStreamEngine.java:514) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessStream(PDFStreamEngine.java:492) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessPage(PDFStreamEngine.java:155) at org.apache.pdfbox.rendering.PageDrawer.drawPage(Pa geDrawer.java:277) at org.apache.pdfbox.rendering.PDFRenderer.renderImag e(PDFRenderer.java:347) at org.apache.pdfbox.rendering.PDFRenderer.renderImag e(PDFRenderer.java:268) at org.apache.pdfbox.rendering.PDFRenderer.renderImag e(PDFRenderer.java:228) at xxxxxxx.export.service.thumbnail.PDFSlideGenerator ServiceImpl$1.call(PDFSlideGeneratorServiceImpl.ja va:60) at xxxxxxx.ops.executor.util.ParallelExecutorTask.doC all(ParallelExecutorTask.java:43) at xxxxxxx.ops.executor.util.ParallelExecutor.execute TaskFromQueue(ParallelExecutor.java:154) at xxxxxxx.ops.executor.util.ParallelExecutor.access$ 100(ParallelExecutor.java:26) at xxxxxxx.ops.executor.util.ParallelExecutor$WorkerT ask.doCall(ParallelExecutor.java:164) at xxxxxxx.ops.executor.util.UserContextAwareCallable .call(UserContextAwareCallable.java:89) at java.util.concurrent.FutureTask.run(FutureTask.jav a:266) at java.util.concurrent.ThreadPoolExecutor.runWorker( ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
"APP_Thread_WorkerTask_APP-pool-5-thread-5" #1898 prio=5 os_prio=0 tid=0x00007f894c898900 nid=0x44d0 waiting for monitor entry [0x00007f88e9d45000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.fontbox.ttf.TrueTypeFont.readTable(True TypeFont.java:356) - waiting to lock <0x00000002d218ca28> (a org.apache.fontbox.ttf.RAFDataStream) at org.apache.fontbox.ttf.TrueTypeFont.getTable(TrueT ypeFont.java:150) - locked <0x00000002d216fec8> (a org.apache.fontbox.ttf.TrueTypeFont) at org.apache.fontbox.ttf.TrueTypeFont.getCmap(TrueTy peFont.java:262) at org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmap Impl(TrueTypeFont.java:556) at org.apache.fontbox.ttf.TrueTypeFont.getUnicodeCmap Lookup(TrueTypeFont.java:541) at org.apache.fontbox.ttf.TrueTypeFont.nameToGID(True TypeFont.java:629) at org.apache.fontbox.ttf.TrueTypeFont.hasGlyph(TrueT ypeFont.java:698) at org.apache.pdfbox.pdmodel.font.PDType1Font.getName InFont(PDType1Font.java:601) at org.apache.pdfbox.pdmodel.font.PDType1Font.hasGlyp h(PDType1Font.java:645) at org.apache.pdfbox.rendering.Type1Glyph2D.getPathFo rCharacterCode(Type1Glyph2D.java:59) at org.apache.pdfbox.rendering.PageDrawer.drawGlyph2D (PageDrawer.java:495) at org.apache.pdfbox.rendering.PageDrawer.showFontGly ph(PageDrawer.java:476) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owGlyph(PDFStreamEngine.java:787) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owGlyph(PDFStreamEngine.java:805) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owText(PDFStreamEngine.java:743) at org.apache.pdfbox.contentstream.PDFStreamEngine.sh owTextString(PDFStreamEngine.java:606) at org.apache.pdfbox.contentstream.operator.text.Show Text.process(ShowText.java:56) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessOperator(PDFStreamEngine.java:933) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessStreamOperators(PDFStreamEngine.java:514) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessStream(PDFStreamEngine.java:492) at org.apache.pdfbox.contentstream.PDFStreamEngine.pr ocessPage(PDFStreamEngine.java:155) at org.apache.pdfbox.rendering.PageDrawer.drawPage(Pa geDrawer.java:277) at org.apache.pdfbox.rendering.PDFRenderer.renderImag e(PDFRenderer.java:347) at org.apache.pdfbox.rendering.PDFRenderer.renderImag e(PDFRenderer.java:268) at org.apache.pdfbox.rendering.PDFRenderer.renderImag e(PDFRenderer.java:228) at xxxxxxx.export.service.thumbnail.PDFSlideGenerator ServiceImpl$1.call(PDFSlideGeneratorServiceImpl.ja va:60) at xxxxxxx.ops.executor.util.ParallelExecutorTask.doC all(ParallelExecutorTask.java:43) at xxxxxxx.ops.executor.util.ParallelExecutor.execute TaskFromQueue(ParallelExecutor.java:154) at xxxxxxx.ops.executor.util.ParallelExecutor.access$ 100(ParallelExecutor.java:26) at xxxxxxx.ops.executor.util.ParallelExecutor$WorkerT ask.doCall(ParallelExecutor.java:164) at xxxxxxx.ops.executor.util.UserContextAwareCallable .call(UserContextAwareCallable.java:89) at java.util.concurrent.FutureTask.run(FutureTask.jav a:266) at java.util.concurrent.ThreadPoolExecutor.runWorker( ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)
You can see the APP_Thread_50_WorkerTask_pool-5-thread-6 thread is in Deadlock with APP_Thread_50_WorkerTask_pool-5-thread-5 thread. From the stacktrace you can observe following two things:
APP_Thread_50_WorkerTask_pool-5-thread-6 has acquired the lock 0x00000002d218ca28 of the org.apache.fontbox.ttf.RAFDataStream object and waiting to acquire the lock 0x00000002d216fec8 of the org.apache.fontbox.ttf.TrueTypeFont object.
On the other hand, APP_Thread_50_WorkerTask_pool-5-thread-5 thread is trying to do the exact opposite, it acquired the lock 0x00000002d216fec8 of org.apache.fontbox.ttf.TrueTypeFont object and waiting to acquire the lock 0x00000002d218ca28 of the org.apache.fontbox.ttf.RAFDataStream object.
Indeed, its a classic Deadlock condition.
Deadlock in Apache PDFBox
If you notice two objects which are causing Deadlock are org.apache.fontbox.ttf.TrueTypeFont and org.apache.fontbox.ttf.RAFDataStream. Both of these objects are originating from the open source Apache PDFBox library.
Once seeing this bug, we searched in the Apache PDFBox bug database to see whether this problem was already reported or not. We couldnt see this problem reported earlier. Thus, we went ahead and filed a new ticket in the Apache bug database with the details. Here is the ticket that we filed, for your reference. The Apache PDFBox development team was highly responsive. They started acting on it right away and issued a fix within 2 3 days. Great job from the PDFBox team. Truly enjoyed the open source community collaboration.