I noticed previously that the Tomcat WebappClassLoader is heavily serialized. In fact, the entire loadClass entry point is marked synchronized, so with poorly designed libraries that hit the class loader constantly, the impact on scalability is pretty remarkable. Ideally you would not hit the ClassLoader hundreds of times per second, but sometimes that's out of your control.
I decided to play some more with JMH and run some trials comparing different strategies for breaking the serialization.
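To make the contention concrete, here is a simplified sketch of a fully serialized loader. It is not Tomcat's source: the class name `SerializedLoaderSketch` is hypothetical and the delegation is reduced to a single `Class.forName` call. The point is only that `loadClass` is `synchronized` on the loader instance, so every caller queues on one monitor, even for classes that were loaded long ago.

```java
/**
 * Hypothetical, simplified sketch (not Tomcat's code): the whole
 * loadClass entry point is synchronized on the loader instance.
 */
class SerializedLoaderSketch extends ClassLoader {
    SerializedLoaderSketch(ClassLoader parent) {
        super(parent);
    }

    @Override
    public synchronized Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        // Every caller queues on this monitor, even for an already-loaded class.
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            // Simplified delegation; a null parent means the bootstrap loader.
            c = Class.forName(name, false, getParent());
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}
```

While any thread holds the loader's monitor, a second thread calling `loadClass` shows up as `BLOCKED`, which is exactly the queueing that limits throughput under load.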
I trialled four implementations:
1) GuavaCaching - a decorator on WebappCL which uses a Guava cache
2) ChmCaching - a decorator on WebappCL which uses a ConcurrentHashMap (no active eviction)
3) ChmWebappCL - a modified WebappCL that uses a ConcurrentHashMap, so loadClass is only synchronized when it has to delegate up to the parent loader; classes loaded through the current loader are found in the local map
4) WebappCL - the out-of-the-box Tomcat 8.0.0-RC1 WebappClassLoader, fully synchronized around the loadClass method
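A ChmCaching-style decorator might look roughly like the sketch below. It assumes the decorator is itself a `ClassLoader` wrapping the WebappCL instance as its delegate; the class name `ChmCachingLoader` is hypothetical and this is not the exact code benchmarked. Successful lookups are memoized in a `ConcurrentHashMap`, so a repeat lookup is a lock-free map read and never reaches the delegate's synchronized `loadClass`. As in the benchmarked version, there is no eviction.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of a caching decorator around a (possibly fully
 * synchronized) class loader. Cached entries are never evicted.
 */
class ChmCachingLoader extends ClassLoader {
    private final ClassLoader delegate;
    private final ConcurrentHashMap<String, Class<?>> cache = new ConcurrentHashMap<>();

    ChmCachingLoader(ClassLoader delegate) {
        super(delegate); // resources etc. still resolve through the delegate
        this.delegate = delegate;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        Class<?> c = cache.get(name); // lock-free read on a hit
        if (c == null) {
            // Miss: go through the delegate, which may block on its own monitor.
            c = delegate.loadClass(name);
            Class<?> raced = cache.putIfAbsent(name, c);
            if (raced != null) {
                c = raced; // another thread cached it first; keep one canonical entry
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}
```

Misses are not cached, so a library that repeatedly probes for absent classes would still reach the delegate on every call.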
The results, in operations per microsecond, where one operation is a lookup of java.util.ArrayList and java.util.ArrayDeque:
Threads | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
GuavaCaching | 3687 | 978 | 1150 | 1129 | 1385 | 1497 | 1607 | 1679 | 1777 | 1733 | 1834 | - | - | - | - | -
ChmCaching | 27241 | 51062 | 81678 | 107376 | 134798 | 162125 | 188192 | 213007 | 208034 | 210812 | 200231 | 211744 | 214431 | 215283 | 209782 | 212297
ChmWebappCL | 185 | 48 | 81 | 81 | 83 | 81 | 84 | 85 | 85 | 85 | 80 | 84 | 82 | 83 | 83 | 84
WebappCL | 181 | 69 | 91 | 92 | 91 | 92 | 100 | 98 | 95 | 94 | 94 | 95 | 102 | 102 | 95 | 98
And the explanation -
- GuavaCaching is remarkably slow compared to CHM, which might be worth investigating further. I also noticed significant issues with the Guava implementation: some tests ran for an extremely long time, and there seems to be an issue in the dequeue (at a quick look, it appears to be stalled in a while loop).
- ChmCaching is very effective, although it also caches classes loaded by the parent and system loaders. This seems OK per the API but is unusual; I will have to check the API in more detail. Throughput scales roughly linearly with threads up to the core count (it is an 8-core machine), then plateaus at around 200,000 to 215,000 ops/µs.
- ChmWebappCL had much less effect; it is no faster than the stock WebappCL. This is likely because I am loading core java.util.* classes rather than classes from a JAR added to the class loader, and going to the system loader means entering the synchronized block. I expect ChmWebappCL could approach ChmCaching speeds if the JARs were attached directly to the class loader rather than passed through to the system loader.
- WebappCL: very slow. Throughput drops from 181 ops/µs on one thread to 69 on two, then stays flat at roughly 90 to 100 regardless of thread count.
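The ChmWebappCL explanation can be illustrated with a sketch of the idea (not the modified Tomcat code itself). Classes this loader defines are kept in a `ConcurrentHashMap` and returned without locking; everything else, including delegation to the parent, takes the lock. `ChmWebappLoaderSketch` and its `findLocalClass` hook are hypothetical names; the hook stands in for the webapp's own lookup in WEB-INF/classes and WEB-INF/lib. Because parent results are never stored in the local map, a benchmark that only loads java.util.* classes takes the locked path every time, which is consistent with the flat ChmWebappCL numbers.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: lock-free hits for locally defined classes,
 * serialized slow path for everything else (including parent delegation).
 */
class ChmWebappLoaderSketch extends ClassLoader {
    private final ConcurrentHashMap<String, Class<?>> localClasses = new ConcurrentHashMap<>();

    ChmWebappLoaderSketch(ClassLoader parent) {
        super(parent);
    }

    /** Hypothetical hook for the webapp's own classes; returns null if not local. */
    protected Class<?> findLocalClass(String name) {
        return null;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Fast path: lock-free hit for classes this loader has already defined.
        Class<?> c = localClasses.get(name);
        if (c == null) {
            // Slow path: serialized, as in the stock loader.
            synchronized (this) {
                c = localClasses.get(name); // another thread may have defined it meanwhile
                if (c == null) {
                    c = findLocalClass(name);
                    if (c != null) {
                        localClasses.put(name, c);
                    } else {
                        // Parent results are not cached, so core classes such as
                        // java.util.ArrayList take this locked path on every call.
                        c = Class.forName(name, false, getParent());
                    }
                }
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}
```

Caching parent results in the same map, as ChmCaching effectively does, would remove the lock from the java.util.* lookups as well.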
And pretty pictures. You can see that CHM caching is far and away the best of this bunch.
Same picture, at log scale -