Saturday, 31 August 2013

Caching Classes from the ClassLoader?

Code for the article is here. I also submitted an enhancement patch to the Tomcat bug tracker.

I noticed previously that the Tomcat WebappClassLoader is heavily serialized. In fact, the entire loadClass entry point is marked synchronized, so for poorly designed libraries that call it frequently, the impact on scalability is remarkable. Of course, the ideal is not to hit the ClassLoader hundreds of times per second, but sometimes that's out of your control.
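To make the serialization concrete, here is a minimal sketch of the locking shape described above, assuming a plain delegating loader: the whole loadClass entry point holds the loader's monitor, so every lookup from every thread queues behind one lock, even for classes that were resolved long ago. SyncDelegatingLoader is a hypothetical name and this is not Tomcat's WebappClassLoader source; the lookup counter exists only to make the per-call cost visible.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical loader reproducing the locking shape of a fully synchronized
 * loadClass entry point. Not Tomcat's source code.
 */
class SyncDelegatingLoader extends ClassLoader {
    // Counts every call that had to take the monitor.
    private final AtomicLong lookups = new AtomicLong();

    SyncDelegatingLoader(ClassLoader parent) {
        super(parent);
    }

    // The whole method holds this loader's monitor: concurrent callers run one at a
    // time, even when the class was already loaded on an earlier call.
    @Override
    protected synchronized Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        lookups.incrementAndGet();
        // Default delegation: findLoadedClass, then the parent, then findClass.
        return super.loadClass(name, resolve);
    }

    long lookupCount() {
        return lookups.get();
    }
}
```

Every call to loadClass("java.util.ArrayList") on such a loader pays for the monitor, which is why a library that looks classes up on a hot path ends up serialized on it.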

I decided to play some more with JMH and run some trials comparing the impact of various strategies for breaking the serialization.

I trialled four implementations:
1) GuavaCaching - a decorator on WebappCL that uses a Guava cache
2) ChmCaching - a decorator on WebappCL that uses a ConcurrentHashMap (no active eviction)
3) ChmWebappCL - a modified WebappCL that uses a ConcurrentHashMap, so that loadClass is synchronized only when it delegates up to the parent loader; classes loaded through the current loader are found in the local map
4) Out-of-the-box Tomcat 8.0.0-RC1 WebappClassLoader - synchronized fully around the loadClass method
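A minimal sketch of the ChmCaching idea (implementation 2), assuming a decorator that wraps some delegate loader, which in the article is the WebappCL. ChmCachingLoader and its methods are hypothetical names, not the code linked above. Hits are served by a lock-free ConcurrentHashMap read; only misses reach the delegate and its lock. There is no eviction, and failed lookups are not cached, since the ClassNotFoundException simply propagates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical caching decorator: remembers every Class returned by the delegate,
 * so repeat lookups never enter the delegate's lock. No eviction.
 */
class ChmCachingLoader extends ClassLoader {
    private final ClassLoader delegate;
    private final ConcurrentMap<String, Class<?>> cache = new ConcurrentHashMap<>();

    ChmCachingLoader(ClassLoader delegate) {
        super(delegate); // parent = delegate, so resource lookups still go through it
        this.delegate = delegate;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        Class<?> cached = cache.get(name); // lock-free read on the hot path
        if (cached != null) {
            return cached;
        }
        // Miss: go through the (possibly synchronized) delegate.
        // The resolve flag is ignored in this sketch.
        Class<?> loaded = delegate.loadClass(name);
        // Threads racing on the same miss receive the same Class from the delegate;
        // putIfAbsent keeps a single entry either way.
        Class<?> previous = cache.putIfAbsent(name, loaded);
        return previous != null ? previous : loaded;
    }

    int cachedCount() {
        return cache.size();
    }
}
```

Because the delegate resolves parent and system classes too, this cache also ends up holding java.util.* classes, which is the unusual-but-legal behaviour noted for ChmCaching in the results.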

The results, in operations per microsecond, where an operation is a lookup of java.util.ArrayList and java.util.ArrayDeque:

Threads             1      2      3      4      5      6      7      8      9     10     11     12     13     14     15     16
GuavaCaching     3687    978   1150   1129   1385   1497   1607   1679   1777   1733   1834
ChmCaching      27241  51062  81678 107376 134798 162125 188192 213007 208034 210812 200231 211744 214431 215283 209782 212297
ChmWebappCL       185     48     81     81     83     81     84     85     85     85     80     84     82     83     83     84
WebappCL          181     69     91     92     91     92    100     98     95     94     94     95    102    102     95     98
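For a sense of scale, a few ratios read straight off the table (subscripts are thread counts; the units cancel):

$$\frac{\text{ChmCaching}_{8}}{\text{ChmCaching}_{1}} = \frac{213007}{27241} \approx 7.82, \qquad \frac{\text{ChmCaching}_{16}}{\text{ChmCaching}_{1}} = \frac{212297}{27241} \approx 7.79$$

$$\frac{\text{WebappCL}_{16}}{\text{WebappCL}_{1}} = \frac{98}{181} \approx 0.54, \qquad \frac{\text{ChmCaching}_{8}}{\text{WebappCL}_{8}} = \frac{213007}{98} \approx 2174$$

So ChmCaching gains close to 8x going from 1 to 8 threads and then flattens out, while the stock loader loses roughly half its throughput once more than one thread contends for the lock.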

And the explanation -
  • GuavaCaching seems remarkably slow compared to CHM, which might be worth investigating further. I also noticed significant issues with the Guava implementation: some tests ran for an extremely long time, and there seems to be an issue in the dequeue (at a quick look, it appears stalled in a while loop).
  • ChmCaching seems very effective, although it also caches classes that were loaded from the parent and system loaders. This seems OK per the API but is unusual; I will have to check the API in more detail. It scales linearly with cores up to 8 threads (it is an 8-core machine) and flattens out beyond that.
  • ChmWebappCL had little effect; with two or more threads it was in fact slightly slower than the stock WebappCL. This is likely because I am testing against core java.util.* classes rather than classes from a JAR added to the class loader. I expect ChmWebappCL could approach ChmCaching speed if JARs were attached directly to the class loader rather than lookups passing through to the system loader (going to the system loader means entering the synchronized block).
  • WebappCL - very slow: throughput drops from 181 ops/µs on one thread to 69-102 ops/µs with two or more threads.
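A minimal sketch of the ChmWebappCL variant (implementation 3) as described in the list of implementations: classes this loader defines itself are published to a ConcurrentHashMap and served without locking, while any lookup that misses the map (delegation to the parent/system loader, or defining a new class) takes the lock. It also shows why java.util.* lookups gain nothing: they always miss the local map. ChmLocalLoader and the Map<String, byte[]> standing in for the webapp's JARs are hypothetical; real code would read class bytes from WEB-INF/classes and WEB-INF/lib. For simplicity it uses java.lang.ClassLoader's parent-first delegation, which is not Tomcat's exact delegation order.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical ChmWebappCL-style loader: locally defined classes are served
 * lock-free; everything else goes through a synchronized slow path.
 */
class ChmLocalLoader extends ClassLoader {
    // Stand-in for the webapp's JARs: binary class name -> class file bytes.
    private final Map<String, byte[]> localBytes;
    // Classes this loader has defined itself; read without holding the lock.
    private final ConcurrentMap<String, Class<?>> localClasses = new ConcurrentHashMap<>();

    ChmLocalLoader(ClassLoader parent, Map<String, byte[]> localBytes) {
        super(parent);
        this.localBytes = localBytes;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        Class<?> local = localClasses.get(name); // fast path: no lock
        if (local != null) {
            return local;
        }
        synchronized (this) {
            // Slow path, as in the stock loader: findLoadedClass, parent, then findClass.
            return super.loadClass(name, resolve);
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] bytes = localBytes.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        Class<?> defined = defineClass(name, bytes, 0, bytes.length);
        localClasses.put(name, defined); // publish for lock-free lookups
        return defined;
    }

    boolean isCachedLocally(String name) {
        return localClasses.containsKey(name);
    }
}
```

A lookup of java.util.ArrayDeque through this loader always resolves via the parent inside the synchronized block and never lands in localClasses, which matches the flat ChmWebappCL numbers in the table.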

And pretty pictures. You can see that CHM caching is far and away the best of this bunch.


Same picture, at log scale -


  1. In your bug post you say you think this slowness was an artifact of YourKit - the JMH test case seems to clearly show a major performance difference.

    I'm seeing a lot of threads backed up on the classloader in the new Java Mission Control app (I found your blog post while searching about it). I was about to throw in a caching classloader based on one of yours and see if it fixes it...

  2. Hi Ryan - totally agree with you. TC classloader is definitely a source of contention, but I was unable to come up with a real-world scenario where it affected throughput.

    If you have enough of a test bench you can probably re-run the tests and see how it goes, I'd be keen to hear about it. I assume you are looking at Tomcat?

    What I found is that in a real-world scenario (about 2000 req/sec including business logic, DB queries, output transforms, etc.), replacing the default Tomcat classloader did not significantly affect the req/sec rate. It only made a difference when running under the profiler or in contrived examples, e.g. calling loadClass in a tight loop. I don't have the test bench to drive enough load onto the server: I was able to saturate the NIC before running out of CPU.

  3. I'll play with it more. I currently have a test bench capable of generating that load (2 machines with 10Gb NICs, fast CPUs, etc.), so I'll try it in an A/B scenario and see if there is an improvement.

    The app has plenty of other bottlenecks, so this one might be small in comparison, but JMC seems to indicate that contention at the classloader could be one of them (it's not clear whether the blocked times JMC reports are 'real times' or 'profiler times'; I'm still getting familiar with the tool).

    I have one tweak for the Guava-based one: there is a "concurrencyLevel" parameter you can set on the CacheBuilder. I have a fork where I am setting it to 16 to see if it improves things in the benchmark.

    What are you passing to JMH in your tests? (I'm a JMH noob.) I'm guessing "java -jar target/microbenchmarks.jar ".*" -r 10 -t 8 -tc" might do the trick?