Monday, 29 September 2014

JcTools based queues

Improving further

Following on my previous post in which I investigated the performance benefits in writing a custom executor as compared to OOTB -- comments from both Nitsan and Rudiger suggest I should try the JCTools implementation for SPMC queues (Single producer multiple consumer).

As simple as importing the jctools JARs from Maven, and a couple of lines to change the Queue implementation and away we go ....

    private final Queue<Runnable> queue = new SpmcArrayQueue((int) Math.pow(2, 18));

Without much further ado, here are the results. I've re-executed all of the numbers so they may be slightly different.



(Updated with correct 256K elements - most notable difference is that JCTools does not suffer as much from contention at ~8 threads).

LinkedTransferQueue is still an excellent candidate

I did notice that LinkedTransferQueue is still looking very respectable, so as an OOTB JVM queue it's definitely worth considering. Based on observing the system during the run, it does have a relatively high garbage cost. But, if you have memory to burn it is a very good candidate to test. In fact, it seems like the LTQ would perform much better if it didn't cause so much GC work. This is impressive given it is MPMC capable.

Actually, what I do notice when running on the LTQ is that the GC pause times recorded by the JVM are significantly longer (up to 1 sec). This is interesting and I suspect it's a result of the unbounded queue. A few combinations of capped queue lengths showed significantly reduced GC times based on the logs (<10millisec compared to 1sec), however no overall difference in runtime. I suspect there is more lurking in the dark here, for another day.

And custom is still better

The jctools SPMC queue does in fact perform significantly quicker (4-16%) than the other queues tested. As contention increased, it actually performed much better. Looks like Nitsan's work has paid off. See his video here (though I'm yet to get a chance to see it).

Perhaps it is a little like cheating, and we should pass Runnables to the Disruptor instead, but the Disruptor as written is still ~2x as fast as pushing this through than using an Executor.


3 comments:

  1. I think you meant Math.pow(2,18) rather than 2^18...

    ReplyDelete
  2. Thanks! I will re-run the tests. I'm sure I've made the same mistake elsewhere too.

    ReplyDelete