Wednesday 16 August 2017

Extra concurrency utils on the JVM

I have released a new set of concurrency utils for taking advantage of multiple cores on the JVM. At 4 or 8 cores, contention is often not a big deal, but at 32 or more cores you can easily run into contention that slows you down. There are ways to scale to higher core counts, but they generally require changes to your application's structure.

At the same time, there are a number of very commonly used abstractions on the JVM, such as OutputStream.

Wouldn't it be great if, rather than having to rewrite your application to take advantage of multiple threads, you could plug in a new library to take advantage of multiple cores?

This library aims to provide a solution to allow your code to scale:
https://github.com/jasonk000/concurrent

ParallelGZIPOutputStream - exactly what it says on the label: it lets you perform GZIP compression on long streams of data using multiple cores. It scales near-linearly to at least 16 cores, so if you have the cores you can get 4x, 8x or 16x faster compression on your files.
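To give a feel for why GZIP parallelises at all, here is a minimal sketch that uses only java.util.zip. It is not the library's code and the class and method names are made up for illustration. It relies on one fact about the format: a gzip file may consist of several complete gzip members back to back, and GZIPInputStream reads such a file as a single stream. Each chunk is therefore compressed independently on a thread pool and the results are concatenated in input order. The trade-off is a slightly worse ratio, since each chunk starts with an empty dictionary.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: chunked parallel gzip via concatenated gzip members.
public class ChunkedParallelGzip {

    // Compress one slice of the input into a complete, standalone gzip member.
    static byte[] gzipChunk(byte[] data, int offset, int length) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data, offset, length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Compress chunks on a thread pool, then concatenate the members in input order.
    static byte[] compress(byte[] data, int chunkSize, int threads) {
        if (data.length == 0) {
            return gzipChunk(data, 0, 0); // still emit one valid (empty) member
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            for (int off = 0; off < data.length; off += chunkSize) {
                final int offset = off;
                final int length = Math.min(chunkSize, data.length - off);
                Callable<byte[]> task = () -> gzipChunk(data, offset, length);
                parts.add(pool.submit(task));
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> part : parts) {
                byte[] member = part.get(); // waits for this chunk; keeps ordering
                out.write(member, 0, member.length);
            }
            return out.toByteArray();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Standard single-threaded decompression; reads all concatenated members.
    static byte[] decompress(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The output is an ordinary gzip file, so anything that reads multi-member gzip can read it. A streaming version would need to bound the number of chunks in flight rather than hold everything in memory as this sketch does.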

BufferedOutputStream - the JVM BufferedOutputStream uses synchronized methods to ensure consistency when used from multiple threads. If you are producing a lot of data, fast, from many threads, then the synchronization overhead starts to slow you down at around 10 writers. This BufferedOutputStream lets you have multiple streams writing in from many more threads without impacting performance.
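One way to picture the approach is sketched below. It is an assumption about the general technique and not the library's implementation, and all names are hypothetical. Every writer thread gets its own private buffer and only takes a shared lock when that buffer is full, so the lock is acquired once per few KB instead of once per write call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: one unsynchronized buffer per writer, one lock on the sink.
public class PerThreadBufferSketch {
    private final OutputStream sink;
    private final int bufferSize; // assumed to be greater than zero
    private final Object sinkLock = new Object();

    public PerThreadBufferSketch(OutputStream sink, int bufferSize) {
        this.sink = sink;
        this.bufferSize = bufferSize;
    }

    // Returns a stream meant to be used by exactly one thread.
    public OutputStream newWriter() {
        return new OutputStream() {
            private final byte[] buf = new byte[bufferSize];
            private int count;

            @Override
            public void write(int b) throws IOException {
                if (count == buf.length) {
                    flush();
                }
                buf[count++] = (byte) b; // no lock on the hot path
            }

            @Override
            public void flush() throws IOException {
                synchronized (sinkLock) { // lock taken once per full buffer
                    sink.write(buf, 0, count);
                }
                count = 0;
            }

            @Override
            public void close() throws IOException {
                flush();
            }
        };
    }

    // Usage example: several threads each write their own id byte repeatedly.
    static byte[] writeFromThreads(int threads, int bytesPerThread) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PerThreadBufferSketch shared = new PerThreadBufferSketch(sink, 4096);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try (OutputStream out = shared.newWriter()) {
                    for (int i = 0; i < bytesPerThread; i++) {
                        out.write(id);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return sink.toByteArray();
    }
}
```

Note that in this sketch data from different threads interleaves at buffer boundaries, so a caller that needs whole records kept together would have to flush at record boundaries.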

AsyncOutputStreamQueue - sometimes you just want a piece of the output work to happen on another thread, for example writing to IO. Ideally you might use something like a disruptor for this, but for something a lot simpler that you can plug into your existing code, you can use this async module. Write to it, and the writes are queued and executed on another thread, which means the original writer can get on with whatever it needs to do without being held up by slow downstream activity.
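The queue-and-drain idea can be sketched with nothing more than a BlockingQueue and a background thread. This is an illustrative sketch under my own assumptions (bounded queue, copy on write, sentinel on close) and the class name is hypothetical; the library's actual design may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: writes are queued and performed by a background thread.
public class AsyncOutputStreamSketch extends OutputStream {
    private static final byte[] POISON = new byte[0]; // end marker, compared by identity
    private final BlockingQueue<byte[]> queue;
    private final OutputStream sink;
    private final Thread drainer;
    private volatile IOException failure;
    private boolean closed;

    public AsyncOutputStreamSketch(OutputStream sink, int capacity) {
        this.sink = sink;
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.drainer = new Thread(this::drain, "async-output-drainer");
        this.drainer.setDaemon(true);
        this.drainer.start();
    }

    // Runs on the background thread: pull chunks and write them to the sink.
    private void drain() {
        try {
            for (byte[] chunk = queue.take(); chunk != POISON; chunk = queue.take()) {
                if (failure == null) { // after a failure, keep draining so writers never block
                    try {
                        sink.write(chunk);
                    } catch (IOException e) {
                        failure = e;
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (closed) throw new IOException("stream closed");
        if (failure != null) throw failure; // surface an earlier background failure
        if (len == 0) return;
        byte[] copy = Arrays.copyOfRange(b, off, off + len); // caller may reuse its array
        try {
            queue.put(copy); // blocks only when the queue is full (back-pressure)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while queueing write");
        }
    }

    @Override
    public void close() throws IOException {
        if (closed) return;
        closed = true;
        try {
            queue.put(POISON);
            drainer.join(); // wait until everything queued has been written
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new InterruptedIOException("interrupted while closing");
        }
        sink.close();
        if (failure != null) throw failure;
    }

    // Usage example: queue some strings and return what reached the sink.
    static String demo(String... parts) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (AsyncOutputStreamSketch out = new AsyncOutputStreamSketch(sink, 1024)) {
            for (String p : parts) {
                out.write(p.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In this sketch flush() is left as a no-op, so only close() guarantees that queued data has reached the sink, and an IO error on the background thread is reported on a later write or on close rather than at the call that caused it.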

CustomBlockingMpmcQueue - if you are using a standard Java executor with small or moderately sized tasks, then contention on the queue that feeds tasks into the executor becomes an issue when scaling past 20 cores. This is a small tweak on Nitsan's JCTools implementations that lets you use the fast MPMC queues with an Executor, which gives much snappier processing times through the queue when scaling up.
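The reason a tweak is needed is that ThreadPoolExecutor takes a BlockingQueue, so a plain non-blocking Queue cannot be passed in directly. The sketch below shows one way to bridge that gap with an adapter that backs off briefly when the queue is empty or full. It is not the library's code: the adapter name and the park-based back-off are my own assumptions, and ConcurrentLinkedQueue stands in for a JCTools MPMC queue so that the example runs on the JDK alone.

```java
import java.util.AbstractQueue;
import java.util.Collection;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: adapts any concurrent Queue to the BlockingQueue interface.
public class BlockingAdapterQueue<E> extends AbstractQueue<E> implements BlockingQueue<E> {
    private final Queue<E> delegate;

    public BlockingAdapterQueue(Queue<E> delegate) {
        this.delegate = delegate;
    }

    @Override public boolean offer(E e) { return delegate.offer(e); }
    @Override public E poll() { return delegate.poll(); }
    @Override public E peek() { return delegate.peek(); }
    @Override public int size() { return delegate.size(); }
    @Override public Iterator<E> iterator() { return delegate.iterator(); }
    // Assumption: treated as unbounded here; a bounded delegate would report its own value.
    @Override public int remainingCapacity() { return Integer.MAX_VALUE; }

    @Override
    public void put(E e) throws InterruptedException {
        while (!delegate.offer(e)) backOff();
    }

    @Override
    public boolean offer(E e, long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!delegate.offer(e)) {
            if (System.nanoTime() - deadline >= 0) return false;
            backOff();
        }
        return true;
    }

    @Override
    public E take() throws InterruptedException {
        E e;
        while ((e = delegate.poll()) == null) backOff();
        return e;
    }

    @Override
    public E poll(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        E e;
        while ((e = delegate.poll()) == null) {
            if (System.nanoTime() - deadline >= 0) return null;
            backOff();
        }
        return e;
    }

    @Override
    public int drainTo(Collection<? super E> c) {
        return drainTo(c, Integer.MAX_VALUE);
    }

    @Override
    public int drainTo(Collection<? super E> c, int max) {
        int n = 0;
        E e;
        while (n < max && (e = delegate.poll()) != null) {
            c.add(e);
            n++;
        }
        return n;
    }

    // Short pause between retries; honours interruption so executor shutdown works.
    private static void backOff() throws InterruptedException {
        if (Thread.interrupted()) throw new InterruptedException();
        LockSupport.parkNanos(1_000L);
    }

    // Usage example: run simple counting tasks through an executor fed by the adapter.
    static int runTasks(int tasks, int threads) {
        AtomicInteger done = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new BlockingAdapterQueue<Runnable>(new ConcurrentLinkedQueue<>()));
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("executor did not finish in time");
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return done.get();
    }
}
```

The back-off trades a little idle CPU and wake-up latency for not taking a lock on every enqueue and dequeue. How long to spin or park before retrying is a tuning decision that depends on task size and core count.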

PS - it would be great to get this work upstreamed somewhere, if anyone has any suggestions.

Faster GZIP compression on the JVM

JVM compression benchmarks abound. Snappy and other compression libraries are often tagged as the go-to as they are quite a lot faster.

However, GZIP, built on the deflate algorithm, is a very widely supported format with reasonably good performance. What if I told you that you could get double the speed from gzip on Java?

Cloudflare have in fact done some excellent work tuning the open source zlib implementation. However, it is a C library, so this code is not available on the JVM. The zlib implementation provided by Java is an older version, and overriding it with LD_LIBRARY_PATH and the like is not possible because the library is bundled.

Until now...

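I have packaged together some of the existing C and Java JNI code, along with a few modifications, so that you can call directly into an imported zlib library of your choosing! Include the Cloudflare library, for example, and you will get more than 2x the performance.

The code is here:
https://github.com/jasonk000/fastzlib

Benchmark results are excellent. Throughput is in operations per second, so higher is better, and the Cloudflare build comes out at roughly 2.2x the JVM's bundled zlib (43.760 vs 19.533 ops/s):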

Benchmark                                Mode  Cnt   Score   Error  Units
BenchmarkCompressors.compressCloudflare  thrpt    5  43.760 ± 5.566  ops/s
BenchmarkCompressors.compressJvm         thrpt    5  19.533 ± 2.931  ops/s