Wednesday, 16 August 2017

Extra concurrency utils on the JVM

I have released a new set of concurrency utilities for taking advantage of multiple cores on the JVM. At 4 or 8 cores, contention is usually not a big deal - but when operating on 32 or more cores, you can easily run into contention that slows you down. There are ways to scale to higher core counts, but they generally require changes to your application's structure.

At the same time, there are a number of very commonly used abstractions on the JVM, such as OutputStream.

Wouldn't it be great if, rather than having to rewrite your application, you could plug in a new library and take advantage of those extra cores?

This library aims to provide a solution to allow your code to scale:

ParallelGZIPOutputStream - exactly what it says on the label: GZIP compression of long streams of data using multiple cores. This scales near-linearly to at least 16 cores, so if you have the cores, you can get 4, 8, or 16x faster compression of your files.
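
The library's actual API isn't shown here, but the underlying technique can be sketched from first principles: as in pigz, each chunk of input is compressed as an independent gzip member on a worker thread, and the members are concatenated in order (GZIPInputStream decodes concatenated members transparently, at a small cost in compression ratio since each member starts a fresh dictionary). Class and method names below are illustrative, not the library's API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import java.util.zip.GZIPOutputStream;

public final class ParallelGzipSketch {

    // Compress input in fixed-size chunks on a thread pool, then
    // concatenate the gzip members in their original order.
    public static byte[] compress(byte[] input, int chunkSize, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            for (int off = 0; off < input.length; off += chunkSize) {
                final int start = off;
                final int len = Math.min(chunkSize, input.length - off);
                parts.add(pool.submit(() -> gzipMember(input, start, len)));
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> part : parts) {
                out.write(part.get()); // get() in order preserves the chunk sequence
            }
            return out.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    // Compress one chunk as a complete, self-contained gzip member.
    private static byte[] gzipMember(byte[] src, int off, int len) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(src, off, len);
        }
        return buf.toByteArray();
    }
}
```

Because the chunks compress independently, the work parallelizes with no shared state beyond the final ordered concatenation.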

BufferedOutputStream - the JDK's BufferedOutputStream uses a synchronized block to ensure consistency when used from multiple threads. If many threads are producing data quickly, synchronization overhead starts to slow you down at around 10 writers. This BufferedOutputStream lets many more threads write into the stream without that penalty.
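
One way to cut that synchronization cost (a minimal sketch of the general idea, not necessarily how this library implements it) is to give each writer thread its own buffer and only take the shared lock when a full buffer is handed to the underlying stream, so the lock is contended once per buffer rather than once per write. Note that bytes from different threads interleave at buffer granularity:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ThreadLocalBufferedOutputStream extends OutputStream {
    private final OutputStream sink;   // shared underlying stream
    private final int bufferSize;
    private final ThreadLocal<ByteArrayOutputStream> buffers =
            ThreadLocal.withInitial(ByteArrayOutputStream::new);

    public ThreadLocalBufferedOutputStream(OutputStream sink, int bufferSize) {
        this.sink = sink;
        this.bufferSize = bufferSize;
    }

    @Override public void write(int b) throws IOException {
        ByteArrayOutputStream buf = buffers.get();
        buf.write(b);                  // hot path: no shared lock taken
        if (buf.size() >= bufferSize) drain(buf);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        ByteArrayOutputStream buf = buffers.get();
        buf.write(b, off, len);
        if (buf.size() >= bufferSize) drain(buf);
    }

    @Override public void flush() throws IOException {
        drain(buffers.get());          // flushes only the calling thread's buffer
        synchronized (sink) { sink.flush(); }
    }

    private void drain(ByteArrayOutputStream buf) throws IOException {
        if (buf.size() == 0) return;
        synchronized (sink) {          // lock once per full buffer, not per write
            buf.writeTo(sink);
        }
        buf.reset();
    }
}
```

Each thread must flush before exiting, since its buffered bytes live in its own thread-local store.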

AsyncOutputStreamQueue - sometimes you just want part of the output work, such as the actual IO, to happen on another thread. Ideally you might use something like a disruptor for this, but for something much simpler that plugs into your existing code, you can use this async module. Writes to it are queued and executed on another thread, so the original writer can get on with its own work without being held up by slow downstream activity.
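
The shape of that idea can be sketched with the standard library alone (illustrative names, not the library's API): write() copies the bytes onto a queue and returns immediately, and a single background thread drains the queue into the slow underlying stream:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncOutputStreamSketch extends OutputStream {
    private static final byte[] POISON = new byte[0];  // shutdown marker
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final Thread drainer;
    private final CountDownLatch done = new CountDownLatch(1);

    public AsyncOutputStreamSketch(OutputStream sink) {
        drainer = new Thread(() -> {
            try {
                byte[] chunk;
                while ((chunk = queue.take()) != POISON) {
                    sink.write(chunk);   // slow IO happens here, off the writer thread
                }
                sink.flush();
            } catch (IOException | InterruptedException ignored) {
            } finally {
                done.countDown();
            }
        }, "async-output-drainer");
        drainer.start();
    }

    @Override public void write(int b) {
        queue.add(new byte[]{(byte) b});
    }

    @Override public void write(byte[] b, int off, int len) {
        queue.add(Arrays.copyOfRange(b, off, off + len)); // copy: caller may reuse b
    }

    @Override public void close() throws IOException {
        queue.add(POISON);               // signal the drainer to finish up
        try { done.await(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-offs are a copy per write and unbounded memory if the producer permanently outruns the IO; a bounded queue would add backpressure instead.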

CustomBlockingMpmcQueue - if you are using a standard Java executor with small or moderately sized tasks, contention on the queue feeding tasks into the executor becomes an issue when scaling past 20+ cores. A small tweak on Nitsan's JCTools implementations, this allows you to use the fast MPMC queues with an Executor, giving much snappier processing times through the queue when scaling up.
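
The "small tweak" works because ThreadPoolExecutor only requires a BlockingQueue, so a non-blocking MPMC queue can be adapted with spin-then-park blocking semantics. A minimal sketch of such an adapter, using ConcurrentLinkedQueue as a stand-in for a JCTools MpmcArrayQueue (which isn't on this classpath) - names are illustrative, not the library's API:

```java
import java.util.AbstractQueue;
import java.util.Collection;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class BlockingQueueAdapter<E> extends AbstractQueue<E>
        implements BlockingQueue<E> {
    private final Queue<E> q;   // the fast non-blocking queue being adapted

    public BlockingQueueAdapter(Queue<E> backing) { this.q = backing; }

    // Non-blocking operations delegate straight through.
    @Override public boolean offer(E e) { return q.offer(e); }
    @Override public E poll() { return q.poll(); }
    @Override public E peek() { return q.peek(); }
    @Override public int size() { return q.size(); }
    @Override public Iterator<E> iterator() { return q.iterator(); }
    @Override public int remainingCapacity() { return Integer.MAX_VALUE; }

    @Override public void put(E e) throws InterruptedException {
        while (!q.offer(e)) {            // only spins if the backing queue is bounded
            if (Thread.interrupted()) throw new InterruptedException();
            LockSupport.parkNanos(1_000);
        }
    }

    @Override public E take() throws InterruptedException {
        E e;
        while ((e = q.poll()) == null) { // spin-then-park while empty
            if (Thread.interrupted()) throw new InterruptedException();
            LockSupport.parkNanos(1_000);
        }
        return e;
    }

    @Override public boolean offer(E e, long t, TimeUnit u) {
        return q.offer(e);               // unbounded stand-in: timeout never needed
    }

    @Override public E poll(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        E e;
        while ((e = q.poll()) == null) {
            if (System.nanoTime() >= deadline) return null;
            if (Thread.interrupted()) throw new InterruptedException();
            LockSupport.parkNanos(1_000);
        }
        return e;
    }

    @Override public int drainTo(Collection<? super E> c) {
        return drainTo(c, Integer.MAX_VALUE);
    }

    @Override public int drainTo(Collection<? super E> c, int max) {
        int n = 0; E e;
        while (n < max && (e = q.poll()) != null) { c.add(e); n++; }
        return n;
    }
}
```

The adapter then drops straight into `new ThreadPoolExecutor(core, max, keepAlive, unit, queue)`; the park interval trades a little idle latency against CPU burned spinning.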

PS - it would be great to get this work upstreamed somewhere, if anyone has any suggestions.
