Parallel stream processing in Java 8 – performance of sequential vs. parallel stream processing

Parallel processing is everywhere nowadays. With the growing number of CPU cores and falling hardware costs that make cheap cluster systems possible, parallel processing seems to be the next big thing.

Java 8 addresses this with the new stream API, which makes it simple to process collections and arrays in parallel. Let’s have a look at how this works.

Let’s say myList is a List of Integers containing 500,000 values. In the pre-Java 8 era, summing up these values was done with a for-each loop:

int result = 0;
for (int i : myList)
    result += i;

Since Java 8 we can do the same thing using streams:

myList.stream().mapToInt(Integer::intValue).sum();

Parallelizing this processing is very easy: we just call parallelStream() instead of stream(), or parallel() if we already have a stream.


So

myList.parallelStream().mapToInt(Integer::intValue).sum()

will do the trick. So it’s easy to spread a calculation across threads and the available CPU cores. But as we know, multithreading and parallel processing come with significant overhead. The question is when to use parallel streams and when a sequential stream performs better.
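The whole round trip can be sketched as follows. The list contents here are hypothetical stand-ins for the 500,000-element list above; note that a Stream&lt;Integer&gt; has no sum(), so a mapToLong (or mapToInt) step is needed first:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamSumDemo {
    public static void main(String[] args) {
        // Hypothetical sample data standing in for the 500,000-element list
        List<Integer> myList = IntStream.rangeClosed(1, 500_000)
                .boxed()
                .collect(Collectors.toList());

        // Stream<Integer> has no sum(); mapToLong gives a numeric stream
        // (long avoids overflowing int for this sum)
        long sequentialSum = myList.stream().mapToLong(Integer::longValue).sum();
        long parallelSum   = myList.parallelStream().mapToLong(Integer::longValue).sum();

        // Summation is associative, so both pipelines yield the same result
        System.out.println(sequentialSum); // 125000250000
        System.out.println(parallelSum);   // 125000250000
    }
}
```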

First of all, let’s have a look at what happens behind the scenes. A parallel stream uses the Fork/Join framework for processing: the stream source is forked (split) and handed over to the fork/join pool workers for execution.

But here we find the first point to think about: not all stream sources split as well as others. Think of an ArrayList, whose internal data representation is an array. Splitting such a source is easy, because it is possible to calculate the index of the middle element and split the array there.

With a LinkedList, splitting the data elements is more complicated. The implementation has to traverse the elements from the first entry to find the point where the split can be done. So LinkedLists, for example, perform badly with parallel streams.


This is the first fact about parallel stream performance we can retain:

S – source collection must be efficiently splittable
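The difference can be made visible with the Spliterator API, which is what parallel streams use for splitting under the hood. This is a minimal sketch with arbitrary list sizes: an ArrayList splits into an even half in O(1), while a LinkedList has to walk its nodes and hands out a fixed-size first batch instead:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 10_000; i++) {
            arrayList.add(i);
            linkedList.add(i);
        }

        // ArrayList's spliterator knows its size and splits at the middle index
        Spliterator<Integer> arrayHalf = arrayList.spliterator().trySplit();
        System.out.println(arrayHalf.estimateSize()); // 5000 – an even half

        // LinkedList's spliterator must traverse nodes to split; it returns
        // a small fixed-size first batch rather than half the elements
        Spliterator<Integer> linkedBatch = linkedList.spliterator().trySplit();
        System.out.println(linkedBatch.estimateSize());
    }
}
```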

Splitting a collection, managing the fork and join tasks, object creation, and garbage collection add algorithmic overhead as well. This is only worthwhile when the work to be done on the CPU cores is non-trivial and/or the collection is large enough – and, of course, when we have enough CPU cores.

A bad example would be calculating the maximum of 5 integer values in parallel:

IntStream.rangeClosed(1, 5).parallel().reduce(Math::max).getAsInt();

The overhead of preparing and processing the data for fork/join is so large that a sequential stream is much faster here. The Math.max function is not CPU-intensive, and we have only a few data elements.

But it becomes more and more worthwhile as the function executed per element gets more complex – to be exact, more CPU-intensive. Calculating the sine of every element instead of the maximum, for example.
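As a rough sketch of this effect – a naive timing loop, not a proper benchmark (for real measurements a harness like JMH should be used), with heavy() as an artificial stand-in for an expensive per-element operation:

```java
import java.util.stream.IntStream;

public class CostPerElementDemo {
    // Artificial, deliberately CPU-intensive per-element function
    static double heavy(int i) {
        double x = i;
        for (int k = 0; k < 200; k++) {
            x = Math.sin(x) + Math.cos(x);
        }
        return x;
    }

    public static void main(String[] args) {
        int n = 20_000;

        long t0 = System.nanoTime();
        IntStream.range(0, n).mapToDouble(CostPerElementDemo::heavy).sum();
        long serialMs = (System.nanoTime() - t0) / 1_000_000;

        long t1 = System.nanoTime();
        IntStream.range(0, n).parallel()
                .mapToDouble(CostPerElementDemo::heavy).sum();
        long parallelMs = (System.nanoTime() - t1) / 1_000_000;

        // With a high Q (cost per element), the parallel run usually wins
        System.out.println("serial:   " + serialMs + " ms");
        System.out.println("parallel: " + parallelMs + " ms");
    }
}
```

The exact numbers depend on the machine and JIT warm-up, which is why only the tendency, not the timing itself, should be read from such a sketch.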

When programming a chess game, the evaluation of every chess move is another such example: many evaluations can be done in parallel, and there is a large number of possible next moves.

This is perfect for parallel processing.

And this is the second fact about parallel stream performance we can retain:

N*Q – factor “number of elements * cost per element” should be large

Conversely, this means the collection can be smaller when the operation per element is costlier.
And when the operation per element is not very CPU-intensive, we need a very large collection with many elements for parallel streams to pay off.

This directly depends on the third fact we can retain:

C – number of CPU cores – more is better; more than one is a must

On a single-core machine, parallel streams always perform worse than sequential streams because of the management overhead. It is like a company with many project leads and one person doing the work.

More is better – unfortunately, in the real world this is not true in all cases, for example when the collection is too small and the CPU cores spin up – perhaps from power-saving mode – just to find out that there is nothing to do.

And to determine whether to use a parallel stream or not, there are requirements on the per-element function as well. This is not so much a matter of performance as of whether parallel streams work correctly at all.
The function has to be …

  • …independent, which means that the computation for each element must not rely on or impact that of any other element.
  • …non-interfering, which means that the function does not modify the underlying data source while processing.
  • …stateless.

Here is an example of a stateful lambda function used in a parallel stream. It is taken from the JDK API documentation and shows a simplified distinct() implementation:
Set<Integer> seen = Collections.synchronizedSet(new HashSet<>());
stream.parallel().map(e -> { if (seen.add(e)) return 0; else return e; })...
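The stateless fix for this particular case is simply to use the built-in distinct() operation and let the stream library manage the "seen" state internally. A minimal sketch with sample data:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctDemo {
    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 2, 3, 3, 3);

        // distinct() is safe in parallel pipelines and preserves
        // encounter order on ordered streams
        List<Integer> unique = values.parallelStream()
                .distinct()
                .collect(Collectors.toList());

        System.out.println(unique); // [1, 2, 3]
    }
}
```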

And that leads us to the fourth fact we can retain:

F – the per-element function has to be independent

To sum this up:


Are there other circumstances in which we should not parallelize our streams? Yes, there are.

Always think about what your per-element function is doing and whether it suits the world of parallel processing. If your function calls some synchronized functionality, you will probably see no benefit from parallelizing your stream, because your parallel threads will often be waiting at this synchronization barrier.
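A small sketch of this anti-pattern (the element counts are arbitrary): accumulating into a synchronized collection from a parallel stream makes every thread queue on a single lock, while collect() builds per-thread partial results and merges them without shared locking:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SynchronizationDemo {
    public static void main(String[] args) {
        // Anti-pattern: every element serializes on the shared lock,
        // so the threads spend their time waiting instead of working
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, 100_000).parallel().forEach(shared::add);

        // Better: collect() merges per-thread partial lists lock-free
        List<Integer> collected = IntStream.range(0, 100_000).parallel()
                .boxed()
                .collect(Collectors.toList());

        System.out.println(shared.size());    // 100000, but in nondeterministic order
        System.out.println(collected.size()); // 100000, in encounter order
    }
}
```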

The same problem occurs when you call blocking I/O operations.


For that matter, using an I/O-based source as a stream is also known to perform poorly, because the data is read sequentially, so such a source is hardly splittable.
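A sketch of this (the temporary file is generated only to make the example self-contained): even if we call parallel() on a file-backed stream, the underlying read is sequential and splitting is expensive, so there is usually little to gain:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class IoSourceDemo {
    public static void main(String[] args) throws IOException {
        // Generate a throwaway input file so the example is runnable
        Path file = Files.createTempFile("lines", ".txt");
        List<String> data = IntStream.range(0, 10_000)
                .mapToObj(i -> "line " + i)
                .collect(Collectors.toList());
        Files.write(file, data);

        // Files.lines reads sequentially from disk; its spliterator cannot
        // split cheaply, so parallel() rarely pays off here
        long count;
        try (Stream<String> lines = Files.lines(file).parallel()) {
            count = lines.count();
        }
        System.out.println(count); // 10000
        Files.delete(file);
    }
}
```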
