Click here to Skip to main content
15,887,966 members
Articles / General Programming / Performance
Tip/Trick

Iteration Over Java Collections with High Performance

Rate me:
Please Sign up or sign in to vote.
2.08/5 (4 votes)
6 Jul 2018CPOL2 min read 32.9K   4   2
Travelling over Java collections is just a piece of cake, but when the size of the collections increases you have to choose wisely

Introduction

Java developers usually deal with Collections such as ArrayList, HashSet, Java 8 come with lambda and streaming API helping us to work easily with Collections. In most cases, we work with a few thousands of items and performance isn't a concern. But in strict code, when we have to travel over millions of items several times, performance will become a pain.

forEach vs C style vs Stream API

Iteration is a basic feature so that all programming languages have simple syntax to allow programmers to run through collections. Stream API can iterate over Collections in a very straightforward manner.

public List<Integer> streamSingleThread(BenchMarkState state){
    List<Integer> result = new ArrayList<>(state.testData.size());
    state.testData.stream().forEach(item -> {
        result.add(item);
    });
    return result;
}
public List<Integer> streamMultiThread(BenchMarkState state){
    List<Integer> result = new ArrayList<>(state.testData.size());
    state.testData.stream().parallel().forEach(item -> {
        result.add(item);
    });
    return result;
}

forEach loop is also simple:

public List<Integer> forEach(BenchMarkState state){
    List<Integer> result = new ArrayList<>(state.testData.size());
    for(Integer item : state.testData){
        result.add(item);
    }
    return result;
}

C style is more verbose, but still very compact:

public List<Integer> forCStyle(BenchMarkState state){
    int size = state.testData.size();
    List<Integer> result = new ArrayList<>(size);
    for(int j = 0; j < size; j ++){
        result.add(state.testData.get(j));
    }
    return result;
}

Then the performance:

Benchmark                               Mode  Cnt   Score   Error  Units
TestLoopPerformance.forCStyle           avgt  200  18.068 ± 0.074  ms/op
TestLoopPerformance.forEach             avgt  200  30.566 ± 0.165  ms/op
TestLoopPerformance.streamMultiThread   avgt  200  79.433 ± 0.747  ms/op
TestLoopPerformance.streamSingleThread  avgt  200  37.779 ± 0.485  ms/op

With C style, JVM just simply increases an integer, then reads value directly from memory, so that it is very fast. But forEach is very different. According to  answer on StackOverFlow and document from Oracle, JVM has to convert forEach to Iterator and calls hasNext() with every item, that's why forEach is much slower than C style.

Which is a High Performance Way to Travelling Over Set?

With data:

Java
@State(Scope.Benchmark)
public static class BenchMarkState {
    @Setup(Level.Trial)
    public void doSetup() {
        for(int i = 0; i < 500000; i++){
            testData.add(Integer.valueOf(i));
        }
    }
    @TearDown(Level.Trial)
    public void doTearDown() {
        testData = new HashSet<>(500000);
    }
    public Set<Integer> testData = new HashSet<>();
}

Java Set also supports Stream API and forEach loop, according to the previous test, should we wrap Set to ArrayList, then travel over ArrayList?

Java
    public List<Integer> forCStyle(BenchMarkState state){
        int size = state.testData.size();
        List<Integer> result = new ArrayList<>(size);        
        Integer[] temp = (Integer[]) state.testData.toArray(new Integer[size]);
        for(int j = 0; j < size; j ++){
            result.add(temp[j]);
        }
        return result;
    }

How about combine iterator with C style for loop?

Java
    public List<Integer> forCStyleWithIteration(BenchMarkState state){
        int size = state.testData.size();
        List<Integer> result = new ArrayList<>(size);
        Iterator<Integer> iteration = state.testData.iterator();
        for(int j = 0; j < size; j ++){
            result.add(iteration.next());
        }
        return result;
    }

Or just simple travel?

Java
    public List<Integer> forEach(BenchMarkState state){
        List<Integer> result = new ArrayList<>(state.testData.size());
        for(Integer item : state.testData) {
            result.add(item);
        }
        return result;
    }

Nice idea, but it doesn't work because initializzing new ArrayList also consumes resources.

    Benchmark                                   Mode  Cnt  Score   Error  Units
    TestLoopPerformance.forCStyle               avgt  200  6.013 ± 0.108  ms/op
    TestLoopPerformance.forCStyleWithIteration  avgt  200  4.281 ± 0.049  ms/op
    TestLoopPerformance.forEach                 avgt  200  4.498 ± 0.026  ms/op

HashMap (HashSet uses HashMap<E,Object>) isn't designed for iterating all items, the fastest way to iterate over HashMap is a combination of Iterator and C style for loop, because JVM doesn't have to call hasNext().

Conclusion

Foreach and Stream API are convenient to work with Collections, you can write code faster. However when your system is stable and performance is a major concern, you should think about rewriting your loop.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Technical Lead Orchestra Networks
Vietnam Vietnam
Java programmer at Orchestra Networks

Comments and Discussions

 
QuestionReplacing loops with StreamAPIs Pin
spidee00713-Jul-18 4:40
spidee00713-Jul-18 4:40 
AnswerRe: Replacing loops with StreamAPIs Pin
vudangngoc15-Jul-18 17:22
vudangngoc15-Jul-18 17:22 
Hi,
My test use metric average time per operation, it means smaller is better. According to results, for loop with Cstyle has better performance than StreamAPI and for each loop.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.