Splitting work into stages with actors: performance much worse than a single thread pool?

I want to compare the performance of two approaches:

  1. Use a single thread pool and submit one big runnable that contains all the work.
  2. Divide the work into stages, with each stage handled by an actor.

In test #1, I created a fixed thread pool with 40 threads and a runnable that does one file I/O, an 80 ms sleep and a 150 ms CPU spin. I then submitted 10000 tasks, and it took about 50 s to finish all of them.
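
For readers who don't open the repository, here is a minimal JDK-only sketch of the shape of test #1. It is a hypothetical reconstruction, not the code from the linked repo: the class and method names are made up, the file I/O step is left out, and `main` submits far fewer tasks than 10000 so it finishes in a couple of seconds.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SinglePoolBenchmark {

    // Busy-waits for roughly `millis` ms of wall-clock time to simulate CPU-bound work.
    static void spin(long millis) {
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        while (System.nanoTime() < end) {
            // intentionally empty: keeps the thread on the CPU
        }
    }

    // Submits `tasks` jobs to one fixed pool of `threads`; each job sleeps, then spins.
    // The file I/O step is left out. Returns how many jobs ran to completion.
    static int run(int threads, int tasks, long sleepMs, long spinMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try {
                    Thread.sleep(sleepMs); // blocks this pool thread for the whole sleep
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                spin(spinMs);
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks; already queued ones still run
        try {
            pool.awaitTermination(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Scaled down from 10000 tasks; 40 threads, 80 ms sleep and 150 ms spin as in the question.
        int completed = run(40, 200, 80, 150);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(completed + " tasks finished in " + elapsedMs + " ms");
    }
}
```

Every task holds one pool thread for its entire sleep plus spin, so the 40 threads are the only capacity limit in this layout apart from the number of CPU cores.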

In test #2, I created four actors, each with its own dispatcher. The configuration is below (all four actors use the same configuration, so I only paste one):

```hocon
stage1-pool {
    type = Dispatcher
    executor = "thread-pool-executor"
    thread-pool-executor {
        fixed-pool-size = 10
    }
    throughput = 100
}
akka.actor.deployment {
    /stage1Actor {
      dispatcher = stage1-pool
      router = round-robin-pool
      nr-of-instances = 10
    }
}
```

The stage1 actor does one file I/O and a 20 ms sleep; the stage2, stage3 and stage4 actors each do a 20 ms sleep and a 50 ms CPU spin, so the overall work in test #1 and test #2 is the same. I again submitted 10000 jobs to this topology, and it took far longer: I don't have an exact time, because I shut it down after several minutes.

Please correct me if the whole experiment is flawed, and tell me why Akka is so slow in this use case.

The test code and the thread view from JProfiler can be found at https://github.com/legatoo/actors-vs-single-thread-pool.

Thank you so much.

I’m not quite sure what you want to prove with the sample.

If you want to compare the throughput of two roughly equivalent solutions, you should either rewrite the thread-pool sample so that it also runs the stages as separate scheduled tasks (which may show some of the value of actors), or go the other way around and run 40 actors on a dispatcher with 40 threads.
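
A minimal sketch of the first option, using only the JDK: each job is handed between four fixed pools with `CompletableFuture.thenApplyAsync`, one pool per stage, which loosely mirrors one dispatcher per actor. This is an assumption about how such a rewrite could look, not code from the linked repository; the names are hypothetical, the file I/O is omitted, and `main` runs far fewer jobs than the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class StagedPoolsBenchmark {

    // Busy-waits for roughly `millis` ms of wall-clock time to simulate CPU-bound work.
    static void spin(long millis) {
        long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        while (System.nanoTime() < end) {
            // intentionally empty: burns CPU
        }
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Pushes each job through four stages; every stage runs on its own fixed pool.
    // Returns the number of jobs processed (join() throws if any job failed).
    static int run(int threadsPerStage, int jobs, long sleepMs, long spinMs) {
        List<ExecutorService> pools = new ArrayList<>();
        for (int s = 0; s < 4; s++) {
            pools.add(Executors.newFixedThreadPool(threadsPerStage));
        }
        try {
            List<CompletableFuture<Integer>> results = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                final int job = i;
                results.add(CompletableFuture
                        // stage 1: sleep only (file I/O omitted)
                        .supplyAsync(() -> { sleep(sleepMs); return job; }, pools.get(0))
                        // stages 2-4: sleep + CPU spin; each hop moves the job to the next pool
                        .thenApplyAsync(j -> { sleep(sleepMs); spin(spinMs); return j; }, pools.get(1))
                        .thenApplyAsync(j -> { sleep(sleepMs); spin(spinMs); return j; }, pools.get(2))
                        .thenApplyAsync(j -> { sleep(sleepMs); spin(spinMs); return j; }, pools.get(3)));
            }
            CompletableFuture.allOf(results.toArray(new CompletableFuture<?>[0])).join();
            return results.size();
        } finally {
            pools.forEach(ExecutorService::shutdown);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Scaled down from 10000 jobs; 10 threads per stage, 20 ms sleep and 50 ms spin as in the question.
        int completed = run(10, 200, 20, 50);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(completed + " jobs finished in " + elapsedMs + " ms");
    }
}
```

Timing this against the single-pool version keeps Akka out of the comparison, so any difference comes from splitting the work into stages and from the thread count per stage.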

If you want a pipeline that runs 10 000 identical workloads, each consisting of a file I/O followed by a 20 ms sleep, followed by three iterations of a 20 ms sleep and a 50 ms CPU spin, and that has better throughput than doing it all sequentially in one thread, that may be possible, but you'd probably have to spend more time thinking about the problem. Just handing work between threads more often, in order to do more work than you have cores for (unless you have 40+ cores, of course), will not automatically make it faster.
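
A back-of-envelope check using only the numbers from the question, assuming every sleep and spin takes exactly its stated wall-clock time and ignoring the file I/O. These are lower bounds, not predictions: the per-job work is the same in both tests, the CPU work cannot finish faster than the core count allows however it is split, and a pipeline can go no faster than its slowest stage (each of stages 2 to 4, with 10 threads apiece).

```latex
% Work per job (ms), ignoring file I/O: identical in both tests
t_{\text{job}} = \underbrace{80}_{\text{sleep}} + \underbrace{150}_{\text{spin}}
             = \underbrace{20}_{\text{stage 1}} + 3\,(20 + 50) = 230

% CPU-bound floor on C cores, whatever the threading scheme
T \ge \frac{10\,000 \times 150\ \text{ms}}{C} = \frac{1500\ \text{s}}{C}

% Thread-capacity floors: test #1 (40 threads), test #2 (slowest stage, 10 threads)
T_1 \ge \frac{10\,000 \times 230\ \text{ms}}{40} = 57.5\ \text{s}
\qquad
T_2 \ge \frac{10\,000 \times 70\ \text{ms}}{10} = 70\ \text{s}
```

So even in the ideal case the staged layout has a lower ceiling than the single 40-thread pool, and neither floor includes file I/O, scheduling overhead or contention. The reported 50 s for test #1 is below its 57.5 s floor, which suggests the sleeps or spins run somewhat shorter than their nominal durations in practice; the comparison between the two floors still holds.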

The docs section "Blocking Needs Careful Management" is worth reading carefully and making sure you understand, given that your example uses Thread.sleep.
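
Why blocking matters can be shown with plain JDK executors, with no Akka involved: when blocking calls such as `Thread.sleep` occupy every thread of a pool, anything else scheduled on that pool has to wait, and moving the blocking work to a pool of its own avoids that. This is only an analogue of the docs' advice to give blocking work a dedicated dispatcher; the class and method names are hypothetical and the pool sizes are arbitrary.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BlockingIsolation {

    // Fills blockingPool with sleeping tasks, then measures (in ms) how long a trivial
    // task submitted to quickPool waits before it starts running.
    static long quickTaskDelayMs(ExecutorService blockingPool, ExecutorService quickPool,
                                 int blockers, long sleepMs) {
        for (int i = 0; i < blockers; i++) {
            blockingPool.submit(() -> {
                Thread.sleep(sleepMs); // stands in for any blocking call
                return null;
            });
        }
        long submitted = System.nanoTime();
        Future<Long> started = quickPool.submit(() -> System.nanoTime());
        try {
            return TimeUnit.NANOSECONDS.toMillis(started.get() - submitted);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Blocking and non-blocking work share one 2-thread pool.
    static long sharedPoolDelayMs(long sleepMs) {
        ExecutorService shared = Executors.newFixedThreadPool(2);
        try {
            return quickTaskDelayMs(shared, shared, 2, sleepMs);
        } finally {
            shared.shutdownNow(); // interrupts any blockers that are still sleeping
        }
    }

    // Blocking work is confined to its own pool; the quick task runs on a separate one.
    static long separatePoolsDelayMs(long sleepMs) {
        ExecutorService blocking = Executors.newFixedThreadPool(2);
        ExecutorService quick = Executors.newFixedThreadPool(1);
        try {
            return quickTaskDelayMs(blocking, quick, 2, sleepMs);
        } finally {
            blocking.shutdownNow();
            quick.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:    quick task waited " + sharedPoolDelayMs(300) + " ms");
        System.out.println("separate pools: quick task waited " + separatePoolsDelayMs(300) + " ms");
    }
}
```

In the shared case the quick task cannot start until one of the sleeping tasks releases its thread; in the separated case it starts almost immediately.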

Given that each step is mostly stateless, it may make sense to look at expressing it with Akka Streams instead. That would also make it easier to experiment with parallelizing parts of the pipeline and introducing asynchronous boundaries; the "Pipelining and Parallelism" section of the docs is a good read on that.