Best Practices for Scaling Akka Actors in a High Throughput System

Hello Akka community,

I’m currently working on a high-throughput system using Akka, and I’m seeking advice on best practices for scaling Akka actors in a way that ensures both performance and reliability. The system needs to process a large number of events in parallel, and I’m encountering a few challenges regarding load distribution and actor management.

Some context: we’re building a real-time data processing pipeline that involves multiple services interacting through Akka actors. The number of events per second can vary widely, and we want to ensure that the system scales horizontally as demand increases. However, we’ve run into some issues with handling high traffic, especially with regard to actor creation and distribution.

Here are a few specific questions I have:

  1. Actor Pooling: What are the best practices for pooling actors or dynamically adjusting the number of active actors based on load? Are there any pitfalls to avoid when using Router or other actor management patterns?
  2. Message Throughput: How can I effectively manage high message throughput without overwhelming individual actors? Should I consider implementing message batching or other optimizations?
  3. Cluster Scaling: In the context of an Akka cluster, what strategies would you recommend for distributing actors across nodes? How do you ensure efficient load balancing when scaling out?
  4. Monitoring and Performance Tuning: Are there any tools or techniques you would recommend for monitoring the performance of Akka actors in such a system? How can I ensure that the actors remain responsive under heavy load?

I’d appreciate any insights, resources, or personal experiences you could share. Thanks in advance for your help!

Regards
Harrypython

I’m currently working on a high-throughput system using Akka, and I’m seeking advice on best practices for scaling Akka actors in a way that ensures both performance and reliability.

It’s really hard to provide generic advice here without any details on either what you mean by “high throughput” or what functionality the system is providing. I started writing advice, and it was all too conditional. Plus, people have sometimes come to me for advice on “high throughput systems”, wanting to do all kinds of crazy optimizations and refusing to use out-of-the-box features for various unfounded performance reasons, only for it to turn out that by “high throughput” they meant hundreds of transactions per second. (Which is trivial from an Akka perspective.)

A lot of the answers are going to depend on how stateless the work is, if ordering is important, how consistent the work is, and what reliability guarantees are needed. There’s a huge difference in “do a quick calculation on sensor data coming in as fast as the network can handle it, if you drop a few that’s no big deal” vs. “do complex event processing where you have to keep a bunch of state in memory and process things in order, and never miss an event” vs “handle a lot of parallel work, but it is all blocking and dependent on external systems”.

So, while I don’t want to write any advice without more information:

  • The out-of-the-box features do what they say on the tin and, like most things in Akka, they are extremely lightweight. So, if you just need to parallelize work, feel free to use a router. “Best practices” all have to do with your workload. Do you need guarantees? Do you need ordering? Do you need a consistent hash?
  • Speaking of which, you should probably look at Akka Streams. If you really are going for high throughput, there are features there (like fusing steps, and handling async without losing order) that can help immensely for a lot of high throughput use cases.
  • Streams also can help with some of your monitoring/performance tuning use cases. Streams has a lot of tools for monitoring high throughput use cases, like sampling.
  • Clustering is a double-edged sword. Obviously anything remote is much more costly than doing something local, so you want to avoid any remote work that you can. But, on the other hand, if you are doing something like a CEP use case, using something like Cluster Sharding is a lot faster than any of the alternatives.
  • Like all things, especially high throughput things, you have to measure. For example, my default advice is to not worry about message batching. With how lightweight actors are, how Akka effectively “batches” automatically in its mailbox handling, and how Akka Streams can fuse steps to further minimize overhead, 999 times out of 1000 doing your own batching is probably just going to add overhead. But I can certainly see a few scenarios where this wouldn’t be true, and if you really care about performance you have to test every assumption.
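
To make the Streams point about “handling async without losing order” a bit more concrete, here is a minimal sketch. It assumes Akka 2.6 or later (where an implicit ActorSystem is enough to run a stream); the `enrich` function, the event type, and the parallelism of 8 are all made up for illustration and are not from your system.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object OrderedAsyncSketch {
  // Hypothetical stand-in for real per-event work (a lookup, a calculation, ...).
  def enrich(event: Int)(implicit ec: ExecutionContext): Future[String] =
    Future(s"event-$event")

  def run(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("ordered-async-sketch")
    import system.dispatcher // execution context used by the futures in enrich
    try {
      val result = Source(1 to 1000)
        // Up to 8 futures are in flight at once, but results are emitted
        // downstream in the same order as the input elements.
        .mapAsync(parallelism = 8)(e => enrich(e))
        .runWith(Sink.seq[String])
      Await.result(result, 10.seconds)
    } finally {
      system.terminate()
    }
  }
}
```

If ordering does not matter for your workload, `mapAsyncUnordered` is the variant that emits results as they complete. Which one is faster for you is, again, something to measure.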

I guess the one general piece of advice I’d have is that actors are lightweight and (barring limitations such as ordering guarantees or state requirements) you should allocate more than you think you need. A large number of actors can share a small thread pool easily. Because of that quasi-batching behavior I mentioned earlier, you don’t have to worry about things like thrashing between threads. Experimentation is, of course, the only way to know for sure, but usually if you plot the throughput of the system vs the number of worker actors, you’ll see a curve where the throughput goes up as you add actors until you saturate your threads, and after that it is pretty flat. That is, if 40 actors is the optimal number of worker actors, 4000 actors is still going to give you nearly the exact same performance as 40: there usually isn’t a penalty to overallocating.
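
As a rough sketch of what “allocate more than you think you need” can look like with an out-of-the-box router, here is a pool of stateless workers using the Akka Typed `Routers.pool` API (Akka 2.6+ assumed). The `Event` type, the pool size of 64, and the `CountDownLatch` are assumptions for the demo only; the latch is just there so the example can tell when all messages have been handled.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

import akka.actor.typed.scaladsl.{Behaviors, Routers}
import akka.actor.typed.{ActorSystem, Behavior, SupervisorStrategy}

object WorkerPoolSketch {
  // Hypothetical message type; the real events are not described in this thread.
  final case class Event(id: Long, payload: String)

  // Stateless worker: real processing would replace the latch count-down.
  def worker(latch: CountDownLatch): Behavior[Event] =
    Behaviors.receiveMessage[Event] { _ =>
      latch.countDown()
      Behaviors.same
    }

  // Guardian that spawns a pool router and forwards every event to it.
  def guardian(poolSize: Int, latch: CountDownLatch): Behavior[Event] =
    Behaviors.setup[Event] { context =>
      // Pool routers use round-robin routing unless configured otherwise.
      val pool = Routers.pool(poolSize) {
        // Restart a worker if it fails, so the pool keeps its size.
        Behaviors.supervise(worker(latch)).onFailure[Exception](SupervisorStrategy.restart)
      }
      val router = context.spawn(pool, "worker-pool")
      Behaviors.receiveMessage[Event] { event =>
        router ! event
        Behaviors.same
      }
    }

  // Returns true if all events were processed within the timeout.
  def run(poolSize: Int, events: Int): Boolean = {
    val latch = new CountDownLatch(events)
    val system = ActorSystem(guardian(poolSize, latch), "worker-pool-sketch")
    try {
      (1 to events).foreach(i => system ! Event(i.toLong, s"payload-$i"))
      latch.await(10, TimeUnit.SECONDS)
    } finally {
      system.terminate()
    }
  }
}
```

Calling `WorkerPoolSketch.run(poolSize = 64, events = 1000)` spreads the events over 64 routees that all share the default dispatcher's threads. Timing `run` for a range of pool sizes is one simple way to draw the throughput curve described above for your own workload.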
