ShardRegion bottleneck

Eric · May 7, 2018, 9:34am

Hi all,

I am a bit concerned that the ShardRegion actor may be a bottleneck in an application I am building. From what I understand, every message sent to a specific shard region passes through the ShardRegion actor, which forwards it to the appropriate shard, from where it is finally delivered to the entity (ShardRegion -> Shard -> Entity). Are messages to the ShardRegion serialized (consumed sequentially)? Or is the ShardRegion actor a “special” actor like a router? If the former is true, it seems like this would be a severe limit on scalability when there are hundreds or thousands of entities under a ShardRegion actor. How should shards be built to avoid the sequential bottleneck in that case?

TimMoore · May 7, 2018, 10:21am

There is another thread on this topic here: “What are the possible bottlenecks of using Sharding?”

Eric · May 7, 2018, 10:31am

Hi, unfortunately that thread doesn’t really give me the answer I’m looking for. It discusses the ShardCoordinator. I understand that the ShardCoordinator is used to look up the location of a shard when it is unresolved; otherwise it is bypassed.

What I need to know is how messages are passed through the ShardRegion actor before being delivered to a specific entity. Are messages that arrive at a ShardRegion actor consumed sequentially?

patriknw · May 8, 2018, 5:18pm

The shard region doesn’t do much for each message so it should be able to handle a high throughput.

There are actually two actors involved. First, the ShardRegion actor extracts the shardId from the message and delegates to the Shard actor; this is pretty much a HashMap lookup. Then the Shard actor delegates to the entity actor via another lookup.
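The two hops described above can be modeled in a few lines of plain Java. This is a hypothetical sketch, not Akka code: `Region`, `Shard`, `Entity`, `Envelope` and the first-character shard extractor are invented for illustration. It shows that each hop costs roughly one map lookup, and that a single actor drains its mailbox one message at a time. To keep it short, the model calls the shard directly, whereas in Akka the Shard is a separate actor with its own mailbox.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// Plain-Java model of the ShardRegion -> Shard -> Entity hops. This is NOT the
// Akka API: class and method names here are hypothetical illustrations.
public class ShardingPathModel {

    // Hypothetical message envelope carrying the target entity id.
    static final class Envelope {
        final String entityId;
        final String payload;
        Envelope(String entityId, String payload) { this.entityId = entityId; this.payload = payload; }
    }

    // Entity: records every payload it receives.
    static final class Entity {
        final List<String> received = new ArrayList<>();
        void receive(Envelope e) { received.add(e.payload); }
    }

    // Shard: one map lookup by entity id (creating the entity on first use).
    static final class Shard {
        final Map<String, Entity> entities = new HashMap<>();
        void deliver(Envelope e) {
            entities.computeIfAbsent(e.entityId, id -> new Entity()).receive(e);
        }
    }

    // Region: extracts the shard id, then one map lookup by shard id.
    static final class Region {
        final Queue<Envelope> mailbox = new ArrayDeque<>();
        final Map<String, Shard> shards = new HashMap<>();
        final Function<String, String> extractShardId;

        Region(Function<String, String> extractShardId) { this.extractShardId = extractShardId; }

        void tell(Envelope e) { mailbox.add(e); }

        // Drains the mailbox one message at a time, as a single actor would.
        int drain() {
            int processed = 0;
            Envelope e;
            while ((e = mailbox.poll()) != null) {
                String shardId = extractShardId.apply(e.entityId);
                shards.computeIfAbsent(shardId, id -> new Shard()).deliver(e);
                processed++;
            }
            return processed;
        }
    }

    public static void main(String[] args) {
        // Hypothetical extractor: the shard id is the first character of the entity id.
        Region region = new Region(entityId -> entityId.substring(0, 1));
        region.tell(new Envelope("a1", "hello"));
        region.tell(new Envelope("a2", "hi"));
        region.tell(new Envelope("b1", "hey"));
        // prints "processed 3 messages into 2 shards"
        System.out.println("processed " + region.drain() + " messages into "
                + region.shards.size() + " shards");
    }
}
```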

What are your throughput requirements, and have you measured that the bottleneck for them is in the sharding actors and not in your entity actors?

Eric · May 11, 2018, 8:56am

Yes, the ShardRegion and Shard actors may not do much more than a few lookups. I’m just wondering whether messages to these actors are consumed sequentially (i.e. whether they are “real” actors, not pseudo-actors like routers). If so, their throughput will always be limited, regardless of the actual throughput requirements. This is for a thesis project in parallel and concurrent programming, so it’s more of an academic exercise in pointing out any limits to scalability in the system.

justinpeel · May 11, 2018, 4:04pm

Yes, Shard and ShardRegion each run as their own actors. You can increase the throughput parameter on the dispatcher used by those actors, so that each of them (and any other actor on that dispatcher) processes more messages per turn before the thread switches to a different actor. You can also increase the number of shard regions and instances of your app, but there are trade-offs with doing that.
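To see what the dispatcher `throughput` setting changes, here is a toy single-thread scheduler in Java. It is a simulation written for illustration, not Akka’s dispatcher implementation: `schedule` and its run queue are hypothetical. Akka’s reference configuration describes `throughput` as the maximum number of messages processed per actor before the thread moves on to the next actor, and the model follows that rule.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy simulation of a dispatcher "throughput" setting: how many messages one
// actor may process before the thread moves on to the next actor. Not Akka code.
public class ThroughputModel {

    // Returns one entry per scheduling turn, e.g. "A:5" = actor A ran 5 messages.
    static List<String> schedule(String[] actors, int[] pending, int throughput) {
        if (throughput < 1) throw new IllegalArgumentException("throughput must be >= 1");
        int[] left = pending.clone();                  // messages still queued per actor
        ArrayDeque<Integer> runQueue = new ArrayDeque<>();
        for (int i = 0; i < actors.length; i++) {
            if (left[i] > 0) runQueue.add(i);
        }
        List<String> turns = new ArrayList<>();
        while (!runQueue.isEmpty()) {
            int i = runQueue.poll();
            int batch = Math.min(throughput, left[i]); // at most `throughput` per turn
            left[i] -= batch;
            turns.add(actors[i] + ":" + batch);
            if (left[i] > 0) runQueue.add(i);          // re-schedule if messages remain
        }
        return turns;
    }

    public static void main(String[] args) {
        String[] actors = {"A", "B"};
        int[] pending = {10, 10};
        for (int throughput : new int[] {1, 5, 100}) {
            List<String> turns = schedule(actors, pending, throughput);
            System.out.println("throughput=" + throughput + " -> " + turns.size() + " turns " + turns);
        }
    }
}
```

With two actors holding 10 messages each, a throughput of 1 takes 20 turns, 5 takes 4 turns, and 100 takes 2 turns. The same 20 messages are processed in every case, and within a turn one actor still handles its messages one at a time, so raising `throughput` reduces switching overhead but does not parallelize a single ShardRegion or Shard actor.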

Also, if you are using remember entities, there are some potential performance issues if you have a lot of different entities. These issues would show up during the initial start-up, during a restart, or during a rebalance. I talked about them in a different topic, but no one has replied yet.

Eric · May 12, 2018, 9:34am

Yes, adjusting the throughput parameter helps avoid unnecessary context switching, and may contribute to better throughput due to prefetching and caching (I guess it depends on the underlying implementation of the queue as well as on the architecture).

Ultimately, we don’t really care in which shard the entities reside; we just want to send messages to them. A possible alternative: extract the entityId first and, if it is unresolved, extract the shardId second. If the shard is unresolved, get a ref to the shard via the shard coordinator. If the shard is resolved, ask the shard for a ref to the entity.

This would allow senders to bypass the Shard actor if an entity is resolved.
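The proposal above can be sketched as a sender-side resolver with two caches. This is a hypothetical design in plain Java, not something Akka’s sharding provides: `ResolvingSender`, `askCoordinator`, `askShard` and the string refs are stand-ins. A real version would also have to invalidate cached entity refs when an entity is passivated or its shard is rebalanced to another node; the sketch only handles the rebalance case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sender-side resolver for the proposal: entity cache first,
// then shard cache, then the coordinator. Plain Java with string "refs"; not Akka.
public class ResolvingSender {
    final Map<String, String> entityCache = new HashMap<>(); // entityId -> entity ref
    final Map<String, String> shardCache = new HashMap<>();  // shardId -> shard ref
    final Function<String, String> extractShardId;
    int coordinatorLookups = 0;
    int shardLookups = 0;

    ResolvingSender(Function<String, String> extractShardId) {
        this.extractShardId = extractShardId;
    }

    // Stand-in for asking the coordinator where a shard lives.
    private String askCoordinator(String shardId) {
        coordinatorLookups++;
        return "node1/shard-" + shardId;
    }

    // Stand-in for asking a shard for a ref to one of its entities.
    private String askShard(String shardRef, String entityId) {
        shardLookups++;
        return shardRef + "/" + entityId;
    }

    String resolve(String entityId) {
        String cached = entityCache.get(entityId);
        if (cached != null) return cached;                 // resolved: skip region and shard
        String shardId = extractShardId.apply(entityId);
        String shardRef = shardCache.computeIfAbsent(shardId, this::askCoordinator);
        String entityRef = askShard(shardRef, entityId);   // one-time lookup per entity
        entityCache.put(entityId, entityRef);
        return entityRef;
    }

    // Cached refs go stale when a shard is rebalanced; drop them for that shard.
    void invalidateShard(String shardId) {
        shardCache.remove(shardId);
        entityCache.keySet().removeIf(id -> extractShardId.apply(id).equals(shardId));
    }

    public static void main(String[] args) {
        ResolvingSender sender = new ResolvingSender(id -> id.substring(0, 1));
        for (String id : new String[] {"a1", "a1", "a2", "b1", "a1"}) sender.resolve(id);
        System.out.println("coordinator lookups: " + sender.coordinatorLookups
                + ", shard lookups: " + sender.shardLookups);
    }
}
```

The cost of skipping the Shard hop is that every sender now owns cache invalidation, which the region and shard actors otherwise handle in one place.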

patriknw · May 13, 2018, 7:25am

Thanks for clarifying that it’s a theoretical question.

If it were a problem, you could define more shards (fewer entities per shard) and scale to more nodes.
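A rough illustration of why more shards means fewer entities per Shard actor: a hash-based extractor, similar in spirit to the hash-code-based message extractors commonly used with Akka sharding, spreads entity ids over a fixed number of shards. The `shardId` function and the `entity-N` ids are assumptions for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical hash-based shard extractor; not an Akka API.
public class ShardCountModel {

    // Maps an entity id to one of numberOfShards shards (never negative).
    static int shardId(String entityId, int numberOfShards) {
        return Math.floorMod(entityId.hashCode(), numberOfShards);
    }

    // Counts how many of the given entities land in each shard.
    static int[] entitiesPerShard(List<String> entityIds, int numberOfShards) {
        int[] counts = new int[numberOfShards];
        for (String id : entityIds) counts[shardId(id, numberOfShards)]++;
        return counts;
    }

    static int max(int[] xs) { return Arrays.stream(xs).max().orElse(0); }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) ids.add("entity-" + i);
        for (int shards : new int[] {10, 100, 1000}) {
            int[] counts = entitiesPerShard(ids, shards);
            System.out.println(shards + " shards -> at most " + max(counts) + " entities per shard");
        }
    }
}
```

Because 10 divides 100 and 100 divides 1000, each shard at a higher count holds a subset of one shard at the lower count, so the busiest shard can only shrink as the count grows. More shards spread the work over more Shard actors; scaling to more nodes is what spreads the ShardRegion work, since each node runs its own region actor.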

You could also use more entity types, and thereby more region actors. Before sending, the client/sender would decide which region actor to use for the specific target entity.
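The second suggestion, several entity types with the client picking a region, could look roughly like this. `RegionSelector` and the region names are hypothetical; in Akka each name would correspond to a separately started entity type, and the sketch only models the client-side choice. Hashing the entity id keeps the choice deterministic, so every message for a given entity goes through the same region.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side choice between several region actors, one per
// entity type (e.g. "region-0" .. "region-3"). Names are illustrative only.
public class RegionSelector {
    final List<String> regionNames;

    RegionSelector(List<String> regionNames) {
        if (regionNames.isEmpty()) throw new IllegalArgumentException("need at least one region");
        this.regionNames = List.copyOf(regionNames);
    }

    // Deterministic: a given entity id always maps to the same region index.
    int regionIndex(String entityId) {
        return Math.floorMod(entityId.hashCode(), regionNames.size());
    }

    String regionFor(String entityId) {
        return regionNames.get(regionIndex(entityId));
    }

    public static void main(String[] args) {
        RegionSelector selector =
                new RegionSelector(List.of("region-0", "region-1", "region-2", "region-3"));
        int[] load = new int[selector.regionNames.size()];
        for (int i = 0; i < 10_000; i++) load[selector.regionIndex("entity-" + i)]++;
        System.out.println("messages per region: " + Arrays.toString(load));
    }
}
```

Each region actor is still sequential, but four regions are four mailboxes that different dispatcher threads can process at the same time.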
