What is the purpose of Cluster Sharding? I have gone through some guides, but I didn’t get a clear picture of what it is.
What is the best use case for Cluster Sharding?
Here,
I have a RunnerCluster
“akka.tcp://RunnerCluster@172.24.117.1:3001”,
“akka.tcp://RunnerCluster@172.24.121.75:3001”,
“akka.tcp://RunnerCluster@172.24.123.240:3001”
And I have a ClientCluster client which is running on a separate machine.
When I send a start message from the client to the RunnerCluster, any one of the nodes in the RunnerCluster will receive the message.
After receiving this, my business logic is,
if (myCondition.equals(sleep))
create a SleepActor on 172.24.117.1:3001
else if (myCondition.equals(weight))
create a WeightActor on 172.24.121.75:3001
else
create a FuncActor on 172.24.123.240:3001
How can I achieve the above use case?
Here I want to send a message to a specific remote machine if a particular condition is met. A remote machine means we need to know the IP address and port of the running app.
Cluster Sharding says that by using logical keys (in the form of ShardId/EntityId pairs) it is able to route our messages to the right entity without the need to explicitly determine on which node in the cluster it lives.
So my question here is,
How will it know where to send a message without knowing the IP and port?
How does routing work in Cluster Sharding?
What is the role of ShardId and EntityId?
Using these, how will we route the message to a specific remote machine?
I have read so many guides but didn’t get a clear understanding of this, so please advise me on this…
When using Cluster Sharding you will not be in charge of where the entities (actors) are running. That is taken care of by Cluster Sharding. For a given entityId there will only be one single entity for that entityId. A message for a given entityId will be routed to the entity no matter from which node it is sent.
A shard is a group of entities, mostly for scalability purposes, to be able to manage millions of entities in more coarse-grained groups.
You essentially define that yourself. One of the functions you define is extractEntityId, which is how the entityId is extracted from a message. Typically this is some kind of key, but it can be determined however you want to determine it.
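To make the extractEntityId idea concrete, here is a minimal sketch in plain Java. It is not the Akka API: the Start message class and the choice of using the condition itself as the entity id are assumptions made only for illustration, based on the sleep/weight example in the question.

```java
import java.util.Objects;

class EntityIdSketch {
    /** Hypothetical message type (not from Akka): carries the condition from the question. */
    static final class Start {
        final String condition; // e.g. "sleep", "weight", or anything else

        Start(String condition) {
            this.condition = Objects.requireNonNull(condition);
        }
    }

    /**
     * Stand-in for an extractEntityId function: derive the logical key from the message.
     * Assumption for this sketch: one entity per condition value.
     */
    static String extractEntityId(Object message) {
        if (message instanceof Start) {
            return ((Start) message).condition;
        }
        return null; // message types this sketch does not know are simply not routed
    }

    public static void main(String[] args) {
        System.out.println(extractEntityId(new Start("sleep")));
        System.out.println(extractEntityId("some unrelated message"));
    }
}
```

The sender only ever deals with this logical key. Which node hosts the entity behind the key is resolved by sharding, so no IP address or port appears in the sending code.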
The code below will call the receive method in the EntityActor class and match on the message.
My doubt here is: out of the three nodes in the cluster, which one will pick up this message?
From the message the shardId is defined via the extractShardId function. The location of a shard is decided by ClusterSharding automatically. By default a shard is allocated on first access to the node with the fewest shards. There is also a rebalance process that moves shards so that new nodes can share the load.
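The default behaviour described above can be simulated in a few lines of plain Java. This is only a sketch of the idea, not Akka's implementation: the class name, the hash-based shardIdFor helper, and the tie-breaking rule (first node in the list wins) are assumptions, and rebalancing is left out. The node list is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeastShardsSketch {
    // node address -> shards currently hosted on that node
    private final Map<String, List<String>> shardsByNode = new LinkedHashMap<>();
    // shardId -> node address, filled in the first time a shard is accessed
    private final Map<String, String> homeOfShard = new LinkedHashMap<>();

    LeastShardsSketch(List<String> nodes) {
        for (String node : nodes) {
            shardsByNode.put(node, new ArrayList<>());
        }
    }

    /** Stand-in for extractShardId: a stable hash of the entity id, modulo the shard count. */
    static String shardIdFor(String entityId, int numberOfShards) {
        return String.valueOf(Math.floorMod(entityId.hashCode(), numberOfShards));
    }

    /** Returns the node hosting the shard, allocating it to the least loaded node on first access. */
    String locate(String shardId) {
        String known = homeOfShard.get(shardId);
        if (known != null) {
            return known; // already allocated: every sender gets the same answer
        }
        String best = null;
        for (Map.Entry<String, List<String>> entry : shardsByNode.entrySet()) {
            if (best == null || entry.getValue().size() < shardsByNode.get(best).size()) {
                best = entry.getKey();
            }
        }
        shardsByNode.get(best).add(shardId);
        homeOfShard.put(shardId, best);
        return best;
    }

    public static void main(String[] args) {
        LeastShardsSketch cluster = new LeastShardsSketch(
            Arrays.asList("172.24.117.1:3001", "172.24.121.75:3001", "172.24.123.240:3001"));
        for (String entityId : Arrays.asList("sleep", "weight", "func")) {
            String shardId = shardIdFor(entityId, 30);
            System.out.println(entityId + " -> shard " + shardId + " -> " + cluster.locate(shardId));
        }
    }
}
```

So the answer to "which node will pick up this message" is: whichever node the shard was allocated to when it was first accessed, and that decision is remembered for all later messages to the same shard.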
If you really want to be in charge of where to allocate the shards the allocation strategy is pluggable so you can have your own algorithm. That is mentioned in the documentation.
I don’t know how to take control of shard allocation. By default the shard coordinator does the shard allocation. If I want to take control of that, where should I put my own algorithm, or which one should I override?
I didn’t find anything regarding this in the documentation.
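As far as I know, the extension point in Akka is the ShardCoordinator.ShardAllocationStrategy interface (with AbstractShardAllocationStrategy as the Java-friendly base class), which you supply when starting the shard region; please check the exact signatures against the documentation for your Akka version. The decision such a strategy would make for the sleep/weight/func use case can be sketched in plain Java as follows. The class and method names here are hypothetical and none of this is Akka code; it only shows the mapping from shard to a fixed node.

```java
import java.util.HashMap;
import java.util.Map;

class PinnedAllocationSketch {
    private final Map<String, String> nodeByShard = new HashMap<>();
    private final String fallbackNode;

    PinnedAllocationSketch(String fallbackNode) {
        this.fallbackNode = fallbackNode;
    }

    /** Pins a shard to a fixed node address; returns this for chaining. */
    PinnedAllocationSketch pin(String shardId, String node) {
        nodeByShard.put(shardId, node);
        return this;
    }

    /** The allocation decision: a pinned shard goes to its node, anything else to the fallback. */
    String allocate(String shardId) {
        return nodeByShard.getOrDefault(shardId, fallbackNode);
    }

    /** Assumed shard id extraction for this use case: one shard per kind of actor. */
    static String shardIdFor(String condition) {
        if ("sleep".equals(condition)) return "sleep";
        if ("weight".equals(condition)) return "weight";
        return "func";
    }

    public static void main(String[] args) {
        PinnedAllocationSketch strategy = new PinnedAllocationSketch("172.24.123.240:3001")
            .pin("sleep", "172.24.117.1:3001")
            .pin("weight", "172.24.121.75:3001");
        System.out.println(strategy.allocate(shardIdFor("sleep")));
        System.out.println(strategy.allocate(shardIdFor("weight")));
        System.out.println(strategy.allocate(shardIdFor("anything else")));
    }
}
```

Keep in mind that pinning shards to fixed machines gives up part of what sharding offers: if a pinned node leaves the cluster, a real strategy still has to decide where its shards go.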
Creating a good sharding algorithm is an interesting challenge in itself. Try to produce a uniform distribution, i.e. the same number of entities in each shard. As a rule of thumb, the number of shards should be a factor of ten greater than the planned maximum number of cluster nodes. Fewer shards than the number of nodes will result in some nodes not hosting any shards. Too many shards will result in less efficient management of the shards, e.g. rebalancing overhead, and increased latency because the coordinator is involved in the routing of the first message for each shard. The sharding algorithm must be the same on all nodes in a running cluster. It can be changed after stopping all nodes in the cluster.
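Applied to the three-node RunnerCluster from the question, the rule of thumb gives 30 shards. A small plain Java sketch, assuming hypothetical entity ids of the form "entity-N" and a hash-modulo sharding function, shows how you could check how evenly your own ids spread over the shards:

```java
class ShardCountSketch {
    /** Rule of thumb from the text: ten times the planned maximum number of nodes. */
    static int numberOfShards(int plannedMaxNodes) {
        return plannedMaxNodes * 10;
    }

    /** Counts how many of the made-up ids "entity-0", "entity-1", ... land in each shard. */
    static int[] entitiesPerShard(int entityCount, int shards) {
        int[] counts = new int[shards];
        for (int i = 0; i < entityCount; i++) {
            String entityId = "entity-" + i; // hypothetical id scheme, for illustration only
            counts[Math.floorMod(entityId.hashCode(), shards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int shards = numberOfShards(3);
        int[] counts = entitiesPerShard(10_000, shards);
        int min = Integer.MAX_VALUE;
        int max = 0;
        for (int c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        System.out.println(shards + " shards, smallest holds " + min + ", largest holds " + max);
    }
}
```

If the smallest and largest counts are far apart for your real ids, the sharding function is worth revisiting before the cluster goes live, since it can only be changed after stopping all nodes.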