Hi!
We’ve got a situation where the shard coordinator should be deployed on different nodes than the shards. The reason for this type of deployment is latency during rolling deployments. The deployment runs on AWS Fargate, which has at most 4 vCPUs per node, and during a rolling deployment the communication overhead of shards moving and the coordinator jumping from one node to the next negatively affects request latencies. What’s more, Fargate schedules updates in the wrong order, from oldest to youngest, so the coordinator keeps on jumping.
The target deployment is to use 3 nodes as possible hosts for the shard coordinator and 3 nodes for hosting shards. Then the nodes hosting shards can be rolled without the coordinator needing to move.
We have gotten this to work now via a shard allocation strategy that doesn’t allocate shards on nodes marked with the coordinator role, as well as by running incantations to work around ClusterSharding always assigning the role for shards to the coordinator (don’t ask how, it isn’t pretty).
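A minimal sketch of the role-filtering idea behind such an allocation strategy, in plain Scala without Akka types so that it stands alone. The `Region` case class, the `"coordinator"` role name, and `pickRegion` are hypothetical names used only for illustration. In a real setup this logic would presumably sit inside `allocateShard` of a `ShardCoordinator.ShardAllocationStrategy`, with regions being `ActorRef`s and roles looked up from cluster membership.

```scala
// Hypothetical stand-in for a shard region and the cluster roles of its node.
final case class Region(name: String, roles: Set[String])

object RoleAwareAllocation {
  // Assumed role name marking nodes reserved for the shard coordinator.
  val CoordinatorRole = "coordinator"

  /** Picks the least-loaded region whose node does not carry the coordinator role.
    * Ties are broken by region name so the choice is deterministic.
    * Returns None when every known region sits on a coordinator node.
    */
  def pickRegion(current: Map[Region, IndexedSeq[String]]): Option[Region] = {
    // Drop every region hosted on a coordinator-only node.
    val eligible = current.filterNot { case (region, _) =>
      region.roles.contains(CoordinatorRole)
    }
    if (eligible.isEmpty) None
    else Some(eligible.minBy { case (region, shards) => (shards.size, region.name) }._1)
  }
}
```

Given one coordinator region with no shards and two shard regions holding two and one shards respectively, `pickRegion` returns the shard region with one shard, even though the coordinator region is less loaded.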
The main issue, as we see it, is the following line in ClusterSharding:
val singletonSettings = settings.coordinatorSingletonSettings.withSingletonName("singleton").withRole(role)
Without the withRole(role) call, this type of deployment would be possible using only a custom shard allocation strategy. Is there any reason why this isn’t allowed?
Thanks!
Manuel