Deploying the shard coordinator on nodes with a different role than the nodes hosting shards

Hi!

We’ve got a situation where the shard coordinator should be deployed on different nodes than the shards. The reason is latency during rolling deployments. We deploy on AWS Fargate, which has at most 4 vCPUs per node, and during a rolling deployment the communication overhead of shards moving and the coordinator jumping from one node to the next negatively affects request latencies. What’s more, Fargate schedules updates in the wrong order, from oldest to youngest, so the coordinator keeps jumping.

The target deployment is to use 3 nodes as possible hosts for the shard coordinator and 3 nodes for hosting shards. Then the nodes hosting shards can be rolled without the coordinator needing to move.

We have gotten this to work now via a shard allocation strategy that doesn’t allocate shards on nodes marked with the coordinator role, combined with some incantations to work around ClusterSharding always assigning the shards’ role to the coordinator (don’t ask how, it isn’t pretty).
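
The selection rule behind such an allocation strategy can be sketched in plain Scala. This is only an illustration under stated assumptions, not the code used in this deployment: Region, ShardPlacement, and allocate are hypothetical names, and the sketch models regions as simple values rather than the ActorRefs and cluster membership data that a real Akka ShardAllocationStrategy would work with.

```scala
// Hypothetical model type: in Akka a region would be an ActorRef and the
// roles would be looked up from cluster membership.
final case class Region(address: String, roles: Set[String])

object ShardPlacement {

  /** Pick the eligible region that currently hosts the fewest shards,
    * skipping every region whose node carries `excludedRole`.
    * Ties are broken by address so the result is deterministic.
    * Returns None when no eligible region exists.
    */
  def allocate(
      current: Map[Region, Vector[String]],
      excludedRole: String): Option[Region] = {
    // Drop regions running on nodes reserved for the coordinator.
    val eligible = current.filter { case (region, _) =>
      !region.roles.contains(excludedRole)
    }
    if (eligible.isEmpty) None
    else
      Some(eligible.minBy { case (region, shards) =>
        (shards.size, region.address)
      }._1)
  }
}
```

Given one region on a node with the role "coordinator" and two regions on nodes with the role "shard", allocate only ever returns one of the latter two, even when the coordinator node hosts no shards at all. A rebalance step would presumably need the same filter so that shards are never moved onto coordinator nodes.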

The main issue, as we see it, is the following line in ClusterSharding:

val singletonSettings = settings.coordinatorSingletonSettings.withSingletonName("singleton").withRole(role)

Without the withRole call, this type of deployment would be possible using only a custom shard allocation strategy. Is there any reason why this isn’t allowed?

Thanks!

Manuel


I think we just haven’t seen a use case before for giving the coordinator a different role from the shards. Could perhaps be an interesting feature if it doesn’t complicate things too much.

It would be similar to a setup based on https://github.com/petabridge/lighthouse, where, in a cluster whose nodes come and go, you don’t want to pay a performance penalty whenever the node hosting the coordinator is shut down.