We have the following situation:
- We have a cluster of 3 nodes
- The cluster is receiving a steady stream of GET requests that fetch the information of an entity
- A rolling restart starts
- 2 new nodes are spawned (5 nodes in the cluster)
- When those 2 new nodes are healthy (based on AkkaManagementHttp), the 2 oldest nodes are downed with coordinated-shutdown
- 1 new node is spawned (4 nodes in the cluster)
- The last old node is shut down (3 nodes in the cluster).
During this rolling restart, we see a bunch of GET requests with 4+ seconds of latency (compared to the usual 30 ms).
- We know that shutting down the oldest node first is not ideal because of the Singleton handovers, but the order cannot be configured at the moment.
- We are using Akka 2.5.19
Is this delay caused by the ShardCoordinator handover? How can this latency increase be prevented?
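
To make the sequence above concrete, here is a rough model of our rollout in plain Scala (no Akka dependency). It assumes the ShardCoordinator runs as a cluster singleton on the oldest member; the node names, the `joinOrder` field, and the `RollingRestartSketch` object are made up for illustration and are not Akka APIs.

```scala
// Simplified model of the rollout described above; NOT Akka code.
// Hypothetical names: Node, joinOrder, RollingRestartSketch.
final case class Node(name: String, joinOrder: Int)

object RollingRestartSketch {
  // Assumption: the singleton (and so the ShardCoordinator) lives on the
  // oldest member, modelled here as the node with the lowest joinOrder.
  def singletonHost(cluster: Set[Node]): Node = cluster.minBy(_.joinOrder)

  // Singleton host after each step of the rolling restart.
  def hostPerStep: List[(String, String)] = {
    val initial     = Set(Node("old-1", 1), Node("old-2", 2), Node("old-3", 3))
    val twoJoined   = initial ++ Set(Node("new-1", 4), Node("new-2", 5))
    val twoOldGone  = twoJoined.filterNot(n => n.name == "old-1" || n.name == "old-2")
    val thirdJoined = twoOldGone + Node("new-3", 6)
    val lastOldGone = thirdJoined.filterNot(_.name == "old-3")

    List(
      "3 old nodes"             -> initial,
      "2 new nodes joined"      -> twoJoined,
      "2 oldest nodes downed"   -> twoOldGone,
      "3rd new node joined"     -> thirdJoined,
      "last old node shut down" -> lastOldGone
    ).map { case (label, cluster) => label -> singletonHost(cluster).name }
  }

  // Counts how often the singleton host changes between consecutive steps.
  def handovers(hosts: List[String]): Int =
    hosts.sliding(2).count {
      case List(a, b) => a != b
      case _          => false
    }

  def main(args: Array[String]): Unit = {
    hostPerStep.foreach { case (label, host) => println(s"$label: singleton on $host") }
    println(s"handovers: ${handovers(hostPerStep.map(_._2))}")
  }
}
```

Under these assumptions the singleton moves twice during one rollout: from old-1 to old-3 when the 2 oldest nodes are downed, and from old-3 to new-1 when the last old node is shut down.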