`majority-min-cap` default value

Our team is currently considering whether we dare to change the default setting

akka.cluster.sharding.distributed-data.majority-min-cap = 5

It is set in the reference config: https://github.com/akka/akka/blob/master/akka-cluster-sharding/src/main/resources/reference.conf#L157

This setting frequently makes Akka sharding get stuck during a rolling update of a cluster with fewer than 5 nodes.

I wonder, what is the specific “bad” scenario this value is supposed to prevent? Why doesn’t a simple majority, e.g. 3/5, work for small clusters (so that majority-min-cap could be something like 2)?

majority-min-cap=5 means that it will write to and read from more nodes in small clusters. For example, in a cluster of 5 nodes it would use 3 for an ordinary majority write/read, but with majority-min-cap=5 it uses all 5. In a cluster that is not changing its size a plain majority would be enough, but this is to reduce the risk that writes are “lost” in small clusters that change size between the write and the read.
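To make that concrete, here is a small sketch (not Akka’s actual implementation, and the function name is made up) of how the effective number of nodes for a majority write/read can be derived when a min-cap is applied:

```scala
object MajorityMinCapSketch {
  // Effective number of nodes for a "majority" write/read when a minimum
  // cap is applied: at least a plain majority, at least minCap, but never
  // more than the total number of nodes.
  def effectiveMajority(numberOfNodes: Int, minCap: Int): Int = {
    val plainMajority = numberOfNodes / 2 + 1
    math.min(numberOfNodes, math.max(plainMajority, minCap))
  }

  def main(args: Array[String]): Unit = {
    println(effectiveMajority(numberOfNodes = 5, minCap = 5)) // 5 -> all nodes
    println(effectiveMajority(numberOfNodes = 5, minCap = 3)) // 3 -> plain majority
    println(effectiveMajority(numberOfNodes = 9, minCap = 5)) // 5 -> plain majority already reaches the cap
  }
}
```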

For example, with 5 nodes N1, N2, … N5, the write goes to N1, N2 and N3. Right after that, all those 3 nodes are shut down before the information is disseminated to N4 and N5, while maybe a few more nodes are joining in a rolling update. That could result in the write being “lost”. With majority-min-cap=5 the write will also update N4 and N5.

It’s rather unlikely to happen, because the information is disseminated to all nodes right afterwards and it’s enough to read from one of the nodes that has the latest information.

I think we chose a conservative default to be on the safe side. Reducing it to 3 would probably be enough if you have a good rolling update strategy in place, e.g. leaving the oldest node until last and not shutting down too many nodes at the same time.
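If you do decide to lower it, the override in your application.conf would mirror the default shown above:

akka.cluster.sharding.distributed-data.majority-min-cap = 3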

What do you mean by stuck? Stuck as in it never recovers, or does it just take some extra time? It should never be completely stuck because of this. If that is what you see, you are either missing something, such as a proper downing strategy, or there is a bug in Akka. If it’s the latter it would be good if you could create an issue, share logs and assist in the investigation.
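If you don’t already have a downing strategy, the built-in Split Brain Resolver (available from Akka 2.6) can be enabled with:

akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"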

Thanks for the elaborate reply!

Yep, it’s not getting completely stuck; we just observe this for some time:

...Trying to register to coordinator at [ActorSelection[Anchor(...), Path(...)]], but no acknowledgement. Total [92] buffered messages. [Coordinator [Member(address = ..., dataCenter =..., status = Up)] is reachable.]

After a few minutes it gets back to normal, I guess when the “register shard region” write has been disseminated to at least majority-min-cap nodes. So it’s about having a rolling upgrade with no availability gaps rather than some show-stopper deadlock.

I like your detailed walkthrough of the lost write; leaving the oldest node until last is probably a great idea indeed (and we already start new nodes before we shut down the old ones). Thank you!

Patrik, just one more thing to clarify: we want to keep the oldest node until last because it probably hosts the shard coordinator, right?

Minutes? That is alarming. Is the graceful leaving successful, or does it fall back to unreachable and downing?

Yes, the coordinator is a cluster singleton and runs on the oldest node. It’s good to only have one coordinator hand-over instead of several.