How does the shard coordinator (in distributed data mode) guarantee that an entity runs on at most one node at a time, given that distributed data mode is only eventually consistent?
The documentation mentions that the coordinator performs its state reads and writes with Write Majority/Read Majority consistency, but I can't figure out how that helps in partial-failure scenarios.
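If I understand the Distributed Data API correctly, that means something roughly like the sketch below (my own illustration against the classic `Replicator` API; the key, the use of `LWWMap`, the helper names, and the timeout are all made up, and the coordinator's real state type is surely different):

```scala
import scala.concurrent.duration._
import akka.actor.ActorRef
import akka.cluster.ddata.{ LWWMap, LWWMapKey, SelfUniqueAddress }
import akka.cluster.ddata.Replicator.{ Get, ReadMajority, Update, WriteMajority }

object MajorityConsistencySketch {
  // Hypothetical key for shard -> region assignments, purely for illustration.
  val AllocationsKey = LWWMapKey[String, String]("shard-allocations")

  def writeAllocation(replicator: ActorRef, shard: String, region: String)(
      implicit node: SelfUniqueAddress): Unit =
    // The update only counts as successful once a majority of replicas have it.
    replicator ! Update(AllocationsKey, LWWMap.empty[String, String], WriteMajority(5.seconds)) {
      existing => existing.put(node, shard, region)
    }

  def readAllocations(replicator: ActorRef): Unit =
    // A (re)started coordinator reads the state back from a majority of replicas.
    replicator ! Get(AllocationsKey, ReadMajority(5.seconds))
}
```

Here's one example of the partial failure I'm worried about (a toy model of it in code follows the list):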
- Coordinator C1 starts to allocate a new shard S to shard region SR1, writes the mapping to a minority of nodes, and dies.
- Coordinator C2 starts up, reads from a majority of nodes, and one of them has the mapping of S to SR1.
- C2 tells SR1 about the mapping for S. This starts S on that region.
- C2 dies.
- Coordinator C3 starts up, reads from a (different) majority of nodes, and none of them has the mapping of S to SR1 (because it was only written to a minority of nodes).
- C3 allocates S on a different shard region SR2.
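To make the scenario concrete, this is the toy model I have in my head; it is not Akka code, just five replicas, a write that only reached a minority, and two majority reads that happen to contact different subsets of nodes:

```scala
object MinorityWriteRace extends App {
  val replicas = 5
  val majority = replicas / 2 + 1 // 3 out of 5

  // C1's write of "S -> SR1" reached only replicas 0 and 1 before C1 died.
  val store: Map[Int, Option[String]] =
    Map(0 -> Some("SR1"), 1 -> Some("SR1"), 2 -> None, 3 -> None, 4 -> None)

  // A majority read merges what the contacted replicas know; in this toy model
  // that just means returning the mapping if any contacted replica has it.
  def readMajority(contacted: Set[Int]): Option[String] = {
    require(contacted.size >= majority, "must contact a majority of replicas")
    contacted.toList.flatMap(i => store(i)).headOption
  }

  println(readMajority(Set(0, 2, 3))) // Some(SR1): C2 sees the mapping and starts S on SR1
  println(readMajority(Set(2, 3, 4))) // None: C3 sees nothing, so what stops it allocating S to SR2?
}
```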
What prevents this from happening? More generally, is there a place to read more about the algorithmic details of the shard coordinator in distributed data mode, beyond just reading the code?