Imagine, in a hypothetical world, I manage to run an Akka cluster of 50k nodes. What is the probability of a split brain, given that they are all running on one physical node on the same network?
set the minimum cluster size to 49,999? ;-P
It sounds pretty likely to me. With that many cluster members, there'd have to be some likelihood that the communication overhead (discovering the seed nodes, electing a leader and whatnot) would leave at least some of these nodes isolated from the others, even if only for a brief moment.
But I'm just speculating. Maybe the Lightbend guys know of customers who did exactly this and didn't even need their Split Brain Resolver!?
It's very hypothetical, as you say. We rarely come across clusters of more than a few hundred nodes. Low thousands should be possible, but we haven't even tried designing for more.
It's also very theoretical because you're talking about running 50,000 nodes on a single physical node, which is likely to hit resource limits: for example, you could easily run out of ports, file descriptors, etc.
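To put rough numbers on the ports/file descriptors point, here is a back-of-envelope sketch. It rests on simplifying assumptions that are not Akka's actual behaviour: each node binds one listening TCP port, and, as a worst case, every pair of nodes holds one TCP connection. Akka does not necessarily connect every member to every other member, so treat the mesh figures as an upper bound. The port range 32768-60999 is the common Linux default for `net.ipv4.ip_local_port_range`; the function name `port_and_fd_budget` is hypothetical.

```python
def port_and_fd_budget(n: int) -> dict:
    """Rough resource estimate for n cluster nodes sharing one host (hypothetical model)."""
    usable_ports = 65_535 - 1_024 + 1             # non-privileged TCP ports: 64,512
    ephemeral_ports = 60_999 - 32_768 + 1         # common Linux default ephemeral range: 28,232
    fixed_ports = usable_ports - ephemeral_ports  # ports outside the ephemeral range: 36,280
    pair_connections = n * (n - 1) // 2           # worst case: one TCP connection per node pair
    return {
        "listening_ports_needed": n,
        "listening_ports_available": fixed_ports,
        "full_mesh_connections": pair_connections,
        # on a single host, both endpoints of every connection are local sockets
        "full_mesh_sockets": 2 * pair_connections,
        "sockets_per_node": n - 1,
    }


budget = port_and_fd_budget(50_000)
for key, value in budget.items():
    print(f"{key}: {value:,}")
```

Even before any traffic flows, 50,000 listeners don't fit outside the ephemeral range (36,280 ports), and in the full-mesh worst case each node would hold 49,999 sockets, far above the 1,024 soft `ulimit -n` many Linux distributions ship with.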
Also, you'd have to define exactly what you mean by split brain. A true split brain, where you have multiple leaders? I'd guess that's unlikely, unless you do something stupid like turning on auto-downing. An unstable cluster where consensus can't be reached, or a partitioned cluster? It would be an interesting experiment, but those seem likely.