Hello all!
When akka-cluster nodes run in a k8s cluster and the applications form the Akka cluster automatically on startup, we can end up with a split-brain problem, as described in this issue: https://github.com/akka/akka-management/issues/156
I'll recap here so that nobody has to read the whole thread there.
When the k8s API server is partitioned from some of the k8s nodes (machines, VMs) hosting akka-cluster nodes, it starts replacement akka-cluster nodes on the k8s nodes it can still reach, but it can't kill the ones on the other side of the partition. Under some conditions (automatic cluster start is ON, all or some of the akka-cluster nodes were cut off, and SBR is used), this can leave the old nodes running behind the partition alongside the new ones on the k8s nodes available to the API server.
I propose an extension that probes connectivity to the Kubernetes API server (using a simplified version of the approach in akka-cluster-kubernetes-discovery) and terminates the actor system when connectivity is lost. Alternatively, it could down its own node using the Cluster API.
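To make the idea concrete, here is a minimal sketch of the probing logic only, in plain Java with no Akka or Kubernetes dependencies. The class name `ApiServerWatchdog`, the `tick()` method, and the `maxFailures` threshold are all hypothetical names chosen for illustration; the actual probe (an HTTP call to the API server) and the reaction (terminating the actor system or downing the node) are passed in as callbacks. Requiring several consecutive failures before reacting is an assumption, not something the proposal specifies; it is there so that a single dropped request does not kill a healthy node.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical helper: counts consecutive failed probes of the API server
 * and runs a reaction exactly once when the threshold is reached.
 */
final class ApiServerWatchdog {
    private final BooleanSupplier probe; // returns true if the API server answered
    private final int maxFailures;       // consecutive failures tolerated before reacting
    private final Runnable onLost;       // reaction, supplied by the caller
    private int consecutiveFailures = 0;
    private boolean fired = false;

    ApiServerWatchdog(BooleanSupplier probe, int maxFailures, Runnable onLost) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.probe = probe;
        this.maxFailures = maxFailures;
        this.onLost = onLost;
    }

    /**
     * Runs one probe; meant to be called periodically by a scheduler.
     * Returns true once the reaction has been triggered.
     */
    synchronized boolean tick() {
        if (fired) {
            return true; // already reacted, do not probe again
        }
        boolean reachable;
        try {
            reachable = probe.getAsBoolean();
        } catch (RuntimeException e) {
            reachable = false; // a throwing probe counts as a failed probe
        }
        if (reachable) {
            consecutiveFailures = 0; // any success resets the counter
            return false;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= maxFailures) {
            fired = true;
            onLost.run();
        }
        return fired;
    }
}
```

In an Akka application, `tick()` could be driven by the actor system's scheduler, and `onLost` could call `system.terminate()` or `cluster.down(cluster.selfAddress())`, depending on which of the two reactions is preferred. How the probe authenticates against the API server is left out here.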
It would be SBR-agnostic, something one would probably use together with SBR.
Do you think this is something that could be used generally, or should the problem be solved in some other way?