Self-termination for application running in Kubernetes

Hello all!

Given an akka-cluster node running in a k8s cluster, we can get a split-brain problem when applications automatically form a cluster on startup, as described in this issue: https://github.com/akka/akka-management/issues/156
I'll recap here so nobody is forced to read the whole thread there.
When the k8s apiservice is partitioned from some of the k8s nodes (machines, VMs) hosting akka-cluster nodes, it starts replacements on the k8s nodes it can still reach, but it can't kill the nodes on the other side of the partition. Under some conditions (automatic cluster formation is ON, all or part of the akka-cluster nodes were cut off, and SBR is used) this can leave the old nodes running behind the partition alongside the new ones on the k8s nodes the apiservice can reach.

I propose an extension that probes connectivity to the kubernetes apiservice (using a simplified version of the approach in akka-cluster-kubernetes-discovery) and, in case of lost connectivity, terminates the actor system. Alternatively it could Down the self node using the Cluster API. See the sketch below.
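To make it concrete, here is a minimal sketch of the kind of extension I have in mind. The object name, probe URL, interval, and failure threshold are all made-up placeholders, not an existing akka-management API; a real implementation would reuse the service-account token and CA handling that akka-discovery-kubernetes-api already has for calling the apiservice:

```scala
import java.net.{HttpURLConnection, URL}
import java.util.concurrent.atomic.AtomicInteger

import scala.concurrent.duration._
import scala.util.Try

import akka.actor.ActorSystem

// Illustrative sketch only: names and defaults are placeholders.
object ApiServerReachabilityProbe {

  def start(
      system: ActorSystem,
      probeUrl: String = "https://kubernetes.default.svc/version", // assumed probe endpoint
      interval: FiniteDuration = 5.seconds,
      maxFailures: Int = 3): Unit = {
    import system.dispatcher
    val consecutiveFailures = new AtomicInteger(0)

    system.scheduler.scheduleWithFixedDelay(interval, interval) { () =>
      if (reachable(probeUrl)) {
        consecutiveFailures.set(0)
      } else if (consecutiveFailures.incrementAndGet() >= maxFailures) {
        system.log.error("Lost connectivity to the k8s apiservice, terminating self")
        // Alternative: down the self node instead of terminating outright:
        //   val cluster = akka.cluster.Cluster(system)
        //   cluster.down(cluster.selfAddress)
        system.terminate()
      }
    }
  }

  // Any HTTP response (even 401/403) proves the apiservice is reachable;
  // only a connect/read failure counts as lost connectivity.
  private def reachable(url: String): Boolean =
    Try {
      val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
      conn.setConnectTimeout(2000)
      conn.setReadTimeout(2000)
      conn.setRequestMethod("GET")
      val code = conn.getResponseCode
      conn.disconnect()
      code > 0
    }.getOrElse(false)
}
```

Starting it would then be a one-liner after the ActorSystem is created, e.g. `ApiServerReachabilityProbe.start(system)`.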

It would be SBR agnostic, something one would probably use together with SBR.

Do you think this is something that could be useful generally, or should the problem be solved in some other way?


I think it is worth considering. Some initial thoughts:

  • A K8s master outage should leave deployed applications running. With this approach, the whole cluster would shut itself down
  • If the K8s master ends up on the smaller side of a partition, there may not be enough resources left to re-create the cluster, and the app would have remained available if the larger side hadn't downed itself (assuming external connectivity was still there)