How to NOT use akka.cluster.auto-down-unreachable-after

Twillouer · September 27, 2018, 7:27am

Hi,

I use Akka Cluster for two reasons:

ClusterSharding
Singleton

Singletons are used as “master/slave” for services that don’t support real, clean sharding.

When I start my cluster, everything is fine. But if I kill -9 my leader (simulating a machine going down suddenly), no new leader takes over if I set akka.cluster.auto-down-unreachable-after=off.
With a setting like akka.cluster.auto-down-unreachable-after=10s, everything is fine: the old leader is removed and a new one takes the lead.

But this parameter is discouraged for production environments.
And if I try to implement my own version, I will definitely do a far worse job than you.

So, my question is: I understand the risk as described in the documentation (https://doc.akka.io/docs/akka/current/cluster-usage.html), but is there a way to do “better” without ringing someone in the middle of the night when a machine shuts down?

Thanks for your time!

dmi3zkm · September 27, 2018, 11:24am

Yes, you can try Split Brain Resolver. It is a commercial Lightbend tool.

I’ve seen a couple of open source alternatives on GitHub.
Actually, you can try to implement your own. Check out this conference talk: Scala Swarm 2017 | Niko Will: Akka cluster management and split brain resolution.
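
To give an idea of what a home-grown strategy has to decide, here is a minimal sketch of a "keep majority" rule in plain Java. It is an illustration under stated assumptions, not the Lightbend Split Brain Resolver and not taken from the talk: the class name KeepMajoritySketch, the Decision enum and the use of plain strings for node addresses are all made up for this example, and the wiring into Akka (membership events, a stable-after timeout, actually downing nodes) is left out entirely.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch: names and types are assumptions for illustration only.
public class KeepMajoritySketch {

    /** What the nodes on this side of the partition should do. */
    public enum Decision { DOWN_UNREACHABLE, DOWN_SELF, NO_ACTION }

    /**
     * Decide from the point of view of one node.
     *
     * @param reachable   addresses this node can still see (including itself)
     * @param unreachable addresses this node has marked unreachable
     */
    public static Decision decide(Set<String> reachable, Set<String> unreachable) {
        if (unreachable.isEmpty()) {
            return Decision.NO_ACTION; // nothing to resolve
        }
        int total = reachable.size() + unreachable.size();
        if (reachable.size() * 2 > total) {
            return Decision.DOWN_UNREACHABLE; // we are the majority side
        }
        if (reachable.size() * 2 < total) {
            return Decision.DOWN_SELF; // we are the minority side
        }
        // Exact tie: the side holding the lowest address survives, so both
        // sides reach the same conclusion without talking to each other.
        String lowestHere = Collections.min(reachable);
        String lowestThere = Collections.min(unreachable);
        return lowestHere.compareTo(lowestThere) < 0
                ? Decision.DOWN_UNREACHABLE
                : Decision.DOWN_SELF;
    }

    public static void main(String[] args) {
        // Three-node cluster where node "a" was killed with kill -9.
        System.out.println(decide(Set.of("b", "c"), Set.of("a")));
        // The same partition as seen from an isolated node "a".
        System.out.println(decide(Set.of("a"), Set.of("b", "c")));
    }
}
```

The important property is that every node, looking only at its own view, arrives at a decision that is consistent with the other side's decision. That is exactly what a plain timeout-based auto-down cannot guarantee during a network partition, where both sides may down each other and keep running.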

johanandren · September 29, 2018, 9:11am

In the normal case (scaling down the cluster, rolling out an upgrade, etc.) you should strive for nodes gracefully leaving the cluster rather than abruptly killing the machines. This is done pretty much out of the box in 2.5 using the graceful shutdown, which is triggered by a JVM shutdown hook.

While interesting, I think Niko’s talk doesn’t actually mention how it is done.

One option, which might be surprising, is to have ops monitor the cluster for unreachability and make manual decisions about whether a part of the cluster should be downed on partitions. It depends a bit on what kind of infrastructure you are running on; if it is the cloud, this may be less of an option.

Twillouer · October 2, 2018, 7:10pm

Thanks a lot to both of you, and nice video, which gives me things to think about.

An ops decision is not wanted for the moment, and that’s why I’m looking for an automatic option :)
