Is implementing a peer-to-peer network of Akka nodes that can come and go possible?

jonnydee · May 29, 2020, 2:34pm

Hi,

We are trying to implement a system where a set of Akka system nodes form a peer-to-peer network. The maximum number of nodes and their addresses are known to each node in advance. All nodes can come and go whenever they want. Note that I don’t use Akka Cluster, only Artery Remoting.

Each node has a dedicated NodeObserver actor which periodically checks whether other nodes are reachable, using an ActorSelection whose path points to the other node’s NodeObserver actor. Once a NodeObserver actor detects that one on a remote system is reachable, it starts death watching it instead. If a watched remote NodeObserver actor terminates, the watching NodeObserver falls back to periodically checking for reachability with the ActorSelection approach outlined above.

If we shut down or start a node, all other running nodes react as expected. So this approach works great… until a network partition is induced, for example by disabling the network adapter. In that case, the now unreachable nodes become quarantined, and only a restart of all(!) nodes fixes the problem. From what I read, this seems to be the only way to recover from the quarantined state.

This is clearly the worst case, because we chose Akka in order to get a resilient solution, and this situation is the opposite of that.

Is it possible to tweak some configuration parameters and make our current implementation work? I played around with some configuration parameters related to the quarantining mechanism, with no success. Please help, what can we do? (I hope we don’t need to go for a completely different solution, because the release date is close.)

patriknw · May 29, 2020, 8:57pm

If you have a good reason for not using Akka Cluster for this, you shouldn’t use remote watch (nor remote deployment). That’s why, as of version 2.6.0, we have disabled that feature by default when Akka Cluster is not used.

jonnydee · May 29, 2020, 9:22pm

Thank you very much for your fast and helpful answer.
Although this is bad news, because we need to implement a watching mechanism ourselves, I now clearly know what I shouldn’t do. Instead of wasting time looking for a solution where there is none, we can concentrate on an alternative approach. Thanks a lot for that.
Maybe it will suffice to just NOT switch to Akka’s remote watching once a node becomes reachable, and to continue checking for reachability using ActorSelection instead.

patriknw · May 30, 2020, 9:46am

Exactly. Implementing your own failure detection by periodically sending request-response heartbeat messages isn’t difficult. Those can be sent with actorSelection or the ActorRef that you have discovered.
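The bookkeeping behind such a heartbeat-based failure detector could look roughly like the sketch below. It is plain Java with no Akka dependency, and everything in it is an assumption made for illustration: the class name `HeartbeatTracker`, the timeout value, and the use of address strings as node keys are hypothetical and not part of any Akka API. It only tracks when each node last answered a heartbeat; sending the request and receiving the response via actorSelection or an ActorRef would happen in the NodeObserver actor.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not an Akka class): remembers when each known node
 * last replied to a heartbeat and derives reachability from that.
 */
final class HeartbeatTracker {
    // A node counts as unreachable once no reply arrived within this window.
    private final long timeoutMillis;
    // Node address -> timestamp (in milliseconds) of the last heartbeat reply.
    private final Map<String, Long> lastReplyAt = new HashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
    }

    /** Call whenever a heartbeat response from the given node arrives. */
    void recordReply(String nodeAddress, long nowMillis) {
        lastReplyAt.put(nodeAddress, nowMillis);
    }

    /** True if the node replied at least once and the last reply is recent enough. */
    boolean isReachable(String nodeAddress, long nowMillis) {
        Long last = lastReplyAt.get(nodeAddress);
        return last != null && nowMillis - last <= timeoutMillis;
    }
}
```

In this sketch, the NodeObserver would call `recordReply` for every heartbeat response and check `isReachable` on each periodic tick. A node that comes back after a partition is simply marked reachable again by its next reply, so no remote watch is involved.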
