Hi, thanks for your question. First, the behavior you are checking applies specifically when one is not using Akka Cluster, yet Akka Cluster is the first dependency you have added to your build. The behavior is designed to be overridden when Akka Cluster is used, because in that case the remote watch/unwatch is safe, and Terminated messages for killed actors would indeed be received.
That said, you are not enabling Cluster, and I do see an error in the docs: in remoting we catch the Watch/Unwatch for a DeathWatch, but we do allow all other system messages through, like Terminated:
Remote Watch: ignores the watch and unwatch request, and Terminated will not be delivered when the remote actor is stopped or if a remote node crashes
So you are correct there and thank you for identifying it for us to update!
If I understand it correctly: remote actor-ref watch and Terminated signals will work even in 2.6, as long as both actor systems (watcher and watchee) use the remote provider.
With this understanding, we further modified our example so that a) the Watcher has the cluster provider and b) the Watchee has the remote provider with use-unsafe-remote-features-without-cluster = on.
Now we should receive the Terminated signal as expected, right? But we do not! This breaks a few of our tests when we try to migrate to M4.
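For reference, a minimal sketch of the configuration described above (system names, hostnames, and ports are illustrative, not from the original report):

```hocon
# Watcher system: cluster provider
akka {
  actor.provider = cluster
  remote.artery.canonical {
    hostname = "127.0.0.1"
    port = 2551
  }
}

# Watchee system: remote provider, with the unsafe flag enabled
akka {
  actor.provider = remote
  remote.use-unsafe-remote-features-without-cluster = on
  remote.artery.canonical {
    hostname = "127.0.0.1"
    port = 2552
  }
}
```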
Hi @skvithalani, actually in looking into this, we found something and I’m pushing a fix, so thank you very much for finding this!! I will update you here today, with a link for the function change if you’re curious. It will be in the next milestone
We tested the snapshot 2.6-20190712-192311 in our experiment repo for
a) Watcher with cluster provider and
b) Watchee with remote provider.
Our observation:
The Watcher (with cluster provider) requires akka.remote.use-unsafe-remote-features-without-cluster = on, and it then receives the Terminated signal from the watchee (with remote provider) as expected.
This solves the remote watching problem.
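Concretely, the surprising part is that the flag goes in the watcher's configuration even though its provider is cluster (a sketch, not the full config from our repo):

```hocon
# Watcher system: provider = cluster, yet the "-without-cluster" flag is needed here
akka {
  actor.provider = cluster
  remote.use-unsafe-remote-features-without-cluster = on
}
```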
Our confusion:
The watcher already has a cluster provider, so why does it require the flag akka.remote.use-unsafe-remote-features-without-cluster?
Shouldn't the watchee, which has the remote provider, be the one that requires the flag instead?
So this is about ‘safe’ use of remoting, and in your example you knowingly want to do something unsafe: watching across the cluster boundary, since your watchee is outside the cluster. Hence you need to declare it. Does that help? You should not need the flag if watcher and watchee are inside the same cluster. You would need it if both were remote only.
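In other words, the flag is a declaration of intent to use unsafe remote watch. A sketch of the three cases, assuming 2.6-style configuration:

```hocon
# Case 1: watcher and watchee are members of the same cluster
# -> safe, no flag needed
akka.actor.provider = cluster

# Case 2: both systems are remote-only and watch each other
# -> both need the flag
akka.actor.provider = remote
akka.remote.use-unsafe-remote-features-without-cluster = on

# Case 3: watcher is in a cluster, watchee is outside it
# -> the watcher needs the flag, because the watch crosses the cluster boundary
akka.actor.provider = cluster
akka.remote.use-unsafe-remote-features-without-cluster = on
```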
This configuration is a bit confusing for the user.
The use-unsafe-remote-features-without-cluster = on setting is required even though provider = cluster.
We say provider = cluster, but the setting's name ends with -without-cluster = on.
This makes it clear that the new setting is required for all watchers, whether or not they use the remote provider. Either the documentation needs to reflect this, or the current implementation has an unexpected effect.
The background of the feature/limitation is that we want to make users very aware that a watch to a node outside of the cluster may have unexpected consequences, such as quarantining, and therefore a required restart, as soon as the failure detector timeout triggers.
Failure detection between nodes that are members of the same cluster doesn’t have that shortcoming.
Typically this happens when using plain remoting without any cluster provider at all. As you mention, it can also happen when using the cluster provider but watching a node that is not a member, though I think such mixed usage is rarer.
We could consider renaming the config to -outside-cluster instead of -without-cluster.
That would be really helpful. There is another key point here: we should document that all watchers need this setting (cluster as well as remote).
I agree that a cluster needing to watch remote actors is rare. But in our case that is the core of our design for a general-purpose Akka-CRDT-based service-discovery mechanism, where some of the registered services are remote actors.