Hi, thanks for your question. First, the behavior you are checking applies specifically when one is not using Akka Cluster, yet Akka Cluster is the first dependency you have added to your build. The behavior is designed to be overridden when Akka Cluster is used, because in that case the remote watch/unwatch is safe, and Terminated messages for killed actors would indeed be received.
That said, you are not enabling Cluster, and I do see an error in the docs: in remoting we catch the Watch/Unwatch for a DeathWatch, but we do allow all other system messages through, like Terminated:
Remote Watch: ignores the watch and unwatch request, and Terminated will not be delivered when the remote actor is stopped or if a remote node crashes
So you are correct there and thank you for identifying it for us to update!
If I understand it correctly: remote actor-ref watch and Terminated signals will work even in 2.6, as long as both actor systems (watcher and watchee) use the remote provider.
With this understanding, we further modified our example so that a) the Watcher has the cluster provider and b) the Watchee has the remote provider with use-unsafe-remote-features-without-cluster = on.
Now we should receive the Terminated signal as expected, right? But we do not! This breaks a few of our tests when we try to migrate to M4.
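For reference, a minimal sketch of the configuration described above (system names, hostnames, and ports are illustrative, not from the original report):

```hocon
# Watcher system: cluster provider
akka {
  actor.provider = cluster
  remote.artery.canonical {
    hostname = "127.0.0.1"
    port = 2551
  }
}

# Watchee system: remote provider, with the unsafe flag enabled
akka {
  actor.provider = remote
  remote.use-unsafe-remote-features-without-cluster = on
  remote.artery.canonical {
    hostname = "127.0.0.1"
    port = 2552
  }
}
```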
Hi @skvithalani, actually in looking into this, we found something and I’m pushing a fix, so thank you very much for finding this!! I will update you here today, with a link for the function change if you’re curious. It will be in the next milestone
We tested the snapshot 2.6-20190712-192311 in our experiment repo for
a) Watcher with cluster provider and
b) Watchee with remote provider.
Our observation:
The Watcher (with cluster provider) requires akka.remote.use-unsafe-remote-features-without-cluster = on, and it then receives the Terminated signal from the watchee (with remote provider) as expected.
This solves the remote watching problem.
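Concretely, the surprising part is that the flag goes in the watcher's configuration even though its provider is cluster (a sketch, not the full config from our repo):

```hocon
# Watcher system: provider = cluster, yet the "-without-cluster" flag is needed here
akka {
  actor.provider = cluster
  remote.use-unsafe-remote-features-without-cluster = on
}
```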
Our confusion:
The watcher already has a cluster provider, so why does it require the flag akka.remote.use-unsafe-remote-features-without-cluster?
Shouldn't the watchee, which has the remote provider, be the one that requires the flag instead?
So this is about ‘safe’ use of remoting, and in your example you knowingly want to do something unsafe: watching across the cluster boundary, since your watchee is outside the cluster. Hence you need to declare it. Does that help? You should not need the flag if watcher and watchee are inside the same cluster. You would need it if both were remote only.
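In other words, the flag is a declaration of intent to use unsafe remote watch. A sketch of the three cases, assuming 2.6-style configuration:

```hocon
# Case 1: watcher and watchee are members of the same cluster
# -> safe, no flag needed
akka.actor.provider = cluster

# Case 2: both systems are remote-only and watch each other
# -> both need the flag
akka.actor.provider = remote
akka.remote.use-unsafe-remote-features-without-cluster = on

# Case 3: watcher is in a cluster, watchee is outside it
# -> the watcher needs the flag, because the watch crosses the cluster boundary
akka.actor.provider = cluster
akka.remote.use-unsafe-remote-features-without-cluster = on
```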
This configuration is a bit confusing for the user.
The use-unsafe-remote-features-without-cluster = on setting is required even though provider = cluster.
We say provider = cluster, but the setting's name ends with -without-cluster = on.
This makes it clear that the new setting is required for all watchers, whether or not they use the remote provider. Either the documentation needs to reflect this, or the current implementation has an unexpected effect.
The background of the feature/limitation is that we want to make users very aware that a watch to a node outside of the cluster may have unexpected consequences, such as quarantining, and therefore a required restart, as soon as the failure detector timeout triggers.
Failure detection between nodes that are members of the same cluster doesn’t have that shortcoming.
Typically this happens when using plain remoting without any cluster provider at all. As you mention, it can also happen when using the cluster provider but watching a node that is not a member, though I think such mixed usage is rarer.
We could consider renaming the config to -outside-cluster instead of -without-cluster.
That would be really helpful. There is another key point here: we should document that all watchers need this setting (cluster as well as remote).
I agree that a cluster needing to watch remote actors is rare. But in our case that is the core of our design for a general-purpose Akka-CRDT-based service-discovery mechanism, where some of the registered services are remote actors.