I am in the process of updating one of our clusters from akka-2.5 to akka-2.6, including updating from Netty to Artery. I am getting the following warning when the cluster is starting up and so far have been unable to determine what’s causing it or what the full implications are, and consequently am unsure how to properly fix the issue.
akkaAddress: akka://{system}@{host-a}:{port-a}
level: WARN
logger: akka.cluster.ClusterActorRefProvider
message: Error while resolving ActorRef [akka.tcp://{system}@{host-b}:{port-b}/system/sharding/{shard}#{id}] due to [Wrong protocol of [akka.tcp://{system}@{host-b}:{port-b}/system/sharding/{shard}], expected [akka]]
The log appears multiple times for each member of the cluster.
My debugging so far:
In this log, `host-a` is the source logging the message and `host-b` is the destination; both are IPv4 addresses. Sometimes `host-b` is the same as `host-a`, so this warning appears for both local and remote destinations. Curiously, even when `host-a` and `host-b` are the same, `port-a` and `port-b` can differ, and in those cases `port-b` doesn't seem to correlate with any port we have configured.
The `expected [akka]` portion tells me the `ArteryTcpTransport` is correctly in use on `host-a`. A similar warning logged by `host-b` tells me the same about that node. The `akka.tcp://` portion suggests that the resolution somehow attempts to use classic remoting via an `AkkaProtocolTransport` wrapping a `NettyTransport`.
Looking at the source for `ClusterActorRefProvider`, I see the `Error while resolving ActorRef` message can come from a few places in the `RemoteActorRefProvider` superclass while constructing a `RemoteActorRef`. The part that produces `akka.tcp` in the error message comes from a `localAddress`, and all but one of those come from an invocation of `transport.localAddressForRemote`. It's already established that `transport` is correctly an `ArteryTcpTransport`, and its `localAddressForRemote` always results in setting the `akka` protocol. So the erroneous path must go through the remaining location, where the `localAddress` is passed in: `RemoteActorRefProvider.resolveActorRefWithLocalAddress`.

That method only appears related to classic remoting by going through `AkkaPduProtobufCodec`, through which I've traced as far as some code in `Remoting.listens` that dynamically creates legacy `Transport` instances. I then found that `akka.remote.classic.enabled-transports = ["akka.remote.classic.netty.tcp"]` by default. I tried setting `akka.remote.classic.enabled-transports = []`, yet the warning persists.
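For reference, this is a sketch of the remoting configuration as I understand it should look for akka-2.6 with Artery; the hostname and port are placeholders for our actual values, and my understanding is that `akka.remote.artery.enabled = on` is already the default in 2.6, so it is shown only to be explicit:

```hocon
akka {
  remote {
    artery {
      # Default in akka-2.6; stated explicitly while debugging.
      enabled = on
      transport = tcp
      canonical.hostname = "{host-a}"   # placeholder
      canonical.port = 25520            # placeholder
    }
    # My attempt to rule out the legacy Netty transport:
    classic.enabled-transports = []
  }
}
```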
I thought maybe an akka-2.5 artifact was being loaded, but there is no message warning about mixed versions, and no akka-2.5 dependency anywhere in the tree.
We have another cluster for which we have already made this update where the warning does not appear. There are two main differences between them:
- this one uses `akka-persistence-jdbc=4.0.0` while the other does not use `akka-persistence` at all, and the `shard` is also a `PersistentActor`
- this one uses `akka.remote.artery.large-message-destinations` for the `shard` while the other does not use any large message destinations
I tried not including the `shard` as a large message destination, but the warning persists.
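For completeness, this is roughly the shape of the large-message configuration we had in place; the shard name is a placeholder, and the trailing wildcard is my reading of how the path matching works rather than something I have verified:

```hocon
# Sketch only: {shard} stands in for our actual shard region name.
akka.remote.artery.large-message-destinations = [
  "/system/sharding/{shard}*"
]
```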
This warning does not prevent the cluster from forming. However, because the warning appears for the cluster shard, and the code path that logs it returns an `EmptyLocalActorRef` instead of the `RemoteActorRef`, I want to make sure messages sent to the shard are actually delivered to a singleton instance within the cluster and not to a local instance, as we rely on the singleton aspect for correctness. I believe this is the case, but I haven't been able to follow the usage of `EmptyLocalActorRef` far enough to validate that belief.
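As far as I can tell from the source, an `EmptyLocalActorRef` routes anything sent to it to the event stream as a `DeadLetter`, so one way to validate the belief is to watch for dead letters addressed to the shard path during startup. A minimal sketch using the classic actor API (assumes `akka-actor` on the classpath; names are mine):

```scala
import akka.actor.{Actor, ActorSystem, DeadLetter, Props}

// Logs every DeadLetter published on the event stream so that any
// shard messages swallowed by an EmptyLocalActorRef become visible.
class DeadLetterListener extends Actor {
  def receive: Receive = {
    case d: DeadLetter =>
      println(s"Dead letter from ${d.sender} to ${d.recipient}: ${d.message}")
  }
}

val system = ActorSystem("debug")
val listener = system.actorOf(Props[DeadLetterListener](), "deadLetterListener")
system.eventStream.subscribe(listener, classOf[DeadLetter])
```

If shard messages were being dropped locally, I would expect them to show up here with a recipient path under `/system/sharding/`.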
So my questions are:
- What could cause akka-2.6 to use classic remoting at all while Artery is enabled?
- What are the effects and implications of seeing this message for a cluster shard on startup?
- What can I do about it?