Proxy try to register to itselves, so ClusterSharding does not work

i.na · March 23, 2020, 11:15am

Hello,
I’m using akka cluster sharding feature.
I have two nodes. one is worker node and the other one is proxy node.

Configuration and the code of starting node are as follows.
For proxy

ClusterSharding.get(actorSystem)
            .startProxy(
                "MyActor",
                Optional.empty(),
                new myShardMessageExtractor()
            );
			
akka.cluster {
    roles = ["MyProxy"]
    sharding {
        role = "MyManager"
    }
}

For worker

ClusterSharding.get(actorSystem)
            .start(
                myActor.SHARD,
                Props.create(GuiceInjectedActor.class, injector, myActor.class),
                ClusterShardingSettings.create(actorSystem),
                MyActor.shardExtractor(),
                new ShardCoordinator.LeastShardAllocationStrategy(1, 3),
                new StopActor()
            );
			
akka.cluster {
    roles = ["MyManager"]
    sharding {
        role = "MyManager"
    }
}

I usually start worker first and proxy later.
But when I try to replace worker node, my service looks not work properly.

This is what I did.

There are 1 proxy and 1 worker.
Depoy new worker node. There are 1 proxy and 2 worker at that time.
It works good.
Destroy old worker.
There are 1 proxy and worker node. and there is no unreachable node.

But my service didn’t work. It looks something was wrong.

I found below log as soon as destroy old worker node.

logger=EmptyLocalActorRef, tn=myActor-akka.actor.default-dispatcher-13, message=Message [akka.cluster.sharding.ShardCoordinator$Internal$RegisterProxy
] from Actor[akka://myActor/system/sharding/myProxy#-1187429952] to Actor[akka://myActor/system/sharding/myCoordinator/singleton/coordinator] was not delivered. [9] dead letters encountered. If this is not an expected behavior t
hen Actor[akka://myActor/system/sharding/myCoordinator/singleton/coordinator] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-lette
rs-during-shutdown'.,

And then proxy node looks that it try to register coordinator itselves even new worker node is exist.

logger=ShardRegion, tn=myActor-akka.actor.internal-dispatcher-4, message=my: Trying to register to coordinator at [ActorSelection[Anchor(akka://je
judo/), Path(/system/sharding/myCoordinator/singleton/coordinator)]], but no acknowledgement. Total [1] buffered messages. [Coordinator [Member(address = akka://myActor@10.26.93.192:2552, status = Up)] is reachable.],

If I want to make my service healthy, I have to restart proxy node.
Is there a safe way to replace worker node?

johanandren · March 25, 2020, 12:30pm

Note that if you want to only run actual shards on a subset of the nodes you will need to use roles for those nodes and tell sharding that it should be limited to those nodes. If not using roles sharding will have to be started on all nodes to work.

Look for “roles” in this docs page for details https://doc.akka.io/docs/akka/current/cluster-sharding.html

i.na · March 26, 2020, 1:04am

Thanks, let me look into role of cluster-sharding more.

And I found some log in worker/proxy node for this case.

Shard(46) of terminating worker node tried to be rebalanced on new worker node.
but update state timeout occured.

timestamp=08:02:48.424, level=INFO , logger=DDataShardCoordinator, message=ShardCoordinator was moved to the active state State(Map(46 -> Actor[akka://myActor@10.26.146.136:2552/system/sharding/My#-188424463])),
timestamp=08:02:48.425, level=INFO , logger=DDataShardCoordinator, message=Starting rebalance for shards [46]. Current shards rebalancing: [],
timestamp=08:02:50.463, level=ERROR, loggingId=, logger=DDataShardCoordinator, tn=myActor-akka.actor.default-dispatcher-17, message=The ShardCoordinator was unable to update a distributed state within 'updating-state-timeout': 2000 millis (retrying). Perhaps the ShardRegion has not started on all active nodes yet? event=ShardHomeDeallocated(46),

After that, RecieveTimeout message went to dead-letters.

imestamp=08:03:08.441, level=INFO , logger=LocalActorRef, message=Message [akka.actor.ReceiveTimeout$] from Actor[akka://myActor/system/sharding/MyCoordinator/singleton/coordinator/$a#1543599307] to Actor[akka://myActor/system/sharding/MyCoordinator/singleton/coordinator/$a#1543599307] was not delivered. [4] dead letters encountered. If this is not an expected behavior then Actor[akka://myActor/system/sharding/MyCoordinator/singleton/coordinator/$a#1543599307] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.,

Proxy node tried to send message to shard(46) but it failes because coordinator is none.

timestamp=08:04:21.360, level=DEBUG, logger=ShardRegion, message=My: Request shard [46] home. Coordinator [None],
timestamp=08:04:22.386, level=WARN ,logger=ShardRegion, tmessage=My: Trying to register to coordinator at [ActorSelection[Anchor(akka://myActor/), Path(/system/sharding/MyCoordinator/singleton/coordinator)]], but no acknowledgement. Total [1] buffered messages. [Coordinator [Member(address = akka://myActor@10.26.31.44:2552, status = Up)] is reachable.],

It looks that ShardCoordinator cannot update state when ReceiveTimeout message goes to dead letter. Is it expected behavior? Will it be solved if I use role of cluster sharding?

Thanks.

johanandren · March 26, 2020, 9:41am

Yes likely, when the shard figures out where the “shard coordinator” is expected to live it will use the full set of nodes unless you use roles, and as you have not started sharding on both nodes that will not work if it expects the node without sharding to have the coordinator.

i.na · March 27, 2020, 6:30am

I change code for starting proxy. and It works.

ClusterSharding.get(actorSystem)
            .startProxy(
                "MyActor",
                Optional.of("MyManager"),
                new myShardMessageExtractor()
            );

Even receiveTimeout message goes to dead letter, shard(46) is started successfully on new worker node.
As I look into startProxy method, it did not get config value of akka.cluster.sharding.role when starting proxy…

Thanks @johanandren

Topic		Replies	Views
Register coordinator fail when I use proxy-only-mode for cluster sharding Akka Cluster	2	707	December 23, 2019
How to use Akka cluster sharding Proxy and ddata together Akka Libraries akka-cluster	0	464	June 22, 2022
Rebalance not happening when using Akka cluster Sharding with proxyMode Akka Libraries	0	423	June 22, 2022
Not able to start the Shard coordinator Akka Libraries akka-cluster	6	2153	June 1, 2018
Akka Cluster Shards are getting placed in a single node Akka Cluster	2	1178	March 31, 2018

Proxy try to register to itselves, so ClusterSharding does not work

Related topics