Hi,
In my testing, I was running a cluster of four nodes: one client node (port 4504) and three worker nodes (ports 4501 (seed), 4502, and 4503).
I performed the following steps:
- A cluster.down(…) was issued to the client node.
- The client node registers the following callback:
Cluster(system).registerOnMemberRemoved {
  cluster.leave(cluster.selfAddress)
  // does coordinated shutdown
}
- The actorSystem of the client node is then reset to null.
- After that, when the client node receives new requests, it tries to reinitialize the actorSystem.
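
For reference, the steps above could be sketched roughly as follows. This is only an illustration, not the actual application code: the `ClientNode` object, `configText`, `start`, and `systemFor` are hypothetical names, the `127.0.0.1` host is a placeholder for the real machine name, and the sketch assumes Akka 2.5.8 or later with classic remoting (matching the `akka.tcp://` addresses in the log).

```scala
import akka.actor.{ActorSystem, CoordinatedShutdown}
import akka.cluster.Cluster
import com.typesafe.config.ConfigFactory

// Hypothetical holder for the client node's ActorSystem; all names are illustrative.
object ClientNode {
  private var system: ActorSystem = null

  // HOCON for a client on the given port. Host and seed address are placeholders.
  def configText(port: Int): String =
    s"""akka.actor.provider = cluster
       |akka.remote.netty.tcp.hostname = "127.0.0.1"
       |akka.remote.netty.tcp.port = $port
       |akka.cluster.seed-nodes = ["akka.tcp://TheServer@127.0.0.1:4501"]
       |""".stripMargin

  private def start(port: Int): ActorSystem = {
    val config = ConfigFactory.parseString(configText(port)).withFallback(ConfigFactory.load())
    val sys = ActorSystem("TheServer", config)
    val cluster = Cluster(sys)
    // Invoked once this node has been removed from the cluster.
    cluster.registerOnMemberRemoved {
      cluster.leave(cluster.selfAddress)
      // Assumption: "does coordinated shutdown" means running CoordinatedShutdown.
      CoordinatedShutdown(sys).run(CoordinatedShutdown.UnknownReason)
      // Reset the reference so that the next request creates a fresh system.
      ClientNode.synchronized { if (system eq sys) system = null }
    }
    sys
  }

  // Called for each incoming request: lazily (re)creates the ActorSystem.
  def systemFor(port: Int): ActorSystem = synchronized {
    if (system == null) system = start(port)
    system
  }
}
```

In this sketch, calling `ClientNode.systemFor(4504)` after the callback has run corresponds to the same-port case, and passing a different port corresponds to the different-port case. Note that the sketch resets the reference without waiting for the shutdown to complete, which may or may not match the real code.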
In my testing, if the client node tries to reinitialize the actor system using the same port 4504, I see this “New incarnation” message repeated for 5 minutes:
New incarnation of existing member [Member(address = akka.tcp://abc@machineName:4504, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
After 5 minutes, the cluster becomes healthy again.
If the client instead reinitializes the actorSystem using a different port, the cluster seems to return to a healthy state much more quickly.
My question is: why does it take 5 minutes when the client node left cleanly? The server log is below.
Thanks,
Grace
11:24:26.756 INFO [TheServer-akka.actor.default-dispatcher-4] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4503, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:24:26.856 ERROR [TheServer-akka.actor.default-dispatcher-4] akka.remote.EndpointWriter:67 - AssociationError [akka.tcp://TheServer@MACHINENAME:4501] <- [akka.tcp://TheServer@MACHINENAME:4504]: Error [Shut down address: akka.tcp://TheServer@MACHINENAME:4504] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://TheServer@MACHINENAME:4504
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
11:24:27.916 INFO [TheServer-akka.actor.default-dispatcher-4] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: SeenChanged(true,Set(akka.tcp://TheServer@MACHINENAME:4503, akka.tcp://TheServer@MACHINENAME:4504, akka.tcp://TheServer@MACHINENAME:4502, akka.tcp://TheServer@MACHINENAME:4501))
11:24:27.916 INFO [TheServer-akka.actor.default-dispatcher-4] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4503, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:24:27.916 INFO [TheServer-akka.actor.default-dispatcher-4] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: ReachabilityChanged()
11:24:27.916 INFO [TheServer-akka.actor.default-dispatcher-4] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4503, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:24:29.572 INFO [TheServer-akka.actor.default-dispatcher-32] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: ReachabilityChanged()
11:24:29.572 INFO [TheServer-akka.actor.default-dispatcher-32] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4503, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:24:35.228 WARN [TheServer-akka.actor.default-dispatcher-2] a.remote.ReliableDeliverySupervisor:75 - Association with remote system [akka.tcp://TheServer@MACHINENAME:4504] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://TheServer@MACHINENAME:4504]] Caused by: [Connection refused: no further information: MACHINENAME/144.203.109.177:4504]
(More log lines here, removed to save space.)
11:27:09.216 INFO [TheServer-akka.actor.default-dispatcher-3] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:27:09.216 INFO [TheServer-akka.actor.default-dispatcher-3] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: ReachabilityChanged()
11:27:09.216 INFO [TheServer-akka.actor.default-dispatcher-3] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:27:17.147 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Received InitJoin message from [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-12#2077907629]] to [akka.tcp://TheServer@MACHINENAME:4501]
11:27:17.147 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Sending InitJoinAck message from node [akka.tcp://TheServer@MACHINENAME:4501] to [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-12#2077907629]]
11:27:17.147 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - New incarnation of existing member [Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
11:27:27.158 INFO [TheServer-akka.actor.default-dispatcher-35] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Received InitJoin message from [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-13#-1480625418]] to [akka.tcp://TheServer@MACHINENAME:4501]
11:27:27.158 INFO [TheServer-akka.actor.default-dispatcher-35] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Sending InitJoinAck message from node [akka.tcp://TheServer@MACHINENAME:4501] to [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-13#-1480625418]]
(More messages here, removed to save space.)
11:29:59.154 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Sending InitJoinAck message from node [akka.tcp://TheServer@MACHINENAME:4501] to [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-27#1103656718]]
11:29:59.154 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - New incarnation of existing member [Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
11:30:09.162 INFO [TheServer-akka.actor.default-dispatcher-21] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Received InitJoin message from [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-28#-1811863137]] to [akka.tcp://TheServer@MACHINENAME:4501]
11:30:09.162 INFO [TheServer-akka.actor.default-dispatcher-21] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Sending InitJoinAck message from node [akka.tcp://TheServer@MACHINENAME:4501] to [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-28#-1811863137]]
11:30:09.162 INFO [TheServer-akka.actor.default-dispatcher-21] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - New incarnation of existing member [Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
11:30:20.145 INFO [TheServer-akka.actor.default-dispatcher-4] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Received InitJoin message from [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-29#-1763983524]] to [akka.tcp://TheServer@MACHINENAME:4501]
11:30:20.145 INFO [TheServer-akka.actor.default-dispatcher-4] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Sending InitJoinAck message from node [akka.tcp://TheServer@MACHINENAME:4501] to [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-29#-1763983524]]
11:30:20.155 INFO [TheServer-akka.actor.default-dispatcher-4] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - New incarnation of existing member [Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
11:30:27.949 INFO [TheServer-akka.actor.default-dispatcher-4] c.m.d.i.s.c.p.f.TheClusterUnReachableNodeRemover:82 - Member unreachable detected: Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)
11:30:27.949 INFO [TheServer-akka.actor.default-dispatcher-34] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Reachability event: UnreachableMember(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down))
11:30:27.949 INFO [TheServer-akka.actor.default-dispatcher-34] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:30:27.949 INFO [TheServer-akka.actor.default-dispatcher-21] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: ReachabilityChanged(akka.tcp://TheServer@MACHINENAME:4502 -> akka.tcp://TheServer@MACHINENAME:4504: Unreachable [Unreachable] (1))
11:30:27.949 INFO [TheServer-akka.actor.default-dispatcher-21] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:30:28.199 WARN [TheServer-akka.actor.default-dispatcher-21] akka.cluster.ClusterCoreDaemon:75 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)]. Node roles [TheSeedNode, dc-default]
11:30:28.199 INFO [TheServer-akka.actor.default-dispatcher-21] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: SeenChanged(false,Set(akka.tcp://TheServer@MACHINENAME:4501))
11:30:28.199 INFO [TheServer-akka.actor.default-dispatcher-21] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:30:28.199 INFO [TheServer-akka.actor.default-dispatcher-21] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: ReachabilityChanged(akka.tcp://TheServer@MACHINENAME:4501 -> akka.tcp://TheServer@MACHINENAME:4504: Unreachable [Unreachable] (1), akka.tcp://TheServer@MACHINENAME:4502 -> akka.tcp://TheServer@MACHINENAME:4504: Unreachable [Unreachable] (1))
11:30:28.199 INFO [TheServer-akka.actor.default-dispatcher-21] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:30:29.199 INFO [TheServer-akka.actor.default-dispatcher-34] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: SeenChanged(true,Set(akka.tcp://TheServer@MACHINENAME:4501, akka.tcp://TheServer@MACHINENAME:4502))
11:30:29.199 INFO [TheServer-akka.actor.default-dispatcher-34] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:30:29.199 INFO [TheServer-akka.actor.default-dispatcher-34] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Other event: ReachabilityChanged(akka.tcp://TheServer@MACHINENAME:4501 -> akka.tcp://TheServer@MACHINENAME:4504: Unreachable [Unreachable] (1), akka.tcp://TheServer@MACHINENAME:4502 -> akka.tcp://TheServer@MACHINENAME:4504: Unreachable [Unreachable] (1))
11:30:29.199 INFO [TheServer-akka.actor.default-dispatcher-34] c.m.d.i.s.c.p.f.TheClusterStateListener:41 - Cluster state: members=TreeSet(Member(address = akka.tcp://TheServer@MACHINENAME:4501, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4502, status = Up), Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Up)), unreachable=Set(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)), leader=Some(akka.tcp://TheServer@MACHINENAME:4501)
11:30:29.959 INFO [TheServer-akka.actor.default-dispatcher-34] c.m.d.i.s.c.p.f.TheClusterUnReachableNodeRemover:66 - downing unreachable node: Set(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down))
11:30:31.149 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Received InitJoin message from [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-30#-1978041379]] to [akka.tcp://TheServer@MACHINENAME:4501]
11:30:31.149 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Sending InitJoinAck message from node [akka.tcp://TheServer@MACHINENAME:4501] to [Actor[akka.tcp://TheServer@MACHINENAME:4504/system/cluster/core/daemon/joinSeedNodeProcess-30#-1978041379]]
11:30:31.149 INFO [TheServer-akka.actor.default-dispatcher-34] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - New incarnation of existing member [Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Down)] is trying to join. Existing will be removed from the cluster and then new member will be allowed to join.
11:30:39.199 INFO [TheServer-akka.actor.default-dispatcher-21] a.c.Cluster(akka://TheServer):80 - Cluster Node [akka.tcp://TheServer@MACHINENAME:4501] - Leader is removing unreachable node [akka.tcp://TheServer@MACHINENAME:4504]
11:30:39.199 INFO [TheServer-akka.actor.default-dispatcher-35] c.m.d.i.s.c.p.f.TheClusterUnReachableNodeRemover:85 - Member removed detected: Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Removed)
11:30:39.199 INFO [TheServer-akka.actor.default-dispatcher-21] c.m.d.i.s.c.p.f.TheClusterStateListener:40 - Member event: MemberRemoved(Member(address = akka.tcp://TheServer@MACHINENAME:4504, status = Removed),Down)