Our infrastructure runs several Akka clusters in Docker, using Consul to hold cluster state. Until now we ran Consul 0.5.2.
We recently tried to bring up a new cluster with Consul 1.0.2 on the server nodes that hold the Akka cluster state, and we have been seeing intermittent failures where Akka nodes are unable to join the cluster.
For a while the cluster was stuck at 1 node. After many retries and several cluster destroy/rebuild cycles, we eventually got it up to 2 nodes. We usually run 5 nodes, but every additional Akka node we tried to start failed to join the cluster, producing these errors:
```
*********** host: NEW_NODE_IP
*********** constructr.coordination.host: consul.internal-v7
*********** constructr.coordination.port: 8500
18:48:21 [INFO ] [Slf4jLogger ] - Slf4jLogger started
18:48:22 [INFO ] [Remoting ] - Starting remoting
18:48:22 [INFO ] [Remoting ] - Remoting started; listening on addresses :[akka.tcp://worker-cluster@NEW_NODE_IP:2553]
18:48:22 [INFO ] [Cluster(akka://worker-cluster)] - Cluster Node [akka.tcp://worker-cluster@NEW_NODE_IP:2553] - Starting up...
18:48:22 [INFO ] [Cluster(akka://worker-cluster)] - Cluster Node [akka.tcp://worker-cluster@NEW_NODE_IP:2553] - Registered cluster JMX MBean [akka:type=Cluster]
18:48:22 [INFO ] [Cluster(akka://worker-cluster)] - Cluster Node [akka.tcp://worker-cluster@NEW_NODE_IP:2553] - Started up successfully
18:48:22 [INFO ] [Constructr ] - Creating constructr-machine, because no seed-nodes defined
18:48:22 [INFO ] [CodaHaleBackend ] - Reporter com.lightbend.cinnamon.chmetrics.statsd.StatsDReporter started.
18:48:22 [INFO ] [CodaHaleBackend ] - Reporter com.lightbend.cinnamon.chmetrics.reporter.provided.JmxReporter started.
18:48:22 [INFO ] [Cluster(akka://worker-cluster)] - Cluster Node [akka.tcp://worker-cluster@NEW_NODE_IP:2553] - No seed-nodes configured, manual cluster join required
18:48:24 [INFO ] [ClusterHttpManagement] - Bound akka-management HTTP endpoint to: 0.0.0.0:19999
18:48:24 [INFO ] [ClusterManagement$ ] - Cluster HTTP Management started on 127.0.0.1:19999
18:48:24 [INFO ] [InternalManagementHttpService] - LiveSafe (Internal Management) Worker HttpService started, ready to service requests on: /0:0:0:0:0:0:0:0:20000
log4j:WARN No appenders could be found for logger (com.amazonaws.AmazonWebServiceClient).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
18:48:27 [WARN ] [ReliableDeliverySupervisor] - Association with remote system [akka.tcp://worker-cluster@SEED_NODE_IP:2553] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://worker-cluster@SEED_NODE_IP:2553]] Caused by: [No route to host]
18:48:27 [WARN ] [NettyTransport ] - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
18:48:27 [INFO ] [RemoteActorRefProvider$RemoteDeadLetterActorRef] - Message [akka.cluster.InternalClusterAction$InitJoin$] from Actor[akka://worker-cluster/system/cluster/core/daemon/joinSeedNodeProcess-1#-1423693763] to Actor[akka://worker-cluster/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
18:48:29 [INFO ] [RemoteActorRefProvider$RemoteDeadLetterActorRef] - Message [akka.cluster.InternalClusterAction$InitJoin$] from Actor[akka://worker-cluster/system/cluster/core/daemon/joinSeedNodeProcess-1#-1423693763] to Actor[akka://worker-cluster/deadLetters] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
18:48:34 [WARN ] [JoinSeedNodeProcess ] - Couldn't join seed nodes after [2] attempts, will try again. seed-nodes=[akka.tcp://worker-cluster@SEED_NODE_IP:2553]
18:48:37 [WARN ] [NettyTransport ] - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
18:48:37 [WARN ] [ReliableDeliverySupervisor] - Association with remote system [akka.tcp://worker-cluster@SEED_NODE_IP:2553] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://worker-cluster@SEED_NODE_IP:2553]] Caused by: [No route to host]
18:48:37 [INFO ] [RemoteActorRefProvider$RemoteDeadLetterActorRef] - Message [akka.cluster.InternalClusterAction$InitJoin$] from Actor[akka://worker-cluster/system/cluster/core/daemon/joinSeedNodeProcess-1#-1423693763] to Actor[akka://worker-cluster/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
18:48:39 [WARN ] [JoinSeedNodeProcess ] - Couldn't join seed nodes after [3] attempts, will try again. seed-nodes=[akka.tcp://worker-cluster@SEED_NODE_IP:2553]
18:48:39 [INFO ] [RemoteActorRefProvider$RemoteDeadLetterActorRef] - Message [akka.cluster.InternalClusterAction$InitJoin$] from Actor[akka://worker-cluster/system/cluster/core/daemon/joinSeedNodeProcess-1#-1423693763] to Actor[akka://worker-cluster/deadLetters] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
18:48:44 [WARN ] [JoinSeedNodeProcess ] - Couldn't join seed nodes after [4] attempts, will try again. seed-nodes=[akka.tcp://worker-cluster@SEED_NODE_IP:2553]
18:48:47 [WARN ] [NettyTransport ] - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
18:48:47 [WARN ] [ReliableDeliverySupervisor] - Association with remote system [akka.tcp://worker-cluster@SEED_NODE_IP:2553] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://worker-cluster@SEED_NODE_IP:2553]] Caused by: [No route to host]
18:48:47 [INFO ] [RemoteActorRefProvider$RemoteDeadLetterActorRef] - Message [akka.cluster.InternalClusterAction$InitJoin$] from Actor[akka://worker-cluster/system/cluster/core/daemon/joinSeedNodeProcess-1#-1423693763] to Actor[akka://worker-cluster/deadLetters] was not delivered. [5] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
18:48:49 [WARN ] [JoinSeedNodeProcess ] - Couldn't join seed nodes after [5] attempts, will try again. seed-nodes=[akka.tcp://worker-cluster@SEED_NODE_IP:2553]
```
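
The `java.net.NoRouteToHostException` in the log is raised by the JVM at the TCP connect stage, before any Akka handshake takes place, so it may be worth checking plain TCP reachability of the seed address from the failing node without Akka involved. This might help tell a network-level problem (for example, an unroutable address being handed out as the seed) apart from an Akka-level one. The following is only a minimal sketch: the class name `SeedProbe`, the `probe` helper, and the 3-second timeout are illustrative choices, not part of our application.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the application: attempts a plain TCP
// connect to host:port and reports how it went.
public class SeedProbe {

    /** Returns "reachable", or the exception type and message of the failed connect. */
    public static String probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Same kind of connect the remoting transport performs first;
            // "No route to host" would surface here as NoRouteToHostException.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "reachable";
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: SeedProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // 3000 ms is an arbitrary timeout chosen for illustration.
        System.out.println(host + ":" + port + " -> " + probe(host, port, 3000));
    }
}
```

Run from inside the container of the node that fails to join, this could be invoked as `java SeedProbe.java SEED_NODE_IP 2553` (single-file launch needs Java 11 or later and a JDK in the image). If it also reports `NoRouteToHostException`, the problem is likely below Akka, in routing or in the address the seed node advertises.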