Unnecessary "errors" from Akka Remoting when a remote ActorSystem terminates

alanbur · June 1, 2018, 3:31pm

We are using Akka Remoting to spin up relatively short-lived remote ActorSystems on other Kubernetes nodes that process work and then exit. I know about Akka Cluster, but our needs are simple and we don’t want a long-lived configuration.

Everything works fine: the slaves gracefully disconnect from the controller at the application level and then exit. However, the controller then spends time waiting for the other slaves to exit, and while it does so it logs a stream of “error” messages about the slave that has already shut down cleanly:

13:47:56.578 ERROR [a.s.s.RestartWithBackoffFlow] Restarting graph due to failure
akka.stream.StreamTcpException: Tcp command [Connect(test-1.test-akka.alan.svc.cluster.local:2552,None,List(),Some(5000 milliseconds),true)] failed because of test-1.test-akka.alan.svc.cluster.local

There seems to be no way of stopping these messages (log-remote-lifecycle-events = off has no effect) and no way of telling a local ActorSystem that a remote ActorSystem has in fact gracefully terminated.
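A blunt partial workaround is to silence the one logger that emits these lines in the logging backend. The fragment below is only a sketch and assumes the controller logs through Logback via akka-slf4j, which the thread does not state. It also assumes that the abbreviated name `a.s.s.RestartWithBackoffFlow` in the output expands to `akka.stream.scaladsl.RestartWithBackoffFlow`.

```xml
<!-- logback.xml fragment; merge into the existing <configuration>. -->
<configuration>
  <!-- Assumed full name of the logger shown as a.s.s.RestartWithBackoffFlow. -->
  <!-- OFF hides every restart message from this logger, not only those
       caused by a remote system that exited cleanly. -->
  <logger name="akka.stream.scaladsl.RestartWithBackoffFlow" level="OFF"/>
</configuration>
```

This does not tell Akka that the disconnect was expected; it only hides the log output, including restart failures that may be genuine.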

The other issue is how long the local ActorSystem retains the details of an exited remote. There may be hundreds of them, and if references to remote ActorSystems are held indefinitely, they are in effect a memory leak.

patriknw · June 1, 2018, 9:05pm

Is this with Artery TCP transport? Which version?

alanbur · June 1, 2018, 9:18pm

Sorry, I should have provided that info. Yes, it is with Artery TCP, Akka version 2.5.12:

    artery.enabled = true
    artery.transport = tcp
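For readers unfamiliar with where these two keys live, here is a sketch of how they might sit in a complete `application.conf`, assuming they are nested under `akka.remote`. The provider line and the hostname are illustrative assumptions, not taken from the poster's setup; the port matches the one visible in the log output.

```hocon
akka {
  # Assumption: plain remoting rather than cluster, as described in the first post.
  actor.provider = remote

  remote.artery {
    enabled = on
    transport = tcp

    # Hypothetical address; each ActorSystem would advertise its own.
    canonical.hostname = "controller.example.internal"
    canonical.port = 2552
  }
}
```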

patriknw · June 1, 2018, 9:29pm

Can you give 2.5.13 a try? It’s published but not announced yet. I think we fixed a few things that may be related to this.

alanbur · June 1, 2018, 9:37pm

Sure, will do - thanks! Will probably be Monday before I report back.

alanbur · June 4, 2018, 5:14pm

Yes, it does appear to be a bit less shouty. This is on the server end with a DEBUG loglevel, when the slave exits:

16:59:24.430 INFO  Association to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] having UID [666184963331773213] has been stopped. All messages to this UID will be delivered to dead letters. Reason: ActorSystem terminated
16:59:24.511 DEBUG Resolving cfe-test-1.cfe-test-akka.alan.svc.cluster.local before connecting
16:59:24.511 DEBUG Resolution request for cfe-test-1.cfe-test-akka.alan.svc.cluster.local from Actor[akka://cfecp/system/IO-TCP/selectors/$a/10#-629842451]
16:59:24.557 DEBUG Could not establish connection to [cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] due to java.net.UnknownHostException: cfe-test-1.cfe-test-akka.alan.svc.cluster.local
16:59:24.561 WARN  [outbound connection to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552], control stream] Upstream failed, cause: StreamTcpException: Tcp command [Connect(cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552,None,List(),Some(5000 milliseconds),true)] failed because of cfe-test-1.cfe-test-akka.alan.svc.cluster.local
16:59:24.562 WARN  Restarting graph due to failure. stack_trace:  (akka.stream.StreamTcpException: Tcp command [Connect(cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552,None,List(),Some(5000 milliseconds),true)] failed because of cfe-test-1.cfe-test-akka.alan.svc.cluster.local)
16:59:24.562 DEBUG Restarting graph in 2010221697 nanoseconds
16:59:26.599 DEBUG Clear system message delivery of [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552#666184963331773213]
16:59:27.456 WARN  [outbound connection to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552], control stream] Upstream failed, cause: Association$OutboundStreamStopQuarantinedSignal$: 
16:59:27.456 WARN  [outbound connection to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552], message stream] Upstream failed, cause: Association$OutboundStreamStopQuarantinedSignal$: 
16:59:27.456 DEBUG Outbound control stream to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] was quarantined and stopped. It will be restarted if used again.
16:59:27.456 DEBUG Outbound message stream to [akka://cfecp@cfe-test-1.cfe-test-akka.alan.svc.cluster.local:2552] was quarantined and stopped. It will be restarted if used again.

As we normally have the Akka loglevel set to ERROR, that’s a lot better.

However, there still doesn’t seem to be any way of telling Akka that a remote ActorSystem is shutting down and that a disconnect is expected.

patriknw · June 6, 2018, 4:26pm

Thanks for trying that. I agree that it would be nice to make the normal client shutdown/disconnect silent (nothing above DEBUG). Please create an issue referring to this thread.
