Hi guys,
I have a strange behavior in Akka remote artery that suddenly happened.
I have one client and two servers, communicating through Akka. The two servers have a master-slave architecture, in which the client primarily communicates with one of the servers, for retrieving info and periodic pings to ensure master is still running as the master.
What happens is that when one of the servers is master (server1.net.sw.se), there is a significant amount of dropped messages in client’s send queue.
Connection is established because some messages are sent and received, but most sent are dropped. If I switch master to the other (slave) server, communication is normal - no dropped messages. And if I then restart the problematic server process (server1.net.sw.se) and let that server become master again, the problem reoccurrs on client side!
So it seems something is permanently corrupt in the client’s actor system in how it communicates with actor system on server1.net.sw.se .
Only workaround to this was to restart the client process.
It is not any significant application load from the clients - only occassional ping messages and info exchange messages.
Both client- and server actor systems are subscribing to the ThisActorSystemQuarantinedEvent event, and restarts when received.
However, this doesn’t seem to be a quarantine case since some messages are being transmitted.
Akka version is 2.5.19.
Do you have any thoughts on this? Something that might be solved in 2.5.20 or 2.5.21, or something that we’ve missed in our akka usage?
Unfortunately, the issue is not happening right now so I’m not able to retrieve any more debugging from the observation.
Logs from client, when switching master server to the problematic server1.net.sw.se:
2019-02-26T20:22:39.368+0100 | .a.Association(akka://AS-A13601) | com.typesafe.akka.slf4j - 2.5.19 | Quarantine piercing attempt with message [ActorSelectionMessage(RegisterClientMessage)] to [Actor[akka://AS-A16300@server1.net.sw.se:16300/]]
2019-02-26T20:22:39.368+0100 | .a.Association(akka://AS-A13601) | com.typesafe.akka.slf4j - 2.5.19 | Dropping message [ActorSelectionMessage(RegisterClientMessage)] from [Actor[akka://AS-A13601/user/$a/$a#2021643780]] to [Actor[akka://AS-A16300@server1.net.sw.se:16300/]] due to overflow of send queue, size [3072]
2019-02-26T20:22:39.603+0100 | a.i.TcpListener | com.typesafe.akka.slf4j - 2.5.19 | New connection accepted
2019-02-26T20:22:39.603+0100 | a.i.SelectionHandler | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/IO-TCP/selectors/$a/3180#884609065]
2019-02-26T20:22:39.603+0100 | a.i.TcpIncomingConnection | com.typesafe.akka.slf4j - 2.5.19 | started (akka.io.TcpIncomingConnection@5f600bd2)
2019-02-26T20:22:39.603+0100 | a.i.TcpIncomingConnection | com.typesafe.akka.slf4j - 2.5.19 | now watched by Actor[akka://AS-A13601/system/IO-TCP/selectors/$a#-893472475]
2019-02-26T20:22:39.604+0100 | a.s.i.StreamSupervisor | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/StreamSupervisor-0/remote-4315-1#-1482106711]
2019-02-26T20:22:39.604+0100 | a.s.i.StreamSupervisor | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/StreamSupervisor-0/remote-4315-0-map#-1262488769]
2019-02-26T20:22:39.604+0100 | a.i.TcpIncomingConnection | com.typesafe.akka.slf4j - 2.5.19 | now watched by Actor[akka://AS-A13601/system/StreamSupervisor-0/$$SX#65519072]
2019-02-26T20:22:39.604+0100 | a.s.i.f.ActorGraphInterpreter | com.typesafe.akka.slf4j - 2.5.19 | started (akka.stream.impl.fusing.ActorGraphInterpreter@265b3f56)
2019-02-26T20:22:39.604+0100 | a.s.i.i.TLSActor | com.typesafe.akka.slf4j - 2.5.19 | started (akka.stream.impl.io.TLSActor@1af2b499)
2019-02-26T20:22:39.605+0100 | a.i.TcpListener | com.typesafe.akka.slf4j - 2.5.19 | New connection accepted
2019-02-26T20:22:39.605+0100 | a.i.SelectionHandler | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/IO-TCP/selectors/$a/3181#-1869580023]
2019-02-26T20:22:39.605+0100 | a.i.TcpIncomingConnection | com.typesafe.akka.slf4j - 2.5.19 | started (akka.io.TcpIncomingConnection@56745ad9)
2019-02-26T20:22:39.605+0100 | a.i.TcpIncomingConnection | com.typesafe.akka.slf4j - 2.5.19 | now watched by Actor[akka://AS-A13601/system/IO-TCP/selectors/$a#-893472475]
2019-02-26T20:22:39.606+0100 | a.s.i.StreamSupervisor | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/StreamSupervisor-0/remote-4316-1#-879100037]
2019-02-26T20:22:39.606+0100 | a.s.i.StreamSupervisor | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/StreamSupervisor-0/remote-4316-0-map#-426425576]
2019-02-26T20:22:39.606+0100 | a.i.TcpIncomingConnection | com.typesafe.akka.slf4j - 2.5.19 | now watched by Actor[akka://AS-A13601/system/StreamSupervisor-0/$$TX#-584777727]
2019-02-26T20:22:39.606+0100 | a.s.i.f.ActorGraphInterpreter | com.typesafe.akka.slf4j - 2.5.19 | started (akka.stream.impl.fusing.ActorGraphInterpreter@4f80d2dc)
2019-02-26T20:22:39.606+0100 | a.s.i.i.TLSActor | com.typesafe.akka.slf4j - 2.5.19 | started (akka.stream.impl.io.TLSActor@758e4643)
2019-02-26T20:22:39.637+0100 | a.r.a.Decoder | com.typesafe.akka.slf4j - 2.5.19 | Decoded message but unable to record hits for compression as no remoteAddress known. No association yet?
2019-02-26T20:22:39.639+0100 | a.r.a.Decoder | com.typesafe.akka.slf4j - 2.5.19 | Decoded message but unable to record hits for compression as no remoteAddress known. No association yet?
2019-02-26T20:22:39.639+0100 | .a.Association(akka://AS-A13601) | com.typesafe.akka.slf4j - 2.5.19 | Starting outbound control stream to [akka://AS-A16300@server1.net.sw.se:16300]
2019-02-26T20:22:39.639+0100 | .a.Association(akka://AS-A13601) | com.typesafe.akka.slf4j - 2.5.19 | Incarnation 4 of association to [akka://AS-A16300@server1.net.sw.se:16300] with new UID[-4102320532937819035] (old UID [8096406311518835292])
2019-02-26T20:22:39.639+0100 | a.s.i.StreamSupervisor | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/StreamSupervisor-0/remote-4317-0-ignoreSink#616861197]
2019-02-26T20:22:39.640+0100 | a.s.i.f.ActorGraphInterpreter | com.typesafe.akka.slf4j - 2.5.19 | started (akka.stream.impl.fusing.ActorGraphInterpreter@2fc7eacd)
2019-02-26T20:22:39.640+0100 | a.s.i.StreamSupervisor | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/StreamSupervisor-0/remote-4319-1#-748879484]
2019-02-26T20:22:39.640+0100 | a.i.TcpManager | com.typesafe.akka.slf4j - 2.5.19 | now watched by Actor[akka://AS-A13601/system/StreamSupervisor-0/$$UX#256478128]
2019-02-26T20:22:39.640+0100 | a.i.SelectionHandler | com.typesafe.akka.slf4j - 2.5.19 | now supervising Actor[akka://AS-A13601/system/IO-TCP/selectors/$a/3182#741439304]
2019-02-26T20:22:39.641+0100 | a.s.i.i.TLSActor | com.typesafe.akka.slf4j - 2.5.19 | started (akka.stream.impl.io.TLSActor@5810f4bf)
2019-02-26T20:22:39.641+0100 | a.i.TcpOutgoingConnection | com.typesafe.akka.slf4j - 2.5.19 | started (akka.io.TcpOutgoingConnection@56ee83b4)
2019-02-26T20:22:39.642+0100 | a.i.TcpOutgoingConnection | com.typesafe.akka.slf4j - 2.5.19 | now watched by Actor[akka://AS-A13601/system/IO-TCP/selectors/$a#-893472475]
2019-02-26T20:22:39.643+0100 | a.i.TcpOutgoingConnection | com.typesafe.akka.slf4j - 2.5.19 | Resolving server1.net.sw.se before connecting
2019-02-26T20:22:39.643+0100 | a.i.SimpleDnsManager | com.typesafe.akka.slf4j - 2.5.19 | Resolution request for server1.net.sw.se from Actor[akka://AS-A13601/system/IO-TCP/selectors/$a/3182#741439304]
2019-02-26T20:22:39.643+0100 | a.i.TcpOutgoingConnection | com.typesafe.akka.slf4j - 2.5.19 | Attempting connection to [server1.net.sw.se/10.61.204.42:16300]
2019-02-26T20:22:39.643+0100 | a.i.TcpOutgoingConnection | com.typesafe.akka.slf4j - 2.5.19 | Connection established to [server1.net.sw.se:16300]
2019-02-26T20:22:39.643+0100 | a.i.TcpManager | com.typesafe.akka.slf4j - 2.5.19 | no longer watched by Actor[akka://AS-A13601/system/StreamSupervisor-0/$$UX#256478128]
2019-02-26T20:22:39.643+0100 | a.i.TcpOutgoingConnection | com.typesafe.akka.slf4j - 2.5.19 | now watched by Actor[akka://AS-A13601/system/StreamSupervisor-0/$$UX#256478128]
2019-02-26T20:22:39.668+0100 | r.a.c.InboundActorRefCompression | com.typesafe.akka.slf4j - 2.5.19 | Initializing InboundActorRefCompression for originUid [-4102320532937819035]
2019-02-26T20:22:39.668+0100 | r.a.c.InboundManifestCompression | com.typesafe.akka.slf4j - 2.5.19 | Initializing InboundManifestCompression for originUid [-4102320532937819035]
2019-02-26T20:22:39.669+0100 | .a.Association(akka://AS-A13601) | com.typesafe.akka.slf4j - 2.5.19 | Dropping message [PingMessage] from [Actor[akka://AS-A13601/user/$a/$a#2021643780]] to [Actor[akka://AS-A16300@server1.net.sw.se:16300/user/proc/$c#1156750475]] due to overflow of send queue, size [3072]
Best Regards,