Distributed Workers Sample - Confirmation to WorkManager leads to dead-letters for long-running work items

08eb5ca57e52a94e1ab9a43ddd78df · July 2, 2020, 8:54pm

I am experimenting with akka-sample-distributed-workers-scala. In my case, most work items execute quickly (< 1 sec), but a few take 1-3 minutes. My problem is that after each of these long-running work items finishes, the Worker sends a confirmation back to the WorkManager (confirmTo ! ConsumerController.Confirmed), but these messages never arrive at the WorkManager actor.

I always get:

[2020-07-02 22:33:58,726] [INFO] [akka://ClusterSystem@127.0.0.1:5002] [worker.Worker$$anonfun$idle$1] [ClusterSystem-akka.actor.default-dispatcher-14] [akka://ClusterSystem/user/worker-2] - Work is complete. Result 0 * 0 = 0
[2020-07-02 22:33:58,738] [INFO] [akka://ClusterSystem@127.0.0.1:2551] [akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef] [ClusterSystem-akka.actor.default-dispatcher-32] [akka://ClusterSystem/deadLetters] - Message [java.lang.Long] to Actor[akka://ClusterSystem/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior then Actor[akka://ClusterSystem/deadLetters] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2020-07-02 22:33:58,738] [INFO] [akka://ClusterSystem@127.0.0.1:2551] [akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef] [ClusterSystem-akka.actor.default-dispatcher-32] [akka://ClusterSystem/deadLetters] - Message [java.lang.Long] to Actor[akka://ClusterSystem/deadLetters] was not delivered. [2] dead letters encountered. If this is not an expected behavior then Actor[akka://ClusterSystem/deadLetters] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

I don’t understand what is causing this. Is there an internal timeout that destroys the WorkManager actor? The issue can be reproduced easily with the following two changes:

WorkManager.scala: Increase timeout, e.g., implicit val timeout = Timeout(10.minutes)
WorkExecutor.scala:

ctx.scheduleOnce(
  75000.millis, // Strangely, 60000.millis seems to be the upper bound
  doWork.replyTo,
  WorkComplete(result)
)

patriknw · July 3, 2020, 6:21am

That is probably the configuration setting akka.reliable-delivery.work-pulling.producer-controller.internal-ask-timeout, which defaults to 1 minute.
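
As a minimal sketch, the setting could be raised in the sample's application.conf roughly like this. The value of 5 minutes is an assumption, picked only so that it exceeds the 1-3 minute work items described in the question; it is not a recommended default.

```
# application.conf (sketch; the 5 minutes value is an assumed example)
akka.reliable-delivery.work-pulling.producer-controller {
  # Default is 1 minute, which matches the observed 60000.millis upper bound
  internal-ask-timeout = 5 minutes
}
```

Whatever value is used should presumably be longer than the slowest expected work item, since confirmations arriving after the timeout seem to be what ends up in dead letters.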

08eb5ca57e52a94e1ab9a43ddd78df · July 3, 2020, 7:37am

Thanks a lot! This solved my issue.

Can be closed.
