Distributed Workers Sample - Confirmation to WorkManager leads to dead-letters for long-running work items

I am experimenting with akka-sample-distributed-workers-scala. In my case, I have several work items that execute quickly (< 1 second) but also a few that take 1 to 3 minutes. My problem is that after each of these long-running work items finishes, the Worker sends a confirmation back to the WorkManager (confirmTo ! ConsumerController.Confirmed), but these messages never arrive at the WorkManager actor.

I always get:

[2020-07-02 22:33:58,726] [INFO] [akka://ClusterSystem@127.0.0.1:5002] [worker.Worker$$anonfun$idle$1] [ClusterSystem-akka.actor.default-dispatcher-14] [akka://ClusterSystem/user/worker-2] - Work is complete. Result 0 * 0 = 0
[2020-07-02 22:33:58,738] [INFO] [akka://ClusterSystem@127.0.0.1:2551] [akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef] [ClusterSystem-akka.actor.default-dispatcher-32] [akka://ClusterSystem/deadLetters] - Message [java.lang.Long] to Actor[akka://ClusterSystem/deadLetters] was not delivered. [1] dead letters encountered. If this is not an expected behavior then Actor[akka://ClusterSystem/deadLetters] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2020-07-02 22:33:58,738] [INFO] [akka://ClusterSystem@127.0.0.1:2551] [akka.remote.RemoteActorRefProvider$RemoteDeadLetterActorRef] [ClusterSystem-akka.actor.default-dispatcher-32] [akka://ClusterSystem/deadLetters] - Message [java.lang.Long] to Actor[akka://ClusterSystem/deadLetters] was not delivered. [2] dead letters encountered. If this is not an expected behavior then Actor[akka://ClusterSystem/deadLetters] may have terminated unexpectedly. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

I don’t understand what is causing this. Is there an internal timeout that destroys the WorkManager actor? The issue is easy to reproduce with the following two changes:

  • WorkManager.scala: Increase the timeout, e.g., implicit val timeout = Timeout(10.minutes)
  • WorkExecutor.scala:
      ctx.scheduleOnce(
        75000.millis, // Strangely, 60000.millis seems to be the upper bound
        doWork.replyTo,
        WorkComplete(result)
      )

That is probably the configuration setting akka.reliable-delivery.work-pulling.producer-controller.internal-ask-timeout, which defaults to 1 minute. That would match the 60000 ms upper bound you observed.
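
If that is the cause, overriding the setting in the sample's application.conf should be enough. Here is a minimal sketch; the 5 minute value is only an assumption for illustration, so pick something comfortably longer than your slowest work item:

```hocon
# application.conf
# Raise the internal ask timeout used by the work-pulling producer controller.
# The default is 1 minute; "5 minutes" is an assumed value, not a recommendation.
akka.reliable-delivery.work-pulling.producer-controller {
  internal-ask-timeout = 5 minutes
}
```

The ask timeout in WorkManager.scala probably needs to stay at least as long as this, as in your reproduction.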

Thanks a lot! This solved my issue. :grinning:

Can be closed.