Despite the never failing workers, I can still reproduce the above situation. Given:
- One node
- One producer, backed by a durable queue
- Two workers
- One (long-running) job
When I:
- Start the node
- Submit the job
- Stop the node when the submitted job is still in progress.
I get the following warning:
[WARN] [2020-12-02 14:21:15,217] | [saas-akka.actor.default-dispatcher-5] | a.a.CoordinatedShutdown: Coordinated shutdown phase [actor-system-terminate] timed out after 10000 milliseconds
So I think it’s reasonable to assume that the worker couldn’t send the ConsumerController.Confirmed message. Then, when I start again the node, the WorkPullingProducerController restarts, reloading the unconfirmed message. This is what I expected:
- Both workers starts, registering themselves to the
WorkPullingProducerController. - The
WorkPullingProducerControllerreceives two requests from them. Since there is an unconfirmed message to re-send, it sends the unconfirmed message to one of the workers, and sends aRequestNextto the producer for the other one. From what I understand, this shouldn’t change even if I had 100 workers: only oneRequestNextat a time is actually sent to the producer, waiting for a reply to it from the producer before sending the next one.
However, after activating the LogMessagesInterceptor, I see the following messages exchange as soon as I start the node again:
[DEBUG] [2020-12-02 14:21:00,866] | [saas-akka.actor.default-dispatcher-21] | a.a.t.i.LogMessagesInterceptor: actor [akka://saas/system/singletonManagerjobs-manager/jobs-manager] received message: WrappedRequestNext(RequestNext(Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller/$$a-adapter#1182625933],Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller#580539556]))
[DEBUG] [2020-12-02 14:21:00,955] | [saas-akka.actor.default-dispatcher-19] | a.a.t.i.LogMessagesInterceptor: actor [akka://saas/user/rendition-worker-2] received message: WrappedDelivery(Delivery(GenerateRendition(ConversionJob("test"))),Actor[akka://saas/user/rendition-worker-2/consumer-controller#216793132],jobs-manager-dd7a608d-c606-4e6f-8244-d2efb0d7ca98,1))
[DEBUG] [2020-12-02 14:21:01,241] | [saas-akka.actor.default-dispatcher-21] | a.a.t.i.LogMessagesInterceptor: actor [akka://saas/system/singletonManagerjobs-manager/jobs-manager] received message: WrappedRequestNext(RequestNext(Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller/$$a-adapter#1182625933],Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller#580539556]))
Result: the producer receives first RequestNext, then another one before the first has been used to submit more jobs. So it fails, restarts and gets back both RequestNext messages one after the other:
[DEBUG] [2020-12-02 14:21:01,272] | [saas-akka.actor.default-dispatcher-21] | a.a.t.i.LogMessagesInterceptor: actor [akka://saas/system/singletonManagerjobs-manager/jobs-manager] received message: WrappedRequestNext(RequestNext(Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller/$$a-adapter#1783057446],Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller#1183803959]))
[DEBUG] [2020-12-02 14:21:01,302] | [saas-akka.actor.default-dispatcher-21] | a.a.t.i.LogMessagesInterceptor: actor [akka://saas/system/singletonManagerjobs-manager/jobs-manager] received message: WrappedRequestNext(RequestNext(Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller/$$a-adapter#1783057446],Actor[akka://saas/system/singletonManagerjobs-manager/jobs-manager/producer-controller#1183803959]))
Both the code in the documentation and the more extensive sample code assume that this should never happen. So my questions are:
- Why this is happening?
- How should I manage those cases?