Akka Recovery Timedout

jzelayeta · February 12, 2021, 2:36pm

Hey guys! Hope you're all fine. We have a microservice using Akka classic with persistent actors backed by Mongo (using reactivemongo).

Yesterday we saw

akka.persistence.RecoveryTimedOut: Recovery timed out, didn't get snapshot within 30000 milliseconds

during a deploy. During the deploy we estimate that each node was trying to recover ~6k actors.

I’ve been thinking about two parameters:

akka.persistence.max-concurrent-recoveries
the connection pool size for our Mongo datastore (controlled by nbChannelsPerNode)

Should those values be equal (or at least similar)? In our case we have them set to 250 concurrent recoveries over 70 connections per node.
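
For reference, here is a minimal sketch of where those two settings would live, using the values above. The object name, host, and database name are hypothetical, and the `rm.nbChannelsPerNode` URI option is assumed from the ReactiveMongo connection options, so it should be checked against the driver version in use.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object RecoveryTuning {
  // Value taken from this post; tune it only after measuring.
  private val overrides: Config = ConfigFactory.parseString(
    """
      |akka.persistence.max-concurrent-recoveries = 250
      |""".stripMargin)

  // Application config with the override layered on top.
  val config: Config = overrides.withFallback(ConfigFactory.load())

  // Hypothetical host and database names. The rm.nbChannelsPerNode option
  // is an assumption about the ReactiveMongo URI syntax; verify it against
  // the driver version in use.
  val mongoUri: String =
    "mongodb://mongo-host:27017/mydb?rm.nbChannelsPerNode=70"
}
```

The config would then be passed in when the system is created, e.g. `ActorSystem("my-service", RecoveryTuning.config)`.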

PS: We’re aware of this config, but 30 seconds is a long time for our SLA, so we need recovery to be considerably faster.

johanandren · February 15, 2021, 9:38am

I’d make sure to understand where the bottleneck is before starting to tweak config.

For example:
Look into how much time each actor takes to recover - perhaps a different/more frequent snapshot scheme can make recoveries faster.
Probably also good to have a gut feeling of the throughput limitations of the database to know if that is the bottleneck you are hitting - perhaps more resources on the db side is the solution.
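
One way to get those per-actor numbers is to time recovery inside the actor itself. The sketch below is only an illustration for Akka classic: the `TimedEntity` class, its `String` events, and the snapshot interval of 100 are all hypothetical, not taken from the service in question. It logs how long each actor took from construction to `RecoveryCompleted` (which includes any wait for a recovery permit) and how many events were replayed, and it takes a snapshot every N events so fewer events need replaying.

```scala
import akka.actor.ActorLogging
import akka.persistence.{PersistentActor, RecoveryCompleted, SaveSnapshotFailure, SaveSnapshotSuccess, SnapshotOffer}

// Hypothetical entity; events and commands are plain Strings for brevity.
class TimedEntity(id: String) extends PersistentActor with ActorLogging {
  override def persistenceId: String = s"timed-entity-$id"

  // Assumed snapshot interval; pick one based on measured replay times.
  private val snapshotEvery = 100

  private val createdAtNanos = System.nanoTime()
  private var replayedEvents = 0L
  private var state: Vector[String] = Vector.empty

  override def receiveRecover: Receive = {
    case SnapshotOffer(_, snapshot: Vector[_]) =>
      // Restore state from the latest snapshot before replaying events.
      state = snapshot.map(_.toString)
    case event: String =>
      replayedEvents += 1
      state :+= event
    case RecoveryCompleted =>
      val elapsedMillis = (System.nanoTime() - createdAtNanos) / 1000000
      log.info("Recovered {} in {} ms ({} events replayed)",
        persistenceId, elapsedMillis, replayedEvents)
  }

  override def receiveCommand: Receive = {
    case cmd: String =>
      persist(cmd) { event =>
        state :+= event
        // Snapshot periodically so recovery replays at most snapshotEvery events.
        if (lastSequenceNr % snapshotEvery == 0) saveSnapshot(state)
      }
    case SaveSnapshotSuccess(metadata) =>
      log.debug("Snapshot saved at sequence number {}", metadata.sequenceNr)
    case SaveSnapshotFailure(metadata, cause) =>
      log.warning("Snapshot at sequence number {} failed: {}",
        metadata.sequenceNr, cause.getMessage)
  }
}
```

If the logged times are dominated by a high replayed-event count, a more frequent snapshot interval is the likely lever. If they stay high even with few events, the wait is more likely on the permit queue or the database.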
