Continue buffering new messages when the underlying persistence plugin throws an exception

I have an EventSourcedBehavior set up in a cluster, and I'm using the R2DBC plugin with Postgres for both the journal and the snapshot store.

When I simulate a failure of Postgres (by switching it off completely),

I first get this exception:

persistenceId=TradeProcessingEventSourced|3, akkaSource=akka://CandleCalculator/system/sharding/TradeProcessingEventSourced/51/3, sourceActorSystem=CandleCalculator}
akka.persistence.typed.internal.JournalFailureException: Failed to persist event type [com.okcoin.sharded.candle.engine.demo.eventsourced.SecAgg$Event$Trade] with sequence number [11117] for persistenceId [TradeProcessingEventSourced|3]

and then this exception:

Exception during recovery from snapshot. PersistenceId [TradeProcessingEventSourced|3]. Connection validation failed MDC: {persistencePhase=load-snap, akkaAddress=akka://CandleCalculator@127.0.0.1:2554, persistenceId=TradeProcessingEventSourced|3, akkaSource=akka://CandleCalculator/system/sharding/TradeProcessingEventSourced/51/3, sourceActorSystem=CandleCalculator}
akka.persistence.typed.internal.JournalFailureException: Exception during recovery from snapshot. PersistenceId [TradeProcessingEventSourced|3]. Connection validation failed

Apparently, once persistence fails, the actor switches to recovery mode. Recovery also fails because the database is down, so the actor gets stuck in recovery mode until a recovery attempt succeeds.

Based on my observation, the messages sent to the cluster during that time are completely lost, because every actor in the cluster is in a continuous loop of "attempt to recover, fail to recover".

I'm under the impression that new messages should still be buffered while recovery is being attempted. Am I doing something wrong that causes the messages to be lost, or is this the expected behaviour?

Thanks in advance.

Persistence will stop the actor on a journal failure. Sharding will start an entity that is not running when it receives a message for it, and starting a stopped persistent actor while the database is unavailable will lead to replay failing. Once the database is available again, recovery will succeed.

In other words, what you see is the expected behaviour.
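As a side note, the default of stopping the entity when the journal throws during persist can be replaced with a backoff restart via `onPersistFailure`. A minimal sketch, with placeholder command/event/state types and no-op handlers (not your actual protocol):

```scala
import scala.concurrent.duration._
import akka.actor.typed.SupervisorStrategy
import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{ Effect, EventSourcedBehavior }

// Placeholder protocol types, for illustration only
sealed trait TradeCommand
sealed trait TradeEvent
final case class TradeState(count: Long)

def behavior(entityId: String): EventSourcedBehavior[TradeCommand, TradeEvent, TradeState] =
  EventSourcedBehavior[TradeCommand, TradeEvent, TradeState](
    persistenceId = PersistenceId("TradeProcessingEventSourced", entityId),
    emptyState = TradeState(0),
    commandHandler = (_, _) => Effect.none, // real command handling goes here
    eventHandler = (state, _) => state)     // real event handling goes here
    // Restart with backoff instead of stopping when persisting fails,
    // e.g. while Postgres is down
    .onPersistFailure(
      SupervisorStrategy.restartWithBackoff(
        minBackoff = 1.second,
        maxBackoff = 30.seconds,
        randomFactor = 0.2))
```

That only changes how the entity reacts when the journal throws while persisting; it does not by itself guarantee delivery of messages sent while the database is down.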

If you want retries or other ways to get a better guarantee of delivery in the face of infrastructure failure, that is best implemented on the sending side. Note that if you do this at the edge of the Akka cluster, you also have to take into account the scenario of the cluster node handling the request going away (because of failures or because of a rolling upgrade) and retry in that case as well.

If you want to do it inside the Akka cluster, there are a few tools in the toolbox that you can look into: Reliable delivery • Akka Documentation or Futures patterns • Akka Documentation.
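For the Futures-patterns route, here is a minimal sketch of retrying an ask to a sharded entity with `akka.pattern.retry`. The `RecordTrade` message, the entity type key name and `recordWithRetry` are made up for illustration, not taken from your code:

```scala
import scala.concurrent.{ ExecutionContext, Future }
import scala.concurrent.duration._
import akka.Done
import akka.actor.typed.{ ActorRef, ActorSystem }
import akka.cluster.sharding.typed.scaladsl.{ ClusterSharding, EntityTypeKey }
import akka.pattern.retry
import akka.util.Timeout

object TradeRetry {
  // Hypothetical message protocol, for illustration only
  final case class RecordTrade(trade: String, replyTo: ActorRef[Done])
  val TypeKey: EntityTypeKey[RecordTrade] = EntityTypeKey[RecordTrade]("TradeProcessingEventSourced")

  def recordWithRetry(system: ActorSystem[_], entityId: String, trade: String): Future[Done] = {
    implicit val ec: ExecutionContext = system.executionContext
    // akka.pattern.retry needs the classic scheduler
    implicit val scheduler: akka.actor.Scheduler = system.classicSystem.scheduler
    implicit val timeout: Timeout = 3.seconds

    val entityRef = ClusterSharding(system).entityRefFor(TypeKey, entityId)

    // Each failed or timed-out ask triggers another attempt, up to 5 attempts 2 seconds apart.
    // Sharding restarts a stopped entity on the incoming message, so once the database
    // is back a later attempt can succeed.
    retry(() => entityRef.ask[Done](replyTo => RecordTrade(trade, replyTo)), 5, 2.seconds)
  }
}
```

Reliable delivery goes further than plain retries and also covers resending with confirmations between producer and consumer inside the cluster.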