I have some EventSourcedBehavior actors running in a cluster, and I'm using the R2DBC plugin with Postgres for both the journal and the snapshot store.
When I simulate a Postgres failure (by switching it off completely),
I first get this exception:
persistenceId=TradeProcessingEventSourced|3, akkaSource=akka://CandleCalculator/system/sharding/TradeProcessingEventSourced/51/3, sourceActorSystem=CandleCalculator}
akka.persistence.typed.internal.JournalFailureException: Failed to persist event type [com.okcoin.sharded.candle.engine.demo.eventsourced.SecAgg$Event$Trade] with sequence number [11117] for persistenceId [TradeProcessingEventSourced|3]
and then this one:
Exception during recovery from snapshot. PersistenceId [TradeProcessingEventSourced|3]. Connection validation failed MDC: {persistencePhase=load-snap, akkaAddress=akka://CandleCalculator@127.0.0.1:2554, persistenceId=TradeProcessingEventSourced|3, akkaSource=akka://CandleCalculator/system/sharding/TradeProcessingEventSourced/51/3, sourceActorSystem=CandleCalculator}
akka.persistence.typed.internal.JournalFailureException: Exception during recovery from snapshot. PersistenceId [TradeProcessingEventSourced|3]. Connection validation failed
Apparently, once persistence has failed a single time, the actor switches to recovery mode. Recovery also fails because the database is down, so the actor stays stuck in recovery mode until recovery succeeds.
From what I observe, the messages sent to the cluster during that time are completely lost, because every actor in the cluster is in a continuous loop of "attempt to recover, fail to recover".
I was under the impression that new messages would still be buffered while recovery is being attempted. Am I doing something wrong that causes the messages to be lost, or is this the expected behaviour?
Thanks in advance.