Hi, I have a strange issue with my persistence/event-sourcing/cluster/sharding system.
I have millions of short-lived persistent actors (2 to 7 days, mostly only 2) that model state. The idea is to passivate them after some idle time so that only the active ones are kept in memory. No snapshots are involved.
I noticed that, once the system has been running for a while, restarting k8s pods leads to errors like the following in the logs:
java.lang.IllegalStateException: Sequence number [1] still missing after [10.00 s], saw unexpected seqNr [3] for persistenceId [Entity|02040044000060].
Checking the journal (the Cassandra messages table) shows that there are indeed only events with sequence numbers 3, 4 and 5 for the given persistence ID. Events 1 and 2 are missing.
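To gauge how many entities are affected, one could compare the stored sequence numbers of each persistence ID against the contiguous range that replay expects. Without snapshots, replay here starts at sequence number 1, as the error message shows, so every number from 1 up to the highest stored one has to be present. The following is a minimal sketch that assumes the sequence_nr values for one persistence ID have already been read from the messages table (for example with cqlsh). `SeqNrGaps` and `missingSeqNrs` are hypothetical names and not part of any Akka API.

```scala
// Hypothetical diagnostic helper, not an Akka API.
// Assumes `stored` holds the sequence_nr values found in the journal for ONE
// persistenceId. Since no snapshot is loaded, replay starts at 1, so any number
// in 1..max that is absent will make recovery fail with a "still missing" error.
object SeqNrGaps {
  def missingSeqNrs(stored: Seq[Long]): List[Long] =
    if (stored.isEmpty) Nil
    else {
      val present = stored.toSet
      // Every sequence number from 1 to the highest stored one, minus those present
      (1L to stored.max).filterNot(present.contains).toList
    }

  def main(args: Array[String]): Unit = {
    // Values observed for Entity|02040044000060
    val stored = Seq(3L, 4L, 5L)
    println(missingSeqNrs(stored)) // List(1, 2)
  }
}
```

Running this over a sample of persistence IDs would show whether the gap is always at the start of the stream (events 1..n gone) or can also appear in the middle.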
Sharding is configured as follows:
```
sharding {
  role = "state"
  state-store-mode = ddata
  passivate-idle-entity-after = 48 h
}
```
The entity behavior closely follows the example given in the documentation:
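```scala
import scala.concurrent.duration._
import akka.actor.typed.SupervisorStrategy
import akka.actor.typed.scaladsl.Behaviors
import akka.persistence.typed.{PersistenceId, SnapshotSelectionCriteria}
import akka.persistence.typed.scaladsl.{EventSourcedBehavior, Recovery}

Behaviors.setup { context =>
  EventSourcedBehavior.withEnforcedReplies[EntityCommand, EntityEvent, EntityState](
    persistenceId = PersistenceId(EntityKey.name, EntityId),
    emptyState = EntityState.empty,
    commandHandler = commandHandler(context),
    eventHandler = eventHandler(context)
  )
    .onPersistFailure(SupervisorStrategy.restartWithBackoff(minBackoff = 5.seconds, maxBackoff = 30.seconds, randomFactor = 0.1))
    .withTagger(tagger)
    // No snapshots are used, so recovery replays the journal from sequence number 1
    .withRecovery(Recovery.withSnapshotSelectionCriteria(SnapshotSelectionCriteria.none))
}
```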
I’d be very grateful for any suggestions or ideas about fixing this issue.