Why/When would EventsByTagStage receive multiple pubsub notifications within the defined interval?

We are using akka-persistence-cassandra v0.103 with the following configurations

cassandra-journal.pubsub-notification = 3s
cassandra-query-journal.refresh-interval = 10s
cassandra-query-journal.events-by-tag.eventual-consistency-delay = 7s

We also enabled (verbose) debug logs for akka.persistence.cassandra.query.

When there is no new events, we can observe that:

  • no log entries of [{}] Received pub sub tag update for our tag, initiating query :white_check_mark:
  • number of log entries of "[{}] Executing query: timeBucket: {} from offset: {} to offset: {}" matches the refresh-interval config, i.e. within 5mins interval, roughly 30 * 4 (shards per event) * X (number of projections) :white_check_mark:

When there are heavy writes, log observations does not match our understanding of pubsub-notification.

  • TagWriter triggerring the pubsub notification via PubSubThrottler should limit number of notifications to at most 2 times per interval (1st time gets forwarded right away, and 2nd time onwards (if any) get conflated and sent on Tick)
  • However, we are seeing multiple log entries [{}] Received pub sub tag update for our tag, initiating query within same second for same stageUuid :negative_squared_cross_mark: (see image below)

As of now, we don’t have debug logs for akka.persistence.cassandra.journal so we cannot be certain (will get it enabled soon). But from understanding, it seems there can be 2 possible explanations:

  1. There could be a build up of messages on the subscriber side (in EventsByTagStage), hence we see these logs in succession, but they were published in properly throttled manner. However, this would be unlikely since our eventual consistency delay is less than refresh interval, the code here would just schedule the TagNotification, so it shouldn’t take long?
  2. If (1) is not the case, since there is a check for publishedTag.equals(session.tag) in EventsByTagStage before the log entry, and that akka distributed pubsub has at-most-once delivery guarantee, this suggests that PubSubThrottler doesn’t actually throttle as we expect it to be? I notice that on tick, we will traverse all previous senders of the repeated messages and send them. This means even if the tag is same, but comes from different senders, PubSubThrottler would only delay sending the notification, but does not conflate at all. But this is not really possible given there should only be one TagWriter per tag?

We will enable the journal debug logs to investigate further, but raising this topic here first in case anyone has any ideas / suggestions. Just on top of my head, the PubSubThrottler is fully internal to cassandra journal, so maybe it’s intended that there could be multiple senders by tag (?)

This is explained as soon as I have debug logs enabled for cassandra-journal too. Indeed there are multiple TagWriters per tag, one for each instance of the service pod in the cluster (I can observe corresponding number of log entries of Running TagWriter for [{}] with settings {})