Problem
In a 3-node Actor cluster, during a rolling restart of the nodes, some newly created entity entries overwrite the existing values for certain keys in the remember-entities ORSet.
The issue appears when pruning occurs after the restart. By then the shards in the cluster have been completely rebalanced, and all entity actors have been successfully re-created.
However, when we retrieve the values using the Replicator Get operation, the entries for some keys differ before and after the restart.
Cluster Configurations
akka.cluster.sharding.remember-entities = on
akka.cluster.sharding.remember-entities-store = ddata
akka.cluster.sharding.distributed-data {
  durable.keys = []
  gossip-interval = 500 ms
  notify-subscribers-interval = 100 ms
}
akka.remote.artery {
  enabled = on
  transport = tcp
}
When an entity is created, its ID is stored in an ORSet within the Replicator actor and replicated to the other nodes. Each shard has 5 keys, and the entity ID is hashed to decide which of these 5 keys it is stored under.
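To make the key layout concrete, here is a minimal Java sketch of how an entity ID could be mapped to one of the 5 keys. It is an illustration under assumptions, not the actual sharding code: the class name, the helper names, the use of `hashCode` modulo 5, and the `Shard_2-1` style key naming (taken from the example in this post) are all hypothetical.

```java
// Illustrative sketch only: the names and hashing scheme are assumptions.
final class RememberEntitiesKeySketch {
    // Assumption: five keys per shard, as described in this post.
    static final int NUMBER_OF_KEYS = 5;

    // Maps an entity ID to a key index in the range 0..4.
    static int keyIndex(String entityId) {
        // The remainder is in -4..4, so Math.abs cannot overflow here.
        return Math.abs(entityId.hashCode() % NUMBER_OF_KEYS);
    }

    // Hypothetical key name, following the "Shard_2-1" notation used in this post.
    static String keyFor(String shardId, String entityId) {
        return shardId + "-" + keyIndex(entityId);
    }

    public static void main(String[] args) {
        String[] ids = {"Actor_3", "Actor_8", "Actor_14", "Actor_19", "Actor_23"};
        for (String id : ids) {
            System.out.println(id + " -> " + keyFor("Shard_2", id));
        }
    }
}
```

With a layout like this, every entity that hashes to the same index shares one ORSet, so if the value of a single key is replaced, several entities disappear from the remembered set at once.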
We can retrieve the current remember entities stored in the Replicator using the Replicator Get operation.
Everything seems to be working fine until the nodes are restarted.
The issue is as follows:
- We perform a sequential rolling restart of all 3 nodes.
- After each node restarts and rejoins the cluster, all the shards are rebalanced and the entities are recreated correctly.
- New entities are continuously created in the cluster from a separate proxy node.
- About 5 minutes after the rolling restart (max-pruning-dissemination = 300 s), the remember-entities count is incorrect for some keys.
- On further investigation, we found that during pruning some of the newly created entities overwrite their respective keys in the ORSet.
Example:
Node 1 restarted 18:58
Node 2 restarted 18:59
Node 3 restarted 19:00
- At 19:06:xx - The live actor count for Shard_2 is 25.
- At 19:06:xx - The remember entities count for Shard_2 is 25 (across 5 keys).
- At 19:07:35 - Actor_23 is created, which belongs to Shard_2.
- At 19:08:xx - The live actor count for Shard_2 is 26.
- At 19:08:xx - The remember entities count for Shard_2 is 22 (across 5 keys).
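Here, when Actor_23 is created, it overwrites the key Shard_2-1, changing it from Shard_2-1[Actor_3,Actor_8,Actor_14,Actor_19] to Shard_2-1[Actor_23]. Four entries are lost and one is added, which accounts for the count dropping from 25 to 22.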
Suspected causes
- Pruning of ddata happens on Node 1 only. Pruning of the shard data was performed at 19:05:46.
- Actor_23 is created before Shard_2 is pruned. After pruning is done (19:05:46), only the new actor's value is present for the key: Shard_2-1[Actor_3,Actor_8,Actor_14,Actor_19] becomes Shard_2-1[Actor_23].
- However, the issue occurs only for some of the keys.
What could be causing this issue?
The Actor Cluster version is 2.5.32.
If there are known bugs or issues in this old version, kindly share the bug link so that we can understand the problem before upgrading.