Akka Cluster Sharding Issue

Our implementation works fine on a single node, but it fails on multiple nodes. How can I debug the streams, graphs, or flows?

akka.cluster.sharding.ShardRegion        : Trying to register to coordinator at [Some(ActorSelection[Anchor(akka://DevelopActorSystem/), Path(/system/sharding/BusinessUnitShardCoordinator/singleton/coordinator)])], but no acknowledgement. Total [20] buffered messages.
[DEBUG] [04/16/2018 14:55:42.782] [DevelopActorSystem-cassandra-plugin-default-dispatcher-57] [akka.actor.LocalActorRefProvider(akka://DevelopActorSystem)] resolve of path sequence [/system/sharding/BusinessUnitShard#1286037936] failed
[ERROR] [04/16/2018 14:55:42.783] [DevelopActorSystem-akka.actor.default-dispatcher-21] [akka.tcp://DevelopActorSystem@10.98.49.170:2551/system/sharding/BusinessUnitShardCoordinator/singleton/coordinator] Exception in receiveRecover when replaying event type [akka.cluster.sharding.ShardCoordinator$Internal$ShardHomeDeallocated] with sequence number [1454] for persistenceId [/sharding/BusinessUnitShardCoordinator].
java.lang.IllegalArgumentException: requirement failed: Shard [12] not allocated: State(Map(),Map(),Set(),Set(),true)

Additionally, I get the following from persistence:

2018-04-16 15:18:00.343 ERROR 8298 --- [lt-dispatcher-3] a.c.sharding.PersistentShardCoordinator  : Exception in receiveRecover when replaying event type [akka.cluster.sharding.ShardCoordinator$Internal$ShardHomeDeallocated] with sequence number [1454] for persistenceId [/sharding/BusinessUnitShardCoordinator].

Which Akka version is this? 2.4.x? If so, I’d recommend updating to 2.5.x, since 2.4 is end-of-life. See the migration guide.

What has happened here is that the event log for the shard coordinator has become “corrupt”, probably because two active clusters were writing to the same event log at the same time. You would then have to remove that event log and start fresh, as described in the docs. If you switch to ddata mode in 2.5, you don’t have to do that.
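
For reference, switching to ddata mode is a configuration change. The fragment below is a minimal sketch, assuming Akka 2.5.x and that your application currently sets the state store mode to persistence somewhere in its application.conf. I can't see your actual config, so treat the placement as an assumption. As far as I know ddata is already the default in 2.5, so deleting an explicit persistence setting should have the same effect.

```
# application.conf (sketch, assuming Akka 2.5.x)
akka.cluster.sharding {
  # Keep the shard coordinator state in Distributed Data instead of
  # replaying it from the persistence journal on startup.
  state-store-mode = ddata
}
```

With this mode the coordinator no longer recovers from the [/sharding/BusinessUnitShardCoordinator] event log, so the ShardHomeDeallocated replay failure above should not occur. All nodes in the cluster need to use the same mode.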