Hello, we use CassandraSession#selectAll to query our Cassandra cluster for some additional data separate from akka-persistence. Because of some application requirements we block on the result before continuing. We had a case in production where the CompletionStage (we’re using the Java DSL) never completed and caused the entire thread to block indefinitely due to lack of a timeout.
"application-cassandra-plugin-default-dispatcher-7889" #29169 prio=5 os_prio=0 tid=0x00007fae38042000 nid=0x7abe waiting on condition [0x00007fadd8e0f000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0000000732b05e58> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at com.twilio.application.ItemDAOCassandraImpl.get(ItemDAOCassandraImpl.java:66)
    ... snip ...
Where the code is just:
final Select select = QueryBuilder.select()
    .from(KEYSPACE, TABLE)
    .where(QueryBuilder.eq(ACCOUNT_ID, accountId.getValue()))
    .limit(1);
return cassandraSession
    .selectAll(select)
    .thenApply(rows -> convertToProtocol(rows))
    .toCompletableFuture()
    .get();
The .thenApply() is simply object conversion using GettableByNameData#getString.
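For reference, the indefinite park in the thread dump comes from the no-argument get(). A minimal sketch of a bounded wait, using only java.util.concurrent, is below. The class name BoundedWait, the helper awaitWithTimeout, and the 200 ms value are hypothetical and only for illustration, and a never-completed CompletableFuture stands in for the selectAll stage. This only limits how long the caller blocks; it does not explain why the stage never completed.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWait {

    // Hypothetical helper: waits at most timeoutMillis for the stage and returns
    // Optional.empty() on timeout instead of parking the thread forever.
    static <T> Optional<T> awaitWithTimeout(CompletionStage<T> stage, long timeoutMillis) {
        try {
            T value = stage.toCompletableFuture().get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Optional.ofNullable(value);
        } catch (TimeoutException e) {
            // The stage did not complete in time; the caller decides what to do next.
            return Optional.empty();
        } catch (InterruptedException e) {
            // Restore the interrupt flag before propagating.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for result", e);
        } catch (ExecutionException e) {
            // Surface the underlying failure of the stage.
            throw new CompletionException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a selectAll(...) stage that never completes.
        CompletionStage<String> neverCompletes = new CompletableFuture<>();
        System.out.println(awaitWithTimeout(neverCompletes, 200).isPresent()); // prints "false"

        // Stand-in for a stage that completes normally.
        CompletionStage<String> completed = CompletableFuture.completedFuture("row");
        System.out.println(awaitWithTimeout(completed, 200).get()); // prints "row"
    }
}
```

get(long, TimeUnit) is used rather than orTimeout because the thread dump above is from a Java 8 JVM, where orTimeout is not available.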
There’s no evidence that there was any problem with our Cassandra cluster (network, timeouts, errors, etc.), and I also see nothing obvious in akka-persistence-cassandra that could cause this from looking at the SelectSource, though I’m not the most well-versed in akka streams.
While we already have plenty of fixes to prevent this from happening again (as well as moving to CassandraSession#selectOne, as that fits our use-case better), I’m hoping to understand why selectAll may not have returned so that we can reproduce it and ensure all of our improvements account for this failure mode.
Using:
- akka 2.5.13
- akka-persistence-cassandra 0.85
- Cassandra 3.0.9
I can give more details if needed, not sure what would be relevant.