I am implementing a POC with a Kafka producer that grabs around 10 sensor values from the external world every 5 seconds. The values are forwarded to a consumer which has a lot of work to do. I am happy that Kafka provides retention in case of consumer failure.
Using Lagom, I understand that write-side persistence is mandatory, so I am using Cassandra and it works fine. Unfortunately, the Cassandra “journal” seems to grow drastically, and I can’t imagine the size of my Cassandra database after many years in production. Is there a way to bypass Cassandra persistence or to optimize the data storage on the write side?
I suppose I am using event sourcing, since I can see the incoming values in Cassandra one after the other. At the same time, the data is transmitted from the producer to the consumer via Kafka, and all seems to be fine.
The publishDataValue method is called by an internal Akka actor whose job is to grab the external sensor data.
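For context, inside publishDataValue the command is sent to the persistent entity roughly like this. This is only a sketch: the injected PersistentEntityRegistry, the per-sensor entity id, and the DataValue payload type are placeholders, and PublishDataValueMessage is assumed to reply with Done as in the entity below.

import java.util.concurrent.CompletionStage;

import akka.Done;
import com.lightbend.lagom.javadsl.persistence.PersistentEntityRef;
import com.lightbend.lagom.javadsl.persistence.PersistentEntityRegistry;

// Sketch only: the registry is injected elsewhere and DataValueEntity is registered with it.
CompletionStage<Done> publishDataValue(PersistentEntityRegistry registry, DataValue dataValue) {
    // One entity instance per sensor, for example; "sensor-1" is a placeholder id.
    PersistentEntityRef<DataValueCommand> ref =
        registry.refTo(DataValueEntity.class, "sensor-1");
    return ref.ask(new DataValueCommand.PublishDataValueMessage(dataValue));
}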
Persistent entity:
import java.util.Optional;

import akka.Done;
import com.lightbend.lagom.javadsl.persistence.PersistentEntity;

public class DataValueEntity extends PersistentEntity<DataValueCommand, DataValueEvent, DataValueState> {

    @Override
    public Behavior initialBehavior(Optional<DataValueState> snapshotState) {
        BehaviorBuilder b = newBehaviorBuilder(snapshotState.orElse(DataValueState.EMPTY));

        // Persist a DataValueMessagePublished event for each incoming publish command.
        b.setCommandHandler(DataValueCommand.PublishDataValueMessage.class,
            (cmd, ctx) ->
                ctx.thenPersist(new DataValueEvent.DataValueMessagePublished(cmd.getDataValue()),
                    // Then once the event is successfully persisted, we respond with done.
                    evt -> ctx.reply(Done.getInstance())));

        // Apply the persisted event to produce the new entity state.
        b.setEventHandler(DataValueEvent.DataValueMessagePublished.class,
            evt -> new DataValueState());

        return b.build();
    }
}
What I understood is that just after thenPersist, the Lagom framework sends the event to Kafka.
Is that the correct explanation, and is this the right way to do it?
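For reference, this is roughly how a Lagom service typically publishes entity events to a broker topic, by streaming them from the journal. It is only a sketch: the dataValuesTopic method, the DataValueMessage type, the DataValueEvent.TAG event tag, and the toMessage converter are all illustrative names, and persistentEntityRegistry is assumed to be injected into the service implementation.

import akka.japi.Pair;
import com.lightbend.lagom.javadsl.api.broker.Topic;
import com.lightbend.lagom.javadsl.broker.TopicProducer;

// Sketch only: DataValueMessage, DataValueEvent.TAG and toMessage are illustrative.
public Topic<DataValueMessage> dataValuesTopic() {
    return TopicProducer.singleStreamWithOffset(offset ->
        persistentEntityRegistry
            .eventStream(DataValueEvent.TAG, offset)   // read persisted events back out of the journal
            .map(pair -> Pair.create(toMessage(pair.first()), pair.second())));
}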
Event sourcing is for distributed entities in large systems. If you just want to consume your data as fast as you can in order to “do a lot of things”, it sounds like you don’t need an event sourcing approach; this looks more like a stream processing pipeline than an event sourcing one. You can use Kafka and Cassandra for either, but the main difference is how long the data is stored. In event sourcing you usually need to store the changes to a given entity, persisting the incoming events that modify it. On the other hand, you could just process your data and then act on the result. You can also apply event sourcing after the “processing” pipeline.
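As a sketch of that plain stream-processing alternative, using Alpakka Kafka (akka-stream-kafka): the topic name, consumer group, broker address and the processing step are all illustrative, and you would replace the map stage with your real work.

import akka.actor.ActorSystem;
import akka.kafka.ConsumerSettings;
import akka.kafka.Subscriptions;
import akka.kafka.javadsl.Consumer;
import akka.stream.ActorMaterializer;
import akka.stream.Materializer;
import akka.stream.javadsl.Sink;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SensorPipeline {
    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("sensor-pipeline");
        Materializer materializer = ActorMaterializer.create(system);

        ConsumerSettings<String, String> settings =
            ConsumerSettings.create(system, new StringDeserializer(), new StringDeserializer())
                .withBootstrapServers("localhost:9092")   // illustrative broker address
                .withGroupId("sensor-processor");         // illustrative consumer group

        // Consume the sensor topic and process each value directly,
        // without persisting every reading as an event on the write side.
        Consumer.plainSource(settings, Subscriptions.topics("sensor-values"))  // illustrative topic name
            .map(record -> record.value())               // replace with the real "lot of stuff to do"
            .runWith(Sink.foreach(System.out::println), materializer);
    }
}

With this approach, Kafka’s retention still covers consumer failures, and the storage stays bounded by the retention window instead of growing with every persisted event.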
Obviously I don’t know the business logic of your system, but this is what comes to mind.