Akka kafka consumer parallel processing

vamsi · May 6, 2020, 4:28am

Hi,

I’m working on a Kafka consumer application using Akka Kafka connector.
I would like to consumer to process messages parallelly.
which consumer group should I use https://doc.akka.io/docs/alpakka-kafka/current/consumer.html?
how can I configure the parallelism on the consumer side?

seglo · May 6, 2020, 6:07pm

Hi @vamsi,

You can achieve parallelism a number of different ways with Alpakka Kafka. The standard way is to run multiple applications that use the same Kafka consumer group id and topic subscription.

You may also achieve parallelism within each application by using the mapAsync operator or the “partitioned” Sources. mapAsync will give you a “threadpool”-like ability to process messages you consume. For CPU-bound processing you can should the parallelism argument of mapAsync to the number of cores available to the app, for IO-bound operations you should choose a value that works best for you (i.e. max number of connections you want to make to a networked service).

The partitioned Sources provide a way of processing messages from different partitions separately, instead of funnelling messages of all partitions into one stream. You may use the various sub stream operators here to achieve parallelism.

As for what Source to choose, it’s probably best to start with plainSource or committableSource. Use plainSource if you don’t need to commit offsets back to Kafka.

Hope that helps!

Regards,
Sean

vamsi · May 6, 2020, 6:23pm

Thank you @seglo!

vamsi · May 7, 2020, 2:11pm

@seglo one more question.
My application is IO bound and Kafka topic is only one partition.
is it possible to parallelize the consumer to process the message process concurrently?

Alternatively, I did partitioned topic and create the same number of consumer instances to process parallelly but was wondering if I can do similar when there is only one partition?

seglo · May 7, 2020, 6:41pm

In a 1 partition scenario consumer groups wouldn’t serve much purpose, except as fail over, because a partition can only be assigned to one consumer group member at a time. You may process many messages from a single partition in parallel if you like. Since it’s IO-bound you could try using larger numbers for the parallelism argument of mapAsync.

mapAsync will ensure that transformed results are in the same order as input messages, which can add extra delay (i.e. message A is still processing when message B is already complete, message B will be blocked by message A). When you want to ensure strict ordering then this is good, but if you don’t care then you could use mapAsyncUnordered which will immediately push results downstream regardless of order. Keep in mind that if you commit offsets then this could cause you to skip offsets and break at least once semantics on failure/restarts, so in those cases it’s best to maintain the ordering.

vamsi · May 14, 2020, 12:37am

thank you @seglo , I decided to use consumer per partition.

Topic		Replies	Views
What parallel value should I set for Akka Kafka consumer when we scale out more Kubernetes pods? Akka Streams & Alpakka akka-cluster , kafka	1	1088	February 3, 2022
Alpakka not using more than one CPU core Akka Streams & Alpakka akka-typed , akka-cluster , kafka , alpakka , streams	4	1016	February 17, 2021
Multisource kafka consumer Akka Streams & Alpakka kafka	3	530	March 22, 2021
Multiple Consumer threads using Alpakka connector Akka Streams & Alpakka alpakka	0	745	May 2, 2019
Akka-Kafka Cluster multiple consumer groups Akka Streams & Alpakka	2	631	April 6, 2021

Akka kafka consumer parallel processing

Related topics