Partitions consumed faster than others

Hi,

I’ve been using Alpakka Kafka to stream data from Kafka topics. I’m using:

Consumer
      .committableSource(consumerSettings, Subscriptions.topics(topic))

Recently I’ve tried to spawn more consumers, e.g. 3 on a topic that has 15 partitions. When I add more consumers with the same group id, it nicely splits the work into 5 partitions per consumer, but it doesn’t seem to consume all partitions at the same rate: it seems to read them one by one, or to read a specific partition much faster than others.

|Partition|LogSize|Consumer Offset|Lag|
|---|---|---|---|
|0|8,429,145|6,087,144|2,342,001|
|1|8,424,948|6,223,257|2,201,691|
|2|8,428,121|7,764,854|663,267|
|3|8,421,528|6,071,425|2,350,103|
|4|8,434,659|7,351,552|1,083,107|
|5|8,428,323|5,935,336|2,492,987|
|6|8,424,974|6,455,301|1,969,673|
|7|8,431,820|7,763,984|667,836|
|8|8,425,999|6,370,962|2,055,037|
|9|8,416,354|6,681,093|1,735,261|
|10|8,416,217|6,814,949|1,601,268|
|11|8,428,026|5,878,703|2,549,323|
|12|8,424,604|8,424,589|15|
|13|8,431,019|8,431,019|0|
|14|8,423,218|8,423,218|0|

The table above is a real example from a production application I’m running. So I have some questions:

  • Is it normal for some partitions to be read much faster than others? Please note that this behavior only happens when I start more than one consumer.
  • Should I change the way I’m consuming? Should I use a source per partition, or is there a different option?
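To make the second question concrete, here is roughly what I imagine a per-partition setup would look like, using `Consumer.committablePartitionedSource`. This is only a sketch: the broker address, group id, topic name, parallelism, and the `process` function are placeholders, not my real configuration.

```scala
import scala.concurrent.Future

import akka.Done
import akka.actor.ActorSystem
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.kafka.{CommitterSettings, ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object PartitionedConsumerSketch extends App {
  implicit val system: ActorSystem = ActorSystem("consumer")
  implicit val mat: ActorMaterializer = ActorMaterializer()
  import system.dispatcher

  // Placeholder for the real message handling.
  def process(value: String): Future[Done] = Future.successful(Done)

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092") // assumed broker address
      .withGroupId("my-group")                // assumed group id

  // One sub-source per assigned partition; each is materialized and
  // processed independently, so a slow partition does not hold back the rest.
  Consumer
    .committablePartitionedSource(consumerSettings, Subscriptions.topics("topic"))
    .mapAsyncUnordered(parallelism = 15) { case (_, partitionSource) =>
      partitionSource
        .mapAsync(1) { msg =>
          process(msg.record.value).map(_ => msg.committableOffset)
        }
        .runWith(Committer.sink(CommitterSettings(system)))
    }
    .runWith(Sink.ignore)
}
```

I’m not sure whether this is the recommended pattern, or whether the plain `committableSource` should already balance partitions evenly on its own.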

I appreciate any help, thanks!

Thiago

Update

I suspected this could only happen when plugging in more than one consumer (i.e. more than one application instance), but it happened today with only one consumer. You can verify this by looking at the consumer group, which is the same.

At the time it happened, I had 20 million messages still waiting to be processed (lag). The figures above were taken from the Kafka manager we have at the company.

A follow-up here: https://stackoverflow.com/questions/52209589/alpakka-kafka-partitions-consumed-faster-than-others

I have not seen this pattern in our usage so far, but I’d like to understand it as well.

When you say “plug more consumers”, what exactly do you mean? Are you

  • creating multiple Kafka sources, then merging them in a single graph?
  • creating multiple graphs in the same application?
  • running multiple instances of your application at the same time, one graph per instance?

Hi @schoeneu,

Sorry for the lack of clarity, but I meant running multiple instances of my application.

I have my application deployed on Kubernetes, with 1 pod and 1 consumer only. At some point I increased the number of running containers to 2 or 3 to increase the message throughput.
Once the additional pods are up and running, you can see more consumers using the same group id, and at the same time some partitions start being consumed faster than others.