Partitions consumed faster than others

Hi,

I’ve been using Alpakka Kafka to stream data from Kafka topics. I’m using:

Consumer
      .committableSource(consumerSettings, Subscriptions.topics(topic))

Recently I’ve tried to spawn more consumers, e.g. 3 on a topic that has 15 partitions. When I add more consumers with the same group id, it nicely splits the work into 5 partitions per consumer, but it doesn’t seem to consume all partitions at the same rate: it seems to read them one by one, or to read a specific partition much faster than others.

|Partition|LogSize|Consumer Offset|Lag|
|---|---|---|---|
|0|8,429,145|6,087,144|2,342,001|
|1|8,424,948|6,223,257|2,201,691|
|2|8,428,121|7,764,854|663,267|
|3|8,421,528|6,071,425|2,350,103|
|4|8,434,659|7,351,552|1,083,107|
|5|8,428,323|5,935,336|2,492,987|
|6|8,424,974|6,455,301|1,969,673|
|7|8,431,820|7,763,984|667,836|
|8|8,425,999|6,370,962|2,055,037|
|9|8,416,354|6,681,093|1,735,261|
|10|8,416,217|6,814,949|1,601,268|
|11|8,428,026|5,878,703|2,549,323|
|12|8,424,604|8,424,589|15|
|13|8,431,019|8,431,019|0|
|14|8,423,218|8,423,218|0|

The table above is a real example from a production application I’m running. So I have some questions:

  • Is it normal for some partitions to be read much faster than others? Please note that this behavior only happens when I start more than one consumer.
  • Should I change the way I’m consuming? Should I use a source per partition, or is there a different option?
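To make the second question concrete, here is roughly what I imagine a per-partition setup would look like, using `Consumer.committablePartitionedSource`. This is only a sketch: the broker address, group id, topic name, parallelism, and the `process` function are placeholders, not my real configuration.

```scala
import scala.concurrent.Future

import akka.Done
import akka.actor.ActorSystem
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.kafka.{CommitterSettings, ConsumerSettings, Subscriptions}
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object PartitionedConsumerSketch extends App {
  implicit val system: ActorSystem = ActorSystem("consumer")
  implicit val mat: ActorMaterializer = ActorMaterializer()
  import system.dispatcher

  // Placeholder for the real message handling.
  def process(value: String): Future[Done] = Future.successful(Done)

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092") // assumed broker address
      .withGroupId("my-group")                // assumed group id

  // One sub-source per assigned partition; each is materialized and
  // processed independently, so a slow partition does not hold back the rest.
  Consumer
    .committablePartitionedSource(consumerSettings, Subscriptions.topics("topic"))
    .mapAsyncUnordered(parallelism = 15) { case (_, partitionSource) =>
      partitionSource
        .mapAsync(1) { msg =>
          process(msg.record.value).map(_ => msg.committableOffset)
        }
        .runWith(Committer.sink(CommitterSettings(system)))
    }
    .runWith(Sink.ignore)
}
```

I’m not sure whether this is the recommended pattern, or whether the plain `committableSource` should already balance partitions evenly on its own.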

I appreciate any help, thanks!

Thiago

Update

I suspected this could only happen when plugging in more than one consumer (i.e. more than one application instance), but it happened today with only one consumer. You can verify this by looking at the consumer group, which is the same.

At the time it happened, I had 20 million messages still waiting to be processed (lag). The figures above were taken from the Kafka manager we have at the company.

A follow-up here: https://stackoverflow.com/questions/52209589/alpakka-kafka-partitions-consumed-faster-than-others

I have not seen this pattern in our usage so far, but I’d like to understand it as well.

When you say “plug more consumers”, what exactly do you mean? Are you

  • creating multiple Kafka sources, then merging them in a single graph?
  • creating multiple graphs in the same application?
  • running multiple instances of your application at the same time, one graph per instance?

Hi @schoeneu,

Sorry for the lack of clarity, but I meant running multiple instances of my application.

I have my application deployed on Kubernetes, with 1 pod and 1 consumer only. At some point I increased the number of running containers to 2 or 3 to increase the message throughput.
Once the additional pods are up and running, you can see more consumers using the same group id, and at the same time some partitions start being consumed faster than others.