Hi,
I’ve been using Alpakka Kafka to stream data from Kafka topics. I’m consuming with:
`Consumer.committableSource(consumerSettings, Subscriptions.topics(topic))`
Recently I tried spawning more consumers (3) on a topic that has 15 partitions. When I add consumers with the same group id, the partitions are split evenly, 5 per consumer. However, the consumers don’t seem to read all of their partitions at the same time: they appear to read them one by one, or to read one specific partition much faster than the others.
|Partition|LogSize|Consumer Offset|Lag|
|---|---|---|---|
|0|8,429,145|6,087,144|2,342,001|
|1|8,424,948|6,223,257|2,201,691|
|2|8,428,121|7,764,854|663,267|
|3|8,421,528|6,071,425|2,350,103|
|4|8,434,659|7,351,552|1,083,107|
|5|8,428,323|5,935,336|2,492,987|
|6|8,424,974|6,455,301|1,969,673|
|7|8,431,820|7,763,984|667,836|
|8|8,425,999|6,370,962|2,055,037|
|9|8,416,354|6,681,093|1,735,261|
|10|8,416,217|6,814,949|1,601,268|
|11|8,428,026|5,878,703|2,549,323|
|12|8,424,604|8,424,589|15|
|13|8,431,019|8,431,019|0|
|14|8,423,218|8,423,218|0|
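
To be explicit about how I read the table: the Lag column is just LogSize minus Consumer Offset for each partition. Here is a minimal sketch of that arithmetic using a few rows copied from the table (the `PartitionStatus` and `LagCheck` names are only for illustration, they are not part of any tool):

```scala
// Lag per partition = log end offset (LogSize) minus committed consumer offset.
final case class PartitionStatus(partition: Int, logSize: Long, consumerOffset: Long) {
  def lag: Long = logSize - consumerOffset
}

object LagCheck {
  // A few rows copied from the table above.
  val sample: List[PartitionStatus] = List(
    PartitionStatus(0, 8429145L, 6087144L),
    PartitionStatus(2, 8428121L, 7764854L),
    PartitionStatus(12, 8424604L, 8424589L),
    PartitionStatus(13, 8431019L, 8431019L)
  )

  def main(args: Array[String]): Unit =
    sample.foreach(p => println(s"partition ${p.partition}: lag ${p.lag}"))
}
```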
The table above is a real example from a production application I’m running. So I have some questions:
- Is it OK for some partitions to be read much faster than others? Please note that this behavior only happens when I start more than one consumer.
- Should I change the way I’m consuming? Should I use a source per partition, or is there a different option?
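
For the second question, this is roughly what I understand the per-partition variant would look like. It is only a sketch built on `Consumer.committablePartitionedSource`. The bootstrap server, group id, topic name, and the empty processing step are placeholders rather than my real setup, and it assumes Akka 2.6+ so that the implicit `ActorSystem` supplies the materializer:

```scala
import akka.actor.ActorSystem
import akka.kafka.{CommitterSettings, ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.{Committer, Consumer}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

object PerPartitionConsumer extends App {
  implicit val system: ActorSystem = ActorSystem("per-partition-example")

  // Placeholder settings: replace with the real brokers and group id.
  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("my-group")

  val committerSettings = CommitterSettings(system)

  // Upper bound on partition sub-streams running at once (the topic has 15).
  val maxPartitions = 15

  val control =
    Consumer
      .committablePartitionedSource(consumerSettings, Subscriptions.topics("my-topic"))
      // Each assigned partition arrives as its own Source and runs as its own stream.
      .mapAsyncUnordered(maxPartitions) { case (topicPartition, source) =>
        source
          .map { msg =>
            // Placeholder: the real per-message processing would go here.
            msg.committableOffset
          }
          .runWith(Committer.sink(committerSettings))
      }
      .to(Sink.ignore)
      .run()
}
```

My understanding is that each partition then gets its own sub-stream, so a slow partition should not hold back the others, but I have not verified that this fixes the uneven lag.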
I appreciate any help, thanks!
Thiago
Update
I suspected this only happened when running more than one consumer (that is, more than one application instance), but it happened today with only one consumer. You can see this by looking at the consumer group, which is the same.
At the time it happened, I had about 20 million messages still waiting to be processed (lag). The picture above was taken from the Kafka manager we use at the company.