Alpakka consumer can be created for consuming from several kafka topics. I wonder if it is a good design choice to consume from several kafka topics in scope of one stream? I have several concerns about it. How does pulling of messages work in that case(what if 1 topic has a higher rate of imcoming messages than others)? Is it beneficial to use this approach with 1 stream consuming from several topics if the sink of that stream ends up writing to the same table in some SQL storage, for example(meaning that the table holds entity data which are constructed from that several topics)
Hi Maksym. Are you referring to a Kafka subscription referencing more than 1 topic or joining two Alpakka Kafka sources with Akka Streams that have separate subscriptions to different topics? In the latter case, Akka Streams will back-pressure the faster source to a throughput your downstream stages can accommodate. The former is a matter of Kafka internals. Every poll the Kafka consumer will fetch from all subscribed partitions (if there are no in-flight requests already to that partition), but the Kafka Consumer and Broker (and possibly Topic) configuration that determine how much data is returned to the consumer for any given request. If your consumer cannot keep up with the “faster” topic, then you’ll see the consumer group lag for that topic increase over time.
Hi Sean,
I’m referring to the case when we tell consumerSunscription.topics(topic1, topic2)
. Let’s imagine that rate of incoming messages for topic2 is 2 times faster than for topic1. Will the consumer poll only topic2 until no new messages are there and then fallback to topic1 or the process will happen in round-robin fashion: poll N messages from topic 2(if they are available) from all its partitions, then poll from topic 1(from all its partions) N messages and repeat?
The Alpakka Kafka Subscription.topics
API maps directly to a Kafka Consumer subscribe API, so the behaviour is a function of the Kafka Consumer and not Alpakka Kafka.
The Kafka Consumer will resolve all the partitions that are part of the subscription (across all topics) and fetch data from each in a round-robin like fashion. Even if topic A’s partitions are being produced to faster than topic B’s, the consumer will still request partitions from B’s.