Hi David,
maybe the discussion in “Throughput and batching with Alpakka Kafka” is helpful to you?
Short summary:
- The `mapAsync` part that you found is independent of the order of publishing. In itself, it only guarantees that you will receive the `Results` (i.e. the success notifications of the publishing process) in the same order as the messages you put into it. (If the `Producer` was using `mapAsyncUnordered` instead, this would not be guaranteed. The `parallelism` setting affects the size of the internal buffer that `mapAsync` uses, but that buffer uses head-of-line blocking to guarantee ordering.)
- The order of publishing is managed by the underlying `KafkaProducer` from the Kafka clients library. The Alpakka `flexiFlow` will pass messages to the `KafkaProducer` in the same order they arrive, then collect the futures that track publishing success for each message and hand them back to you once they’re done. The `KafkaProducer` will usually keep the order, but make sure to read the documentation at https://kafka.apache.org/documentation/ for the producer configs `retries` and `max.in.flight.requests.per.connection`! In particular, the default for `max.in.flight.requests.per.connection` is `5` and I don’t think Alpakka overrides this by default, so you need to adjust your configuration to retain ordering when a failed send is retried.
- Finally (although I’m certain you’re aware of this), “publishing in order” only preserves order for downstream consumers if your partitioner is the same for the source and target topic, as two messages consumed from the source will inherently lose their relative order if they are published to two different partitions downstream.
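To make the first point a bit more concrete, here is a minimal plain-Java sketch of the difference between ordered and unordered result emission. It is only a model built on `CompletableFuture`, not how Akka Streams implements `mapAsync`; the class name, the `demo` method and the message names `m1` to `m3` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical model, NOT the Akka Streams implementation.
class MapAsyncOrderingSketch {

    // Returns two lists: results emitted in submission order (mapAsync-like)
    // and results in completion order (mapAsyncUnordered-like).
    static List<List<String>> demo() {
        // Three "in-flight" publish results, submitted in the order m1, m2, m3.
        List<CompletableFuture<String>> inFlight = List.of(
                new CompletableFuture<>(),
                new CompletableFuture<>(),
                new CompletableFuture<>());

        // Unordered variant: record each result as soon as its future completes.
        List<String> unordered = new ArrayList<>();
        for (CompletableFuture<String> f : inFlight) {
            f.thenAccept(unordered::add);
        }

        // The futures complete out of order: m3 first, then m1, then m2.
        inFlight.get(2).complete("m3");
        inFlight.get(0).complete("m1");
        inFlight.get(1).complete("m2");

        // Ordered variant: always take the oldest future first. A slow head
        // would hold back later results that are already finished
        // (head-of-line blocking).
        List<String> ordered = new ArrayList<>();
        for (CompletableFuture<String> f : inFlight) {
            ordered.add(f.join());
        }
        return List.of(ordered, unordered);
    }

    public static void main(String[] args) {
        List<List<String>> result = demo();
        System.out.println("ordered (mapAsync-like):            " + result.get(0));
        System.out.println("unordered (mapAsyncUnordered-like): " + result.get(1));
    }
}
```

Note that both variants only concern the order in which you see the results; neither says anything about the order in which the broker wrote the messages.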
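For the producer configuration, the following is a rough sketch of two settings that, as far as I understand the Kafka documentation, keep per-partition ordering across retries. It uses plain `java.util.Properties` with the documented config keys as strings so it runs without the Kafka client on the classpath; the class and method names are hypothetical, and bootstrap servers and serializers are left out on purpose.

```java
import java.util.Properties;

// Hypothetical helper class; only the config keys come from the Kafka docs.
class OrderedProducerConfigSketch {

    // Option A: allow only one unacknowledged request per connection, so a
    // retried batch can never be overtaken by a later one. Costs throughput.
    static Properties singleInFlight() {
        Properties props = new Properties();
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    // Option B: idempotent producer. According to the Kafka docs this keeps
    // ordering with up to 5 in-flight requests and requires acks=all.
    static Properties idempotent() {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("single in-flight: " + singleInFlight());
        System.out.println("idempotent:       " + idempotent());
    }
}
```

With Alpakka you would normally not build `Properties` yourself; the same keys can go into the `kafka-clients` section of the `akka.kafka.producer` config or be set via `ProducerSettings.withProperty`, but please check that against the Alpakka version you are on.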
Hope that helps!