Aggregating a million records using Akka Streams

Hi Team,

I am running a simulation which generates a million records every second. I’m writing them to Kafka and reading them through Akka Streams. I’m performing a few aggregations on this data and writing the output back to Kafka.

Each record carries a timestamp, and the aggregations are grouped by it. Using these timestamps, I'm creating windows of data and aggregating each window. At a million records per second, the aggregation takes about 40 seconds per million records, which is far too slow given that new data keeps arriving and being written to Kafka every second.

I referred to this blog post for the window aggregations.
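For reference, here is a stripped-down version of the kind of windowing I'm doing. The `Record` schema and the aggregation (a sum and count per tumbling window) are simplified stand-ins for my actual pipeline, and the Kafka source is replaced by an in-memory one:

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.mutable

// Simplified record shape; the real schema has more fields.
final case class Record(timestampMillis: Long, value: Double)

object TumblingWindowSketch extends App {
  implicit val system: ActorSystem = ActorSystem("windowing")

  val windowMillis = 1000L // one-second tumbling windows

  // In-memory stand-in for the Kafka source.
  val records = Source(1 to 10000).map(i => Record(i.toLong, 1.0))

  records
    .statefulMapConcat { () =>
      // One mutable map of open windows per materialization.
      // Note that this stage processes records one at a time.
      val open = mutable.Map.empty[Long, (Double, Long)]
      record => {
        // Assign the record to its window by truncating the timestamp.
        val w = record.timestampMillis / windowMillis * windowMillis
        val (sum, count) = open.getOrElse(w, (0.0, 0L))
        open.update(w, (sum + record.value, count + 1))
        // Naive close rule: flush any window older than the newest one seen.
        val closed = open.keys.filter(_ < w).toList.sorted
        closed.map { k =>
          val (s, c) = open.remove(k).get
          (k, s, c)
        }
      }
    }
    .runWith(Sink.foreach { case (w, sum, count) =>
      println(s"window=$w sum=$sum count=$count")
    })
}
```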

Is there a better way to perform these aggregations in less time (preferably under one second) using Akka Streams?

Thanks

I'm not a pro in this area, but I would take a different approach. Right now you have one big pipe of events, and your solution doesn't seem to scale. I would write a small streams app that consumes the original stream (perhaps only adding metadata) and pushes the records back out to windowed subtopics (like 12:05_5min), and then a second app that consumes and aggregates the data from those subtopics. The first app scales easily because it does no aggregation, and the second app is easily scalable with a "stream reservation" mechanism, which can be implemented with cluster actors or Redis. Once you can scale out easily, you can fine-tune your CPU usage and measure the bottlenecks much more easily. Here's a rough sketch of what I mean by the first app:
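This sketch assumes Alpakka Kafka with string-serialized records; the topic names, the timestamp parsing, and the 5-minute window size are all made up, so adapt them to your schema:

```scala
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, ProducerSettings, Subscriptions}
import akka.kafka.scaladsl.{Consumer, Producer}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import scala.concurrent.duration._

object WindowRouter extends App {
  implicit val system: ActorSystem = ActorSystem("window-router")

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("window-router")
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  val producerSettings =
    ProducerSettings(system, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092")

  val windowMillis = 5.minutes.toMillis

  // Hypothetical: assumes the event timestamp is the first comma-separated
  // field of the message value; replace with your real deserialization.
  def extractTimestamp(value: String): Long = value.takeWhile(_ != ',').toLong

  // One subtopic per 5-minute window, named by the window's start time.
  def subtopic(ts: Long): String = {
    val windowStart = ts / windowMillis * windowMillis
    s"events_${windowStart}_5min"
  }

  // No aggregation here: just re-key every record to its window subtopic.
  // A production version would use Consumer.committableSource so offsets
  // are committed after the produce succeeds.
  Consumer
    .plainSource(consumerSettings, Subscriptions.topics("events"))
    .map { msg =>
      new ProducerRecord[String, String](
        subtopic(extractTimestamp(msg.value)), msg.key, msg.value)
    }
    .runWith(Producer.plainSink(producerSettings))
}
```

Since the router does no aggregation, you can run as many instances of it as the input topic has partitions, all in the same consumer group. The aggregator instances then each claim a set of subtopics, which is the "stream reservation" part.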