Hello,
in one of my applications I’m using the akka-http client, and a few days ago I did some load testing and profiling to see which components are doing most of the allocations and slowing down the application. I noticed that around 40% of the allocations are done by the akka-http components. To isolate the problem, I created a simple script which calls an HTTP mock server via Http().singleRequest 100,000 times.
The profiling showed that for these 100,000 calls around 650MB of data was allocated in the NewHostConnectionPool in the function runOneTransition (package akka.http.impl.engine.client.pool.NewHostConnectionPool). The reason for the allocations is the debug function calls in this function (e.g. see https://github.com/akka/akka-http/blob/master/akka-http-core/src/main/scala/akka/http/impl/engine/client/pool/NewHostConnectionPool.scala#L252). These debug calls trigger many char/String allocations even when debug-level logging is not in use, because the log-level check is only executed inside the debug function, while the String has already been allocated by the caller before the check runs.
To avoid these allocations, the log-level check could either be inlined at the call sites or a macro-based solution could be used. I checked this locally by removing the debug calls, which led to around 50% fewer and shorter GC pauses and generally improved the allocation profile of the application.
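To illustrate the difference, here is a deliberately simplified sketch (a hypothetical logger, not akka-http’s actual implementation): an eager `debug(msg: String)` forces the caller to build the string before the level check runs, while a by-name parameter or an inlined `debugEnabled` guard defers that work.

```scala
// Simplified model of the allocation problem. `interpolations` counts
// how often the message string is actually built.
object DebugAllocationDemo {
  val debugEnabled = false          // debug-level logging is switched off
  var interpolations = 0

  def buildMsg(state: Int): String = {
    interpolations += 1
    s"transitioning to state $state" // the costly String interpolation
  }

  // Eager variant: the String is allocated by the caller before the check.
  def debugEager(msg: String): Unit = if (debugEnabled) println(msg)

  // By-name variant: `msg` is only evaluated if the level check passes.
  // (This still allocates a small thunk per call; inlining the check at
  // the call site, as below, avoids even that.)
  def debugLazy(msg: => String): Unit = if (debugEnabled) println(msg)

  def main(args: Array[String]): Unit = {
    debugEager(buildMsg(1))                   // builds the String anyway
    debugLazy(buildMsg(2))                    // never builds the String
    if (debugEnabled) debugEager(buildMsg(3)) // inlined guard: no work at all
    println(interpolations)                   // prints 1
  }
}
```

Only the eager call pays the interpolation cost; the other two variants do no string work when debug logging is off.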
After these allocations were removed, I saw that there was a second function call which was allocating around 400MB of objects for these 100,000 requests. The reason is this function call: https://github.com/akka/akka-http/blob/master/akka-http-core/src/main/scala/akka/http/impl/engine/client/PoolInterface.scala#L168. This calls the UriParser.parseHost() function, which in the end allocates akka.parboiled2.ValueStack objects. Based on this it looks like the host is parsed very frequently, although my test script only ever called a single host. I’m not deeply familiar with the internals of akka-http, but as I understand it, it uses host-connection-pools. Wouldn’t it be possible to parse the host only once per host-connection-pool and reuse the result, instead of doing this for every request?
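The parse-once-per-pool idea could be sketched roughly like this (hypothetical names and a deliberately simplified parser; akka-http’s real UriParser and pool setup look different): cache the parsed host per authority and only fall back to parsing on the first request against a pool.

```scala
import java.util.concurrent.ConcurrentHashMap

// Sketch of parse-once-per-pool caching (hypothetical API, not akka-http's).
final case class ParsedHost(host: String, port: Int)

object HostParseCache {
  private val cache = new ConcurrentHashMap[String, ParsedHost]()
  var parseCount = 0 // exposed only so the demo below can observe it

  // Stand-in for UriParser.parseHost; assumes a "host:port" authority.
  private def parseHost(authority: String): ParsedHost = {
    parseCount += 1
    val Array(h, p) = authority.split(":", 2)
    ParsedHost(h, p.toInt)
  }

  // Each distinct authority is parsed exactly once; subsequent requests
  // against the same pool reuse the cached result.
  def hostFor(authority: String): ParsedHost =
    cache.computeIfAbsent(authority, parseHost(_))
}
```

With something like this, 100,000 requests against the same authority would trigger a single parse (and a single set of parser allocations) instead of 100,000.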
I searched the open issues and also the forum but didn’t find anything, so my question in general is whether these two hotspots are already known. Improving these two allocation hotspots would reduce the overall allocations by more than 50% and would improve the overall performance of the client implementation. I wanted to share these findings here.
I have tested this with Scala 2.12.8, akka-http 10.1.10 and akka-streams 2.5.25.
Best regards and thanks in advance!
P.S. I wanted to share more references to the actual code, but as a new user I’m only allowed to share two URLs.
@flschulz FYI the new user restrictions have now been lifted, so you can feel free to edit your post or add a reply with more information. I’ll leave it to the Akka team or others to comment on the main topic of your post.
Great observation. Could you file an issue with that same information?
Have you tried with real requests that are created anew for each request, to see how expensive those wasted allocations are in relation? If I calculate correctly, those two issues in combination would mean a memory churn of ~10kB per request. Regular memory usage would be at least 2 times the size of the request+response, because there’s one buffer copy after receiving data from the network and another one when creating the high-level models. For really small requests I could see how it could make a difference, but how high would the request rate have to be for that allocation rate to matter in real-world programs? You could also compare how much CPU it costs to reclaim 10kB of memory with the CPU cost of running a request. Are these in similar orders of magnitude, such that optimizing this would make a difference?
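The ~10kB figure follows directly from the numbers reported above:

```scala
// Back-of-envelope check of the churn-per-request estimate, using the
// 650MB (debug strings) + 400MB (host parsing) figures from the thread.
val totalBytes = (650L + 400L) * 1024 * 1024 // 1050 MB in bytes
val requests   = 100000L
val perRequest = totalBytes / requests       // 11010 bytes, roughly 10.8 kB
```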
The whole client pipeline hasn’t seen too many optimizations, mostly because HTTP clients often don’t need the same level of scalability that servers do. We’d certainly like to hear about applications that need those levels of performance.
That said, those issues seem easy enough to fix and might save some CPU time as well, so we might have a look at it.
I have tried this with real requests as well. I initially stumbled upon this while profiling a real service which uses the akka-http client to push data to webhooks. This service consumes data from different sources, and during a performance profiling I noticed that the two mentioned functions of the akka-http client implementation are responsible for a large percentage of the overall allocations. To check the Http().singleRequest function in isolation, I created a small script to generate example calls.
I agree and understand that it doesn’t sound like much overhead, but for applications which make very frequent HTTP calls, e.g. to transport data over the wire or call webhooks (and which can’t use e.g. websockets for various reasons), this accumulates very quickly. From what I have seen, it would make a real difference very quickly.
As I said in my first post, I wanted to raise awareness of this, and I’m happy that opening it as an issue works for you. If you need more information, feel free to ask; I’m happy to help.