XML parsing from entity bytestring

arempter · October 23, 2019, 11:01am

Hello,

While using akka-stream-alpakka-xml with Akka HTTP requests, I noticed some interesting behavior when parsing large XML files.

The scenario in detail:

- A user posts an XML file to an Akka HTTP endpoint.
- The XML is then taken from the request entity as a ByteString source and parsed by XmlParsing.parser.

The XmlParsing logic works with the same XML without Akka HTTP, or when I first materialize the entity with Unmarshal(httpRequest.entity).to[String] and pass the result as Source.single(ByteString) to XmlParsing.parser.
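For comparison, the buffered workaround described above might look roughly like this. It is a sketch assuming Akka 2.6 (so an implicit ActorSystem supplies the Materializer) with Akka HTTP and Alpakka XML on the classpath; the object and method names are made up for illustration. Because the whole entity is read into memory first, the parser receives the document as a single element, which is presumably why the keys are not split in this case.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.model.HttpEntity
import akka.http.scaladsl.unmarshalling.Unmarshal
import akka.stream.alpakka.xml.ParseEvent
import akka.stream.alpakka.xml.scaladsl.XmlParsing
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString

import scala.concurrent.Future

// Hypothetical helper: buffer the whole entity, then parse it as one ByteString.
object BufferedXmlParse {
  def parseBuffered(entity: HttpEntity)(implicit system: ActorSystem): Future[Seq[ParseEvent]] = {
    import system.dispatcher // ExecutionContext for the Future combinators

    // Reads the complete body into memory (decoded using the entity's charset).
    Unmarshal(entity).to[String].flatMap { body =>
      // The parser receives the whole document as a single element.
      Source.single(ByteString(body)).via(XmlParsing.parser).runWith(Sink.seq)
    }
  }
}
```

Akka HTTP's entity.toStrict(timeout) would collect the raw bytes without the String round trip; either way the whole document has to fit in memory, which gives up streaming for large files.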

When I run source.via(XmlParsing.parser), where source comes from the Akka HTTP request (request.entity.dataBytes), I almost always see that some keys from the XML are split:

testuser/one/two/three/four/five/six/seven/eight/nine/ten/eleven/twelve/sub736/KM8DaDXEVNP4MByygsM8d5vK96NStFJC=87i and
zFQw0lL90.txt

I suspect this is related to the fact that parsing is faster than the source?

What do you think?

arempter · October 23, 2019, 11:13am

If this helps, I have working code in

If you run curl -XPOST http://localhost:8123 -d @large.xml two or more times, you should see a line split.

ennru · October 25, 2019, 1:40pm

Hi @arempter

Are you getting several consecutive TextEvents?
That can happen when the source delivers the data in “chopped up” ByteStrings, since the source doesn’t know about the XML structure. The parser does not try to aggregate the data internally until a text section ends.
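To see this without Akka HTTP, you can feed a small document to the parser in fixed-size pieces. The sketch below assumes Akka 2.6 and Alpakka XML; the object name, the sample document and the 8-byte chunk size are arbitrary choices for illustration, and ByteString.grouped merely stands in for however request.entity.dataBytes happens to chunk the body. The text of the key element may come out as several TextEvents, but concatenating them gives back the full value.

```scala
import akka.actor.ActorSystem
import akka.stream.alpakka.xml.{ParseEvent, TextEvent}
import akka.stream.alpakka.xml.scaladsl.XmlParsing
import akka.stream.scaladsl.{Sink, Source}
import akka.util.ByteString

import scala.concurrent.Await
import scala.concurrent.duration._

object ChunkedTextDemo {
  val document: ByteString =
    ByteString("<doc><key>testuser/one/two/three/four/five.txt</key></doc>")

  // Parses `document` after cutting it into chunks of `chunkSize` bytes and
  // returns the text of every TextEvent in order.
  def textPieces(chunkSize: Int)(implicit system: ActorSystem): Seq[String] = {
    val chunks = document.grouped(chunkSize).toList
    val events: Seq[ParseEvent] =
      Await.result(Source(chunks).via(XmlParsing.parser).runWith(Sink.seq), 10.seconds)
    events.collect { case t: TextEvent => t.text }
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("chunked-text-demo")
    try {
      println(textPieces(chunkSize = 8))    // may print the value in several pieces
      println(textPieces(chunkSize = 1024)) // whole document arrives as one chunk
    } finally system.terminate()
  }
}
```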

Cheers,
Enno.

arempter · October 29, 2019, 8:22am

Hi @ennru,

Thanks for the response. Yes, indeed, I think I get two TextEvents…
It also looks like the source is not divided randomly, but rather at a buffer size or something similar.

ennru · October 31, 2019, 9:45am

If you need them as a single event, you could collapse consecutive TextEvents by adding a statefulMapConcat stage.
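A minimal sketch of that suggestion, assuming Akka 2.6 and Alpakka XML: the flow below buffers the text of consecutive TextEvents and emits it as one Characters event just before the next non-text event. Two simplifications to be aware of: a run that contains CData is also re-emitted as plain Characters, and statefulMapConcat offers no callback on completion, so text still buffered when the stream ends without a further event would be dropped (the parser normally finishes with EndDocument, which flushes it). The name coalesceText is made up for this example.

```scala
import akka.NotUsed
import akka.stream.alpakka.xml.{Characters, ParseEvent, TextEvent}
import akka.stream.scaladsl.Flow

object CoalesceText {
  // Hypothetical helper: merges each run of consecutive TextEvents into one Characters event.
  val coalesceText: Flow[ParseEvent, ParseEvent, NotUsed] =
    Flow[ParseEvent].statefulMapConcat { () =>
      // Fresh state per materialization: text seen since the last non-text event.
      val pending = new StringBuilder

      val step: ParseEvent => List[ParseEvent] = {
        case t: TextEvent =>
          pending.append(t.text) // hold back text until the run ends
          Nil
        case other if pending.nonEmpty =>
          val merged = Characters(pending.toString)
          pending.clear()
          List(merged, other) // flush the merged text, then pass the event through
        case other =>
          List(other)
      }
      step
    }
}
```

It would go directly after the parser, e.g. request.entity.dataBytes.via(XmlParsing.parser).via(CoalesceText.coalesceText).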

Enno.

arempter · November 5, 2019, 6:48am

Cool, thanks for the help.
