While using akka-stream-alpakka-xml with Akka HTTP requests, I noticed some interesting behavior when parsing large XML files.
The detailed scenario is as follows:
- the user posts an XML file to an Akka HTTP endpoint
- the XML is taken from the request entity as a ByteString source and parsed by XmlParsing.parser
The XmlParsing logic works with the same XML without Akka HTTP, or when I first materialize the entity with Unmarshal(httpRequest.entity).to[String] and pass the result to XmlParsing.parser as Source.single(ByteString).
When I run source.via(XmlParsing.parser), where the source comes from the Akka HTTP request (request.entity.dataBytes), I almost always see that some keys from the XML are split into two parts:
testuser/one/two/three/four/five/six/seven/eight/nine/ten/eleven/twelve/sub736/KM8DaDXEVNP4MByygsM8d5vK96NStFJC=87i and
zFQw0lL90.txt
I suspect this is related to the fact that parsing is faster than the source?
Could it be that you get several consecutive TextEvents?
That can happen when the source delivers the data in "chopped up" ByteStrings, since the source knows nothing about the XML structure. The parser does not aggregate text internally until a text section ends, so a single text node can arrive as several events.
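One way to cope with this is to merge consecutive text events after parsing. Below is a minimal stdlib-only Scala sketch of that idea. ParseEvent, StartElement, EndElement and Characters are hypothetical stand-ins that only mirror the names of Alpakka's event types, and CoalesceText/coalescer are illustrative names. The returned function has the per-element shape that Akka Streams' statefulMapConcat expects, so with Alpakka's real types it could presumably be plugged in as source.via(XmlParsing.parser).statefulMapConcat(() => coalescer()). Treat that wiring as an untested assumption.

```scala
// Hypothetical stand-ins mirroring the names of Alpakka's XML parse events.
sealed trait ParseEvent
final case class StartElement(localName: String) extends ParseEvent
final case class EndElement(localName: String) extends ParseEvent
final case class Characters(text: String) extends ParseEvent

object CoalesceText {
  /** Returns a stateful function that buffers consecutive Characters events
    * and emits them as one Characters event when the next non-text event
    * arrives. Each call takes one event and returns zero or more events.
    *
    * Assumptions: text in well-formed XML is always followed by a non-text
    * event (e.g. an EndElement), so nothing is left buffered at the end;
    * empty text events are dropped; CDATA is not handled here.
    */
  def coalescer(): ParseEvent => List[ParseEvent] = {
    val buffer = new StringBuilder
    (event: ParseEvent) => event match {
      case Characters(text) =>
        buffer.append(text) // hold the fragment until the text section ends
        Nil
      case other if buffer.nonEmpty =>
        val merged = Characters(buffer.toString)
        buffer.clear()
        List(merged, other) // flush merged text, then the current event
      case other =>
        List(other)
    }
  }

  def main(args: Array[String]): Unit = {
    // Two fragments of the same key, as reported in the question.
    val events: List[ParseEvent] = List(
      StartElement("key"),
      Characters("KM8DaDXEVNP4MByygsM8d5vK96NStFJC=87i"),
      Characters("zFQw0lL90.txt"),
      EndElement("key"))
    val f = coalescer()
    events.flatMap(f).foreach(println)
  }
}
```

Because the buffer lives inside the function returned by coalescer(), each call creates fresh state, which matches how statefulMapConcat asks for a new function per materialization.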
Thanks for the response. Yes, indeed, I think I get two TextEvents…
It also looks like the source is not divided randomly, but rather at something like buffer-size boundaries.
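That observation fits the explanation above: the boundaries presumably come from how the bytes are buffered on the way in, not from the XML. The stdlib-only Scala sketch below illustrates the effect by cutting a payload into fixed-size chunks. The chunk size of 32 is arbitrary and is not Akka HTTP's actual buffer size (the thread does not say what that is), and ChunkBoundaries/chunk are illustrative names. The key is the tail of the one from the question.

```scala
object ChunkBoundaries {
  // Cuts the UTF-8 bytes of a payload into fixed-size chunks, roughly the way
  // a transport hands data over in buffer-sized pieces. The payload used here
  // is ASCII, so decoding each chunk on its own is safe; with multi-byte
  // characters this naive per-chunk decoding could cut a character in half.
  def chunk(payload: String, chunkSize: Int): List[String] =
    payload.getBytes("UTF-8")
      .grouped(chunkSize)
      .map(bytes => new String(bytes, "UTF-8"))
      .toList

  def main(args: Array[String]): Unit = {
    val key = "KM8DaDXEVNP4MByygsM8d5vK96NStFJC=87izFQw0lL90.txt"
    val xml = s"<keys><key>$key</key></keys>"
    val chunks = chunk(xml, 32) // hypothetical buffer size
    chunks.foreach(println)
    // The boundaries sit at fixed byte offsets, so the same input always
    // splits the same way; here the key straddles a boundary and never
    // appears whole in a single chunk.
    println(s"key intact in one chunk: ${chunks.exists(_.contains(key))}")
  }
}
```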