Uri.Query space character encoding

Hi!
I’m not sure if this is a bug, so I’ve decided to post it here first before raising an issue on GitHub.
I often use Uri with the Query class to create a request URL. Code example:

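import akka.http.scaladsl.Http
import akka.http.scaladsl.client.RequestBuilding
import akka.http.scaladsl.model.Uri

// assumes an implicit ActorSystem is in scope
val uri = Uri("http://localhost:9000")
  .withQuery(Uri.Query(
    "param" -> "value with spaces"
  ))

Http().singleRequest(RequestBuilding.Get(uri))
// rest of the code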

The problem is that Akka sends the request with spaces encoded as ‘+’, which the server then treats as a literal ‘+’ sign rather than a space. So the request URL in the above example will be: “http://localhost:9000?param=value+with+spaces”.
Shouldn’t spaces be encoded as ‘%20’ in query parameters?

hi @damian,

When you said:

which is later not recognized by server as space, but as just plain ‘+’ sign.

are you referring to an Akka HTTP server? I’m assuming you are, in which case I think this could be a bug. Several clients (e.g. browsers) may still encode blank spaces in form keys and values using the + sign, in which case the form won’t be read as expected.

I think the implementation should accept both %20 and + as replacements for the blank space in the query string (and I think the fragment too) when reading a String into a URI.
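
To illustrate what accepting both forms means on the reading side, here is a minimal sketch using only the JDK's URLDecoder, which follows form-urlencoded rules. The object and method names are made up for the example; this is not how Akka HTTP parses URIs internally.

```scala
import java.net.URLDecoder

object LenientQueryDecoding {
  // URLDecoder applies application/x-www-form-urlencoded rules:
  // both '+' and "%20" decode to a space, and "%2B" decodes to a literal '+'.
  def decodeComponent(s: String): String =
    URLDecoder.decode(s, "UTF-8")
}
```

With this, "value+with+spaces" and "value%20with%20spaces" both decode to the same string, while an escaped "%2B" still yields a real plus sign.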

So request URL in above example will be: “http://localhost:9000?param=value+with+spaces”.
Shouldn’t spaces be encoded as ‘%20’ in query parameters?

I’m not familiar with the latest specs, but it was also my understanding that replacing blank spaces with %20 was preferred over using +. Again, it may be necessary to support both, but I agree that defaulting to + is surprising.

Cheers,

Hi @ignasi35

are you referring to an AkkaHTTP server?

No, actually I’m referring to another server (proprietary, I don’t know which technology). Akka HTTP handles both ‘+’ and ‘%20’ as a blank space, so there is no issue here.

It would be nice to have some kind of method on Uri that lets me choose how spaces are encoded (‘+’ vs ‘%20’).


There’s no mechanism to tune how a query string is encoded, and blank spaces are always encoded as +.

It’d be great if you raised an issue to discuss next steps or even a PR with an improvement.
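
In the meantime, one possible workaround is to build the raw query string yourself. The sketch below uses only the JDK; QueryEncoding, encodeComponent and rawQuery are hypothetical names for this example, not part of Akka HTTP. It relies on URLEncoder escaping a literal '+' as %2B, so any '+' left in its output can only stand for a space.

```scala
import java.net.URLEncoder

object QueryEncoding {
  // Encode one query component, emitting %20 instead of '+' for spaces.
  // URLEncoder turns a literal '+' into %2B, so replacing the remaining
  // '+' characters only affects encoded spaces.
  def encodeComponent(s: String): String =
    URLEncoder.encode(s, "UTF-8").replace("+", "%20")

  // Join key/value pairs into a raw query string (without the leading '?').
  def rawQuery(params: Seq[(String, String)]): String =
    params
      .map { case (k, v) => s"${encodeComponent(k)}=${encodeComponent(v)}" }
      .mkString("&")
}
```

For example, QueryEncoding.rawQuery(Seq("param" -> "value with spaces")) gives param=value%20with%20spaces. If I remember correctly, Uri has a withRawQueryString method that accepts an already-encoded string, but please check that against your Akka HTTP version before relying on it.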

There are (at least) two layers of specs / encodings in play here: URI and application/x-www-form-urlencoded. The URI spec does not give any format for the query part (but restricts the valid characters). The HTML specification gives more information about how queries are built by the browser to form application/x-www-form-urlencoded POST requests or GET requests with query parameters.

The HTML 4 specification still mandates replacing spaces with + characters, and that is also what browsers still do:

Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as “CR LF” pairs (i.e., %0D%0A).

HTML 5 defers to the URL spec for serializing:

Let name be the result of running percent-encode after encoding with encoding, tuple’s name, the application/x-www-form-urlencoded percent-encode set, and true.

where percent-encode after encoding has this definition:

To percent-encode after encoding, given an encoding encoding, string input, a percentEncodeSet, and a boolean spaceAsPlus, run these steps:

  1. Let output be the empty string.
  2. For each codePoint of input:
  3. If spaceAsPlus is true and codePoint is U+0020, then append U+002B (+) to output.
  4. Otherwise, run percent-encode after encoding with encoding, codePoint, and percentEncodeSet, and append the result to output.
  5. Return output.

Because of the “and true” above, spaceAsPlus is set to true in this algorithm.
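
The steps above can be sketched in Scala as follows. This is a simplified illustration, not Akka HTTP's code: it assumes UTF-8 as the encoding and works byte by byte rather than per code point (equivalent for UTF-8, since the byte 0x20 only ever represents a space). FormUrlEncoded and percentEncode are names invented for the example. The set of unescaped characters follows the URL Standard's note that the application/x-www-form-urlencoded percent-encode set contains everything except ASCII alphanumerics and * - . _

```scala
import java.nio.charset.StandardCharsets

object FormUrlEncoded {
  // Characters NOT in the application/x-www-form-urlencoded percent-encode set.
  private val unescaped: Set[Int] =
    (('a' to 'z') ++ ('A' to 'Z') ++ ('0' to '9') ++ Seq('*', '-', '.', '_'))
      .map(_.toInt)
      .toSet

  def percentEncode(input: String, spaceAsPlus: Boolean): String = {
    val out = new StringBuilder
    input.getBytes(StandardCharsets.UTF_8).foreach { byte =>
      val b = byte & 0xff // treat the byte as unsigned
      if (spaceAsPlus && b == 0x20) out.append('+') // step 3: space becomes '+'
      else if (unescaped.contains(b)) out.append(b.toChar)
      else out.append(f"%%$b%02X") // step 4: percent-encode as %HH
    }
    out.toString
  }
}
```

With spaceAsPlus = true, "value with spaces" becomes value+with+spaces; with false, it becomes value%20with%20spaces. A literal '+' is escaped as %2B in both cases, which is what keeps the two meanings distinguishable for a form-urlencoded parser.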

So, AFAICS, Akka HTTP does everything as specified.
