Unescape Python Strings From HTTP

https://stackoverflow.com/questions/780334

13-09-2019
|

Question

I've got a string from an HTTP header, but it's been escaped.. what function can I use to unescape it?

myemail%40gmail.com -> myemail@gmail.com

Would urllib.unquote() be the way to go?

Solution

I am pretty sure that urllib's unquote is the common way of doing this.

>>> import urllib
>>> urllib.unquote("myemail%40gmail.com")
'myemail@gmail.com'

There's also unquote_plus:

Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values.

OTHER TIPS

Yes, it appears that urllib.unquote() accomplishes that task. (I tested it against your example on codepad.)

In Python 3, these functions are urllib.parse.unquote and urllib.parse.unquote_plus.

The latter is used for example for query strings in the HTTP URLs, where the space characters () are traditionally encoded as plus character (+), and the + is percent-encoded to %2B.

In addition to these there is the unquote_to_bytes that converts the given encoded string to bytes, which can be used when the encoding is not known or the encoded data is binary data. However there is no unquote_plus_to_bytes, if you need it, you can do:

def unquote_plus_to_bytes(s):
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    return unquote_to_bytes(s)

More information on whether to use unquote or unquote_plus is available at URL encoding the space character: + or %20.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow