C# decoding "â„¢" to "TM"

https://stackoverflow.com/questions/18312519

25-06-2022

Question

A web page contains the following string:

"Qualcomm Snapdragon™ S4"

When I read this string in my .NET code, it becomes "Qualcomm Snapdragonâ„¢ S4".

The "™" character changes to "â„¢".

How can I decode "â„¢" back to "™"?

Update

The following code downloads the string through a web proxy (wc is the client object that goes through the proxy):

wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8");
string html = Server.HtmlEncode(wc.DownloadString(url));

Solution

You should read the web page in its proper encoding in the first place. In this case it seems you are reading it with Encoding.Default (probably CP1252) while the page is actually UTF-8. You can determine the page's encoding either from the Content-Type header of the response or by looking for a <meta http-equiv="Content-Type" content='text/html; charset=utf-8'> tag in the content.
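As a minimal sketch of decoding with the charset taken from the Content-Type header: the class name PageDecoder, the method Decode, and the UTF-8 fallback are assumptions made for this illustration and are not part of the original question. It only covers the header case, not the <meta> tag.

```csharp
using System;
using System.Net.Mime;
using System.Text;

static class PageDecoder
{
    // Hypothetical helper: decodes the raw response body using the charset
    // named in the Content-Type header value, e.g. "text/html; charset=utf-8".
    public static string Decode(byte[] body, string contentTypeHeader)
    {
        // Assumption: fall back to UTF-8 when the header names no usable charset.
        Encoding encoding = Encoding.UTF8;

        if (!string.IsNullOrEmpty(contentTypeHeader))
        {
            try
            {
                string charset = new ContentType(contentTypeHeader).CharSet;
                if (!string.IsNullOrEmpty(charset))
                {
                    encoding = Encoding.GetEncoding(charset);
                }
            }
            catch (FormatException)
            {
                // Header could not be parsed; keep the fallback.
            }
            catch (ArgumentException)
            {
                // Charset name is not known to this runtime; keep the fallback.
            }
        }

        return encoding.GetString(body);
    }
}
```

With a WebClient this could be called on the result of wc.DownloadData(url), passing wc.ResponseHeaders["Content-Type"] as the header value. If the page is known to be UTF-8, setting wc.Encoding = Encoding.UTF8 before calling DownloadString is likely the simpler option.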

If you still need to fix the string after the fact, use:

var bytes = Encoding.Default.GetBytes(myString);
var correctString = Encoding.UTF8.GetString(bytes);

In any case, you need to know both encodings exactly: the one the page actually uses and the one that was wrongly used to read it and produced the malformed string. Furthermore, I'd generally advise explicitly against using Encoding.Default because its value isn't fixed. It is just the legacy encoding that a Windows system uses for non-Unicode applications, and it also serves as the default encoding for non-Unicode text files. It should have no place whatsoever in handling external resources.
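A sketch of the same after-the-fact repair with both encodings named explicitly instead of relying on Encoding.Default. It assumes the string was UTF-8 that got decoded as CP1252, which matches the "â„¢" symptom; the class and method names are hypothetical. The RegisterProvider call is meant for .NET Core 3.0 and later, where code page 1252 is not available by default; on the classic .NET Framework, where code page 1252 is built in, that line can be dropped.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    // Hypothetical helper: turns the text back into the bytes it was wrongly
    // decoded from (CP1252), then decodes those bytes as UTF-8.
    public static string FixUtf8ReadAsCp1252(string garbled)
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where legacy code
        // pages must be registered before Encoding.GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp1252 = Encoding.GetEncoding(1252);
        byte[] originalBytes = cp1252.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

For the string from the question, FixUtf8ReadAsCp1252("Qualcomm Snapdragon\u00E2\u201E\u00A2 S4") yields "Qualcomm Snapdragon™ S4", because â, „ and ¢ map to the CP1252 bytes E2 84 A2, which are the UTF-8 encoding of U+2122.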

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow