From C++ wchar_t to C# char via socket
09-09-2019
Question
I am currently building a C++ application that communicates over a socket with a C# application. My C++ app sends wchar_t* strings over the socket.
Here is an overview of what is sent:
<!-- Normal xml file--
Here is what I receive on the other side (I do a stream.read into a byte array and use UTF8Encoding.GetString() to convert the byte array to a readable string):
<\0!\0-\0-\0 \0N\0o\0r\0m\0a\0l\0 \0x\0m\0l\0 \0f\0i\0l\0e\0-\0-
Is it a marshalling problem? What do you think? Why is every character followed by a zero byte, and why don't Unicode characters appear correctly on the C# side?
Solution
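Looks like it's sending UTF-16, not UTF-8, which makes sense: wchar_t is basically a 16-bit type (on Windows), and as far as I can tell you're sending it down "raw". Each ASCII character becomes a 16-bit little-endian code unit, so its low byte is followed by a zero byte, which is exactly the \0 you see after every character; decoding those bytes as UTF-8 also mangles any non-ASCII characters. If you just need the string, decoding with Encoding.Unicode (UTF-16LE) instead of UTF8Encoding should give readable text. But if you're going to convert the data into an XDocument or XmlDocument, I suggest you load it from the binary data: the framework knows how to autodetect UTF-16 for XML files (IIRC).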
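To illustrate what "autodetecting UTF-16" means, here is a minimal sketch in C++. `sniff_xml_encoding` is a hypothetical helper, not what XmlDocument actually runs internally; it only looks at the first bytes of the buffer. It assumes a simplified rule, loosely modeled on the autodetection table in Appendix F of the XML 1.0 specification: a byte order mark wins, otherwise '<' followed by a zero byte means UTF-16LE. UTF-32 and EBCDIC are ignored.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a library API): guess how an XML byte stream is
// encoded from its first bytes. Simplified heuristic loosely based on
// XML 1.0 Appendix F; UTF-32 and EBCDIC are not handled.
std::string sniff_xml_encoding(const std::vector<std::uint8_t>& b) {
    if (b.size() >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return "UTF-8";     // UTF-8 byte order mark
    if (b.size() >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return "UTF-16LE";  // little-endian byte order mark
    if (b.size() >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return "UTF-16BE";  // big-endian byte order mark
    if (b.size() >= 2 && b[0] == 0x3C && b[1] == 0x00)
        return "UTF-16LE";  // '<' followed by a zero byte, no BOM
    if (b.size() >= 2 && b[0] == 0x00 && b[1] == 0x3C)
        return "UTF-16BE";  // zero byte followed by '<', no BOM
    return "UTF-8";         // XML's default when nothing else matches
}
```

Fed the bytes from the question (3C 00 21 00 2D 00 2D 00, i.e. "<!--" as a raw 16-bit wchar_t buffer), it reports UTF-16LE; fed the plain ASCII bytes of "<!--", it falls back to UTF-8.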
You'll potentially have problems, though, if the XML declaration says the document is UTF-8 when it's really UTF-16.
Alternatively, use suitable encoding classes on the C++ side to genuinely send UTF-8. This would take extra processing time, but would usually save bandwidth (for mostly ASCII text such as XML markup), if that's a consideration.
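As a rough sketch of the "send UTF-8" option, the C++ below converts UTF-16 text to UTF-8 by hand, combining surrogate pairs into a single code point. `utf16_to_utf8` is a hypothetical name, and the code assumes the Windows case where wchar_t is 16 bits, so a wchar_t buffer can be reinterpreted as char16_t. On Windows the Win32 function WideCharToMultiByte with CP_UTF8 does the same job and is usually the better choice; this version just makes the byte layout explicit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a library API): convert UTF-16 text, such as a
// 16-bit Windows wchar_t buffer, into UTF-8 bytes for the socket, so the C#
// side can decode them with UTF8Encoding. Lone surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine them.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: use the replacement character
        }
        if (cp < 0x80) {  // 1 byte: ASCII passes through unchanged
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {  // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {  // 4 bytes: code points above the Basic Multilingual Plane
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this in place, the markup in the question comes out byte-for-byte as ASCII with no interleaved zeros, which is what UTF8Encoding.GetString() on the C# side expects.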