Question

I am using Xerces and am attempting to read an XML document containing binary data into a DOM:

<field1>
<data>
[binary data (multiline) here]
</data>
</field1>

I'm then retrieving the content of each <data> node as a string for pre-processing. The code for reading is as follows:

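DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(xc_DOMImplementation_Name);
DOMLSSerializer *serializer = ((DOMImplementationLS*)impl)->createLSSerializer();
XMLCh *xml = serializer->writeToString(node);  // caller owns the returned buffer
std::wstring ws(xml);  // compiles only where XMLCh is wchar_t (e.g. MSVC builds)
XMLString::release(&xml);
serializer->release();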

This returns <data></data> without any content. I've also tried using a CDATA block but that didn't help. Swapping the binary data out for multi-line ASCII seems to work fine. I would expect the string to truncate as soon as the first binary character is encountered (probably causing the empty tags to be returned), but surprisingly removing null characters also didn't work and <data></data> was still returned.

How can I do this in Xerces? I want to avoid pre-processing the entire document by reading into an unsigned char* and performing the manipulation there.

Thanks.


Solution

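You're better off base64-encoding the binary data before it goes into the XML. Apache commons-codec provides a Base64 codec for Java; since you're working in C++, note that Xerces-C++ also ships its own Base64 utility class (xercesc/util/Base64.hpp).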

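XML is made for text. XML 1.0 does not allow most control characters (NUL among them) anywhere in a document, not even inside a CDATA section, which is why the CDATA block didn't help. So you need to transform the binary data into text first; base64 serves exactly this purpose.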
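
To illustrate the idea, here is a minimal, dependency-free sketch of base64 encoding in C++ (standard RFC 4648 alphabet with '=' padding). The names base64_encode and make_data_element are hypothetical helpers invented for this example and are not part of Xerces; in a real project you would more likely use an existing codec and build the element through the DOM API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode arbitrary bytes as base64 text
// (RFC 4648 alphabet, '=' padding, no line breaks).
std::string base64_encode(const std::vector<std::uint8_t>& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((bytes.size() + 2) / 3) * 4);

    std::size_t i = 0;
    // Each full group of 3 input bytes becomes 4 output characters.
    for (; i + 2 < bytes.size(); i += 3) {
        std::uint32_t n = (static_cast<std::uint32_t>(bytes[i]) << 16) |
                          (static_cast<std::uint32_t>(bytes[i + 1]) << 8) |
                          static_cast<std::uint32_t>(bytes[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }

    // One or two leftover bytes are padded with '='.
    std::size_t rem = bytes.size() - i;
    if (rem == 1) {
        std::uint32_t n = static_cast<std::uint32_t>(bytes[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        std::uint32_t n = (static_cast<std::uint32_t>(bytes[i]) << 16) |
                          (static_cast<std::uint32_t>(bytes[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Hypothetical helper: wrap the encoded payload in a <data> element.
// Base64 output contains only A-Z, a-z, 0-9, '+', '/' and '=', so it
// needs no XML escaping and no CDATA section.
std::string make_data_element(const std::vector<std::uint8_t>& bytes) {
    return "<data>" + base64_encode(bytes) + "</data>";
}
```

With this, the bytes 0x00 0xFF 0x10 are stored as <data>AP8Q</data>, which any XML parser can read as ordinary text. The consumer then base64-decodes the element content to recover the original bytes.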

Licensed under: CC-BY-SA with attribution