Question

I'm writing a small application in Java that uses XOM to output XHTML.

The problem is that XOM places the following tag before all the html:

<?xml version="1.0" encoding="UTF-8"?>

I've read their documentation, but I can't seem to find how to remove this tag. Thanks guys.

Edit: I'm outputting to a file using XOM's Serializer class

Follow up: If it is good practice to use the XML tag before the DOCTYPE, why don't any websites use it? Also, why does the W3C validator give me an error when it sees the XML tag? Here is the error:

Illegal processing instruction target (found xml)

Finally, if I were to put the XML tag before my DOCTYPE, does this mean I don't have to specify <meta charset="UTF-8" /> in my html header?


Solution

That line is the XML declaration rather than a tag. It is valid in both XML and XHTML, and including it is good practice. There should be no reason to remove it.

Just leave it there ... or fix whatever it is that is expecting it not to be there.


If you don't believe me, take a look at this excerpt from the XHTML 1.1 spec.

"Example of an XHTML 1.1 document

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
     "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
 <html version="-//W3C//DTD XHTML 1.1//EN"
       xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/1999/xhtml
                      http://www.w3.org/MarkUp/SCHEMA/xhtml11.xsd"
 >
   <head>
     <title>Virtual Library</title>
   </head>
   <body>
     <p>Moved to <a href="http://example.org/">example.org</a>.</p>
   </body>
 </html>

Note that in this example, the XML declaration is included. An XML declaration like the one above is not required in all XML documents. XHTML document authors SHOULD use XML declarations in all their documents. XHTML document authors MUST use an XML declaration when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding is specified by a higher-level protocol."


By the way, the W3C validation service says that is OK ... but if there is any whitespace before the <?xml ...?> tag it complains.

OTHER TIPS

Does this work? The Javadoc for XOM's Serializer class lists this method:

protected void writeXMLDeclaration() throws IOException

You could override it so that it does nothing.

Agreed, though, that you should normally output the prolog.
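If the declaration really does have to go, the override suggested above could be sketched roughly as follows. The class name NoDeclarationSerializer and the render() helper are made up for illustration; only Serializer, writeXMLDeclaration(), write() and flush() come from XOM, and the sketch assumes the XOM jar is on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;

import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Serializer;

// Hypothetical subclass: a Serializer that skips the XML declaration.
public class NoDeclarationSerializer extends Serializer {

    public NoDeclarationSerializer(OutputStream out, String encoding)
            throws UnsupportedEncodingException {
        super(out, encoding);
    }

    @Override
    protected void writeXMLDeclaration() throws IOException {
        // Intentionally empty, so no <?xml ...?> line is written.
    }

    // Illustrative helper: serialize a document to a String without the declaration.
    public static String render(Document doc) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        Serializer serializer = new NoDeclarationSerializer(buffer, "UTF-8");
        serializer.write(doc);
        serializer.flush();
        return buffer.toString("UTF-8");
    }

    public static void main(String[] args) throws IOException {
        Element html = new Element("html", "http://www.w3.org/1999/xhtml");
        System.out.println(render(new Document(html)));
    }
}
```

To write to a file instead, pass a FileOutputStream to the constructor in place of the in-memory buffer.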

Assuming you wish to serve your XHTML as text/html content type, you are right to want to remove the XML declaration, because if you don't, it will throw IE6 into quirks mode.

Overriding writeXMLDeclaration() as suggested by MJB looks like a good way to do it.

But you should be aware that you may well hit other problems using an XML serializer and serving the output as text/html.

The most likely one is that the serializer will produce a tag like this: <script src="myscript.js" />. Browsers (except Safari) won't treat that as a self-closing script tag, but as a script start tag, and everything that follows will be treated as part of the script and not rendered by the browser.

You will probably need to override your serializer to make it HTML aware to resolve this. I suggest overriding the writeEmptyElementTag() function, and for all elements with names not in the list "area", "base", "basefont", "bgsound", "br", "col", "command", "embed", "frame", "hr", "isindex", "image", "img", "input", "keygen", "link", "meta", "param", "source", "spacer" and "wbr", call writeStartTag() and then writeEndTag() instead of the default behaviour.
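The name check that such an override needs can be kept separate from XOM. Below is a minimal sketch of it, using the element list from this answer; the class name HtmlVoidElements and the method isVoid() are made up for illustration. Inside an overridden writeEmptyElementTag(Element element), one would presumably test isVoid(element.getLocalName()) and, when it returns false, call writeStartTag(element) followed by writeEndTag(element) instead of the default behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: decides which elements may keep the <name /> form in text/html.
public class HtmlVoidElements {

    // Element names taken from the list in the answer above.
    private static final Set<String> VOID_ELEMENTS = new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "bgsound", "br", "col", "command",
            "embed", "frame", "hr", "isindex", "image", "img", "input",
            "keygen", "link", "meta", "param", "source", "spacer", "wbr"));

    // XHTML element names are lowercase and case-sensitive, so no case folding is done.
    public static boolean isVoid(String localName) {
        return VOID_ELEMENTS.contains(localName);
    }

    public static void main(String[] args) {
        System.out.println(isVoid("br"));     // prints "true"
        System.out.println(isVoid("script")); // prints "false"
    }
}
```

With this check in place, an empty script element would be written as <script src="myscript.js"></script>, while br, img and the other listed elements keep the short form.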

Finally, if I were to put the XML tag before my DOCTYPE, does this mean I don't have to specify <meta charset="UTF-8" /> in my html header?

No, it doesn't. When served as text/html, the XML declaration is simply ignored by browsers, so you will still need to provide the character encoding by some other means, either the meta tag or the HTTP headers.

Licensed under: CC-BY-SA with attribution