Question

I'm trying to canonicalize the representation of some XML data by sorting each element's attributes by name (not value). The idea is to keep textual differences minimal when attributes are added or removed, and to prevent different editors from introducing equivalent variants. These XML files are under source control, and developers want to diff the changes without resorting to specialized XML tools.

I was surprised not to find an XSL example of how to do this. Basically, I want just the identity transform with sorted attributes. I came up with the following, which seems to work in all my test cases:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:template match="*|/|text()|comment()|processing-instruction()">
    <xsl:copy>
      <xsl:for-each select="@*">
        <xsl:sort select="name(.)"/>
        <xsl:copy/>
      </xsl:for-each>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

As a total XSL n00b, I would appreciate any comments on style or efficiency. I thought it might be helpful to post it here, since it seems to be an uncommon example.
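
For developers who want to check results without an XSLT processor, the same "identity copy with sorted attributes" idea can be sketched with Python's standard library. This is an illustration under assumptions, not the asker's method: it assumes Python 3.8+ (where ElementTree preserves attribute insertion order on output), and the function name sort_attributes is hypothetical. Unlike the stylesheet, ElementTree drops the XML declaration and any comments outside the root element, and rewrites namespace prefixes (ns0, ns1, ...). It also sorts namespaced attributes by their "{uri}local" key in code-point order, which need not match xsl:sort on name().

```python
import xml.etree.ElementTree as ET


def sort_attributes(xml_text: str) -> str:
    """Re-serialize xml_text with every element's attributes sorted by name.

    Assumes Python 3.8+, where tostring() writes attributes in insertion
    order (older versions always sorted them during serialization).
    """
    # Keep comments and processing instructions inside the root element
    # instead of silently discarding them.
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    root = ET.fromstring(xml_text, parser=ET.XMLParser(target=builder))
    for element in root.iter():
        # Rebuild the attribute dict in name order; serialization follows it.
        ordered = sorted(element.attrib.items())
        element.attrib.clear()
        element.attrib.update(ordered)
    return ET.tostring(root, encoding="unicode")


print(sort_attributes('<a z="1" b="2"/>'))  # <a b="2" z="1" />
```

Two files that differ only in attribute order then serialize identically, which is the property a line-based diff needs.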


Solution

Because XSLT is a functional language, xsl:for-each is often the easiest path for us humans, but it may not be the most efficient for XSLT processors, since they may not be able to optimize it as fully. Here is the same transform using xsl:apply-templates with a sort instead:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
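  <!-- Elements: copy the element, emit its attributes sorted by name, then process its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*">
        <xsl:sort select="name()"/>
      </xsl:apply-templates>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Attributes, comments and PIs are copied as-is. The root node and text nodes
       need no template: the built-in rules already recurse from / and copy text. -->
  <xsl:template match="@*|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>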
</xsl:stylesheet>

The difference is fairly trivial in this case, though, and as an "XSL n00b" I think you solved the problem very well indeed.

OTHER TIPS

Well done for solving the problem. As I assume you know, the order of attributes is unimportant to XML parsers, so the primary benefit of this exercise is for humans; a machine may reorder them on input or output in unpredictable ways.
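
A quick way to see this with Python's standard library (a sketch assuming Python 3.8+; the sample documents are made up): the parsed attributes compare equal regardless of source order, while the re-serialized text still differs, which is exactly the noise a line-based diff would flag.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<item id="7" lang="en"/>')
second = ET.fromstring('<item lang="en" id="7"/>')

# Attributes come back as a dict, and dict equality ignores insertion order.
print(first.attrib == second.attrib)              # True
# The serialized bytes keep the source order (Python 3.8+), so they differ.
print(ET.tostring(first) == ET.tostring(second))  # False
```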

Canonicalization in XML is not trivial and you would be well advised to use the canonicalizer provided with any reasonable XML toolkit rather than writing your own.
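
For example, Python 3.8+ ships xml.etree.ElementTree.canonicalize(), an implementation of Canonical XML 2.0 (C14N 2.0). The minimal sketch below uses made-up sample documents; note that canonical output drops comments unless with_comments=True is passed, and it does not pretty-print, so it settles attribute order and syntax variants rather than layout.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same data: different attribute order,
# quote style, and empty-element syntax.
variant_a = "<config><server port='8080' host=\"example\"/></config>"
variant_b = '<config><server host="example" port="8080"></server></config>'

# canonicalize() returns a string when no 'out' stream is given. Among other
# things it sorts attributes, uses double quotes, and writes empty elements
# as a start/end tag pair.
canon_a = ET.canonicalize(variant_a)
canon_b = ET.canonicalize(variant_b)

print(canon_a == canon_b)  # True
print(canon_a)  # <config><server host="example" port="8080"></server></config>
```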

Licensed under: CC-BY-SA with attribution