Question

I have been trying to solve this for hours with no luck. The XML looks like this:

    <description>
     Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
    sed diam nonumy eirmod tempor labore et dolore magna aliquyam erat

     &lt;p&gt;&lt;b&gt;Section B: China&lt;/b&gt;&lt;/p&gt;

     &lt;p&gt;Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
     sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam
     eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
     eirmod tempor invidunt ut labore et dolore magna aliquyam erat&lt;/p&gt;

      &lt;p&gt;&lt;b&gt;Section C: Himalayan Studies&lt;/b&gt;&lt;/p&gt;

     &lt;p&gt;Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
     sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam
     eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
     nonumy eirmod tempor invidunt ut labore a aliquyam erat&lt;/p&gt;

     </description>

I want the output to be clean, without the encoded <p> or <b> tags, but with a line break inserted before each section by replacing &lt;p&gt;&lt;b&gt; with <br/>. The output should look like this:

<description>
       Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
       sed diam nonumy eirmod tempor labore et dolore magna aliquyam erat

       <br/>Section B: Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
       sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam
       eratLorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam no
       eirmod tempor invidunt ut labore et dolore magna aliquyam erat

        <br/>Section C: Himalayan Studies Lorem ipsum dolor sit amet, consetetur 

       sadipscing sed diam nonumy eirmod tempor invidunt ut labore et dolore m   
       ipsum dolor sit amet, consetetur sadipscing elitr, sed diam
       nonumy eirmod tempor invidunt ut labore a aliquyam erat

         </description>

I have tried the replace() function but was not able to add line breaks. I also tried translate(), with no luck:

<xsl:value-of select="translate(.,
            translate(.,
            'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ',
            ''),
            '')"/>

Any help on how to approach this problem will be appreciated.


Solution

An XSLT 3.0 solution that uses the parse-xml() function:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <!--standard identity template-->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description">
        <xsl:copy>
            <!--Concatenate encoded <p> element to ensure that it is well-formed 
                XML with a document element when parsed.
                Use parse-xml() to parse the encoded markup as a parsed document.
                Apply-templates to the parsed document--> 
            <xsl:apply-templates select="parse-xml(concat('&lt;p&gt;', ., '&lt;/p&gt;'))"/>
        </xsl:copy>
    </xsl:template>

    <!-- remove <p> and <b> elements -->
    <xsl:template match="p | b">
        <xsl:apply-templates/>
    </xsl:template>

    <!--for every <p> element that has a <b> element, generate a <br/> -->
    <xsl:template match="p[b]">
        <br/>
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
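
As a possible variation on the stylesheet above, XPath 3.0 also defines parse-xml-fragment(), which accepts text mixed with several top-level elements, so the wrapper <p> does not need to be concatenated. This is only a sketch: it assumes the processor supports XSLT 3.0 and that the encoded markup inside description is well-formed (an unclosed tag would make the parse fail, just as with parse-xml()).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

    <!-- identity template: copy everything not handled below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description">
        <xsl:copy>
            <!-- parse the encoded markup as a fragment; the result is a
                 document node whose children are text nodes and <p> elements -->
            <xsl:apply-templates select="parse-xml-fragment(.)"/>
        </xsl:copy>
    </xsl:template>

    <!-- drop the <p> and <b> wrappers but keep their content -->
    <xsl:template match="p | b">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- a <p> that contains a <b> starts a section: emit <br/> first -->
    <xsl:template match="p[b]">
        <br/>
        <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>
```

The template for p[b] has a higher default priority than the one for p | b, so section headings get the <br/> while plain paragraphs are simply unwrapped.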

Other tips

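An inelegant (but working) XSLT 2.0 solution. It relies on disable-output-escaping, which takes effect only when the processor serializes the output itself, and it assumes the character ¶ never occurs in the text:

<xsl:value-of select="replace(replace(replace(., 
                     '&lt;p&gt;&lt;b&gt;', '¶'), 
                     '&lt;[^&gt;]*&gt;', ''), 
                     '¶', '&lt;br/&gt;')" 
              disable-output-escaping="yes"/>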
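
The nested replace() calls may be easier to follow when written as three named steps inside a complete stylesheet. The following is a minimal sketch, not a definitive version: the variable names marked and stripped are made up for illustration, the tag-stripping pattern is written as &lt;[^&gt;]*&gt; so that a single match cannot run past the first closing angle bracket, and the placeholder ¶ is assumed never to occur in the real text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <!-- identity template: copy everything not handled below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description">
        <xsl:copy>
            <!-- step 1: mark each section start with a placeholder character -->
            <xsl:variable name="marked"
                          select="replace(., '&lt;p&gt;&lt;b&gt;', '¶')"/>
            <!-- step 2: strip every remaining encoded tag -->
            <xsl:variable name="stripped"
                          select="replace($marked, '&lt;[^&gt;]*&gt;', '')"/>
            <!-- step 3: turn each placeholder into literal <br/> markup;
                 this only works if the processor honors disable-output-escaping -->
            <xsl:value-of select="replace($stripped, '¶', '&lt;br/&gt;')"
                          disable-output-escaping="yes"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

The placeholder is needed because step 2 would otherwise delete the section markers along with every other tag.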

An alternative (which is way uglier) that uses nested xsl:analyze-string instructions:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="description/text()">
        <description>
            <xsl:analyze-string select="." regex="&lt;p&gt;&lt;b&gt;">
                <xsl:matching-substring>
                    <br/>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <xsl:analyze-string select="." regex="&lt;/b&gt;&lt;/p&gt;">
                        <xsl:matching-substring/>
                        <xsl:non-matching-substring>
                            <xsl:analyze-string select="." regex="&lt;p&gt;">
                                <xsl:matching-substring/>
                                <xsl:non-matching-substring>
                                    <xsl:analyze-string select="." regex="&lt;/p&gt;">
                                        <xsl:matching-substring/>
                                        <xsl:non-matching-substring>
                                            <xsl:value-of select="."/>
                                        </xsl:non-matching-substring>
                                    </xsl:analyze-string>
                                </xsl:non-matching-substring>
                            </xsl:analyze-string>
                        </xsl:non-matching-substring>
                    </xsl:analyze-string>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </description>
    </xsl:template>
</xsl:stylesheet>
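
The nesting above can probably be collapsed into a single xsl:analyze-string by using regex alternation. This is a sketch under the assumption that the only encoded tags in the text are <p> and <b>, as in the sample. When two alternatives match at the same position, the first one is chosen, so &lt;p&gt;&lt;b&gt; is listed before the pattern for a single tag.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="description">
        <xsl:copy>
            <!-- first alternative: a section start; second: any single
                 opening or closing p or b tag -->
            <xsl:analyze-string select="."
                                regex="&lt;p&gt;&lt;b&gt;|&lt;/?[pb]&gt;">
                <xsl:matching-substring>
                    <!-- only a section start produces output; other tags are dropped -->
                    <xsl:if test=". = '&lt;p&gt;&lt;b&gt;'">
                        <br/>
                    </xsl:if>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <!-- plain text between tags is copied unchanged -->
                    <xsl:value-of select="."/>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Unlike the replace() approach, this creates a real <br/> element in the result tree, so it does not depend on disable-output-escaping.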

An XSLT 2.0 solution that uses the tokenize() function to split the encoded HTML where &lt;p&gt;&lt;b&gt; occurs. For each of the tokenized items, it creates the <br/> element (if it's not the first item in the sequence) and removes any of the remaining encoded markup from that item with the replace() function.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="description">
        <xsl:copy>
            <xsl:for-each select="tokenize(., '&lt;p&gt;&lt;b&gt;')">
                <xsl:if test="position()>1">
                    <br/>
                </xsl:if>
                <xsl:sequence select="replace(., '&lt;.*?&gt;', '')"/>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
License: CC-BY-SA with attribution
Not affiliated with StackOverflow