Group XML elements by key using XSLT v1.0
XSLT v1.0 does not include any explicit support for grouping. However, the same effect can be achieved through creative use of the key and generate-id functions, a technique commonly known as Muenchian grouping:
- Create an index such that all elements with a given key can be retrieved quickly and efficiently.
- Iterate over all keys in the index.
- Use the index to retrieve the set of elements corresponding to each key.
The difficult part of the process is the second step, because XSLT v1.0 provides no way to extract a list of keys from the index. What you can do is construct a node set containing the same elements as those present in the index, then use the index to eliminate duplicates from that node set.
Duplicates are eliminated by designating one instance of each key as the canonical instance. The one chosen here is the first member of the node set that is returned when the key is looked up in the index.
Create an index
The index is created by adding an xsl:key element to the top level of the stylesheet:
<xsl:key name="paths" match="path" use="text()"/>
The three attributes of this element specify that:
- The name of the index is ‘paths’.
- Elements are added to the index if they are path elements.
- The elements are indexed according to the text that they contain.
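To make the index concrete, here is a minimal sketch of a stylesheet that declares the key and then looks up a single value with the key function. The value '/trunk/a.txt' is a hypothetical example that does not come from any real input, and the text output method is chosen only to keep the result simple.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <!-- xsl:key must be a direct child of xsl:stylesheet -->
  <xsl:key name="paths" match="path" use="text()"/>
  <xsl:template match="/">
    <!-- Count the path elements whose text is the (hypothetical) value '/trunk/a.txt' -->
    <xsl:value-of select="count(key('paths', '/trunk/a.txt'))"/>
  </xsl:template>
</xsl:stylesheet>
```

The first argument to key is the name given in xsl:key, and the second is the value to look up. The result is the node set of all matching path elements, in document order.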
Iterate over all keys in the index
This is done using an xsl:for-each element acting upon a very particular XPath expression:
<xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">
The expression selects all path elements in the document, then applies a predicate to each one to decide whether it is a canonical instance or a duplicate:
- Extract the text from within the current path element.
- Use the index to identify all path elements with the same text content.
- Select the first of those path elements from the index (the canonical instance).
- Generate a string to uniquely identify that canonical instance.
- Generate a string to uniquely identify the current path element.
- Compare the two strings. If they are the same then the current path element is the canonical instance, otherwise it is a duplicate.
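The steps above can be tried in isolation with a small stylesheet that does nothing except list each distinct path once. This is only a sketch: the text output method and the newline separator are choices made for illustration, not part of the technique itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>
  <xsl:key name="paths" match="path" use="text()"/>
  <xsl:template match="/">
    <!-- Keep a path element only if it is the first one the index returns for its own text -->
    <xsl:for-each select="//path[generate-id() = generate-id(key('paths', text())[1])]">
      <xsl:value-of select="text()"/>
      <!-- Newline after each distinct value -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

An alternative predicate that is often used for the same test is count(. | key('paths', text())[1]) = 1. It works because the union of the current element and the canonical instance contains a single node only when the two are the same node.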
Retrieve the set of elements corresponding to each key
Given the value of a key, the set of elements corresponding to it can be retrieved by calling the key function with the name of the index and that value, as in key('paths',text()). The resulting node set can then be processed in whatever manner is needed.
Here is the complete stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Index every path element by its text content -->
  <xsl:key name="paths" match="path" use="text()"/>
  <xsl:template match="log">
    <log>
      <!-- Visit only the canonical (first) path element for each distinct text value -->
      <xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">
        <pathentry>
          <path><xsl:value-of select="text()"/></path>
          <!-- Emit one logentry for every path element that shares this text -->
          <xsl:for-each select="key('paths',text())">
            <logentry>
              <xsl:attribute name="revision"><xsl:value-of select="ancestor::logentry/@revision"/></xsl:attribute>
              <xsl:attribute name="action"><xsl:value-of select="@action"/></xsl:attribute>
              <xsl:attribute name="date"><xsl:value-of select="ancestor::logentry/date/text()"/></xsl:attribute>
            </logentry>
          </xsl:for-each>
        </pathentry>
      </xsl:for-each>
    </log>
  </xsl:template>
</xsl:stylesheet>
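To see what the stylesheet does, it helps to run it against a small input. The document below is a hypothetical example whose structure is inferred from the element and attribute names that the stylesheet refers to (log, logentry, revision, date, path, action). The paths wrapper element, the file names, and the dates are assumptions made purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<log>
  <logentry revision="1">
    <date>2009-01-01</date>
    <paths>
      <path action="A">/trunk/a.txt</path>
      <path action="A">/trunk/b.txt</path>
    </paths>
  </logentry>
  <logentry revision="2">
    <date>2009-01-02</date>
    <paths>
      <path action="M">/trunk/a.txt</path>
    </paths>
  </logentry>
</log>
```

Given that input, the result should be equivalent to the following, apart from the XML declaration and whitespace (indentation has been added here for readability). The path /trunk/a.txt appears once, with one logentry for each revision that touched it:

```xml
<log>
  <pathentry>
    <path>/trunk/a.txt</path>
    <logentry revision="1" action="A" date="2009-01-01"/>
    <logentry revision="2" action="M" date="2009-01-02"/>
  </pathentry>
  <pathentry>
    <path>/trunk/b.txt</path>
    <logentry revision="1" action="A" date="2009-01-01"/>
  </pathentry>
</log>
```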