
Group XML elements by key using XSLT


codeling 1599 - 6654
@2016-12-19 23:06:55

The goal is to organise a set of XML elements into groups, so that elements with the same key are placed in the same group and elements with different keys are placed in different groups.

Suppose you have a Subversion log file in which the entries are grouped by revision number. Each revision affects one or more paths, and each path is affected by one or more revisions. Here is a simplified illustration of the structure:

<?xml version="1.0"?>
<log>
 <logentry revision="3">
  <date>2010-12-20T13:15:00Z</date>
  <paths>
   <path action="M">/trunk/hello.c</path>
  </paths>
 </logentry>
 <logentry revision="2">
  <date>2010-12-20T12:00:00Z</date>
  <paths>
   <path action="A">/trunk/Makefile</path>
   <path action="A">/trunk/hello.c</path>
  </paths>
 </logentry>
</log>

This particular example shows that the file /trunk/hello.c was added in revision 2 then modified in revision 3. Revision 2 also saw the addition of /trunk/Makefile, but this file was not affected by revision 3. The revisions are listed in descending numerical order (equivalent to reverse chronological order).

Suppose you wish to reorganise the log so that entries are grouped by path as opposed to revision number:

<?xml version="1.0"?>
<log>
 <pathentry>
  <path>/trunk/hello.c</path>
  <logentry revision="3" action="M" date="2010-12-20T13:15:00Z"/>
  <logentry revision="2" action="A" date="2010-12-20T12:00:00Z"/>
 </pathentry>
 <pathentry>
  <path>/trunk/Makefile</path>
  <logentry revision="2" action="A" date="2010-12-20T12:00:00Z"/>
 </pathentry>
</log>

This arrangement could provide a starting point for answering questions such as:

  • when was each path originally added to the repository, or
  • when was each path most recently modified, or
  • how many times has each path been revised.
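
As a rough sketch of how the regrouped log could answer these questions, the stylesheet below takes the path-grouped document shown above as its input. The output element names summary and file are hypothetical, chosen only for illustration. It also assumes that entries stay in descending revision order (so the first logentry is the most recent) and that an addition is always recorded with action="A".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" indent="yes"/>

 <!-- Input: the regrouped log, i.e. a log element containing pathentry elements. -->
 <xsl:template match="/log">
  <!-- 'summary' and 'file' are hypothetical names, not part of the original format. -->
  <summary>
   <xsl:for-each select="pathentry">
    <!-- added:        date of the first entry (in document order) with action A
         last-changed: date of the first entry, assuming descending revision order
         revisions:    number of entries recorded for this path -->
    <file path="{path}"
          added="{logentry[@action='A']/@date}"
          last-changed="{logentry[1]/@date}"
          revisions="{count(logentry)}"/>
   </xsl:for-each>
  </summary>
 </xsl:template>
</xsl:stylesheet>
```

For the sample data, /trunk/hello.c would be reported with two revisions, added at 2010-12-20T12:00:00Z and last changed at 2010-12-20T13:15:00Z. If a path were deleted and later re-added, the added attribute would reflect only the most recent addition.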
@2016-12-19 23:08:26

Group XML elements by key using XSLT v1.0

XSLT v1.0 does not include any explicit support for grouping. However, it is possible to achieve the same effect through creative use of the key and generate-id functions, a technique known as the Muenchian method:

  1. Create an index such that all elements with a given key can be retrieved quickly and efficiently.
  2. Iterate over all keys in the index.
  3. Use the index to retrieve the set of elements corresponding to each key.

The difficult part of the process is step 2 because it is not possible to extract a list of keys from the index. What you can do is construct a node set containing the same elements as those present in the index, then use the index to eliminate duplicates from that node set.

Duplicates are eliminated by designating one instance of each key as the canonical instance. The one chosen here is the first member of the node set that is returned when the key is looked up in the index.

Create an index

The index is created by adding an xsl:key element to the top level of the stylesheet:

<xsl:key name="paths" match="path" use="text()"/>

The three attributes of this element specify that:

  • The name of the index is ‘paths’.
  • Elements are added to the index if they are path elements.
  • The elements are indexed according to the text that they contain.

Iterate over all keys in the index

This is done using an xsl:for-each element acting upon a very particular XPath expression:

<xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">

The expression selects all path elements in the document, then tests whether each one is a canonical instance or a duplicate:

  1. Extract the text from within the current path element.
  2. Use the index to identify all path elements with the same text content.
  3. Select the first of those path elements from the index (the canonical instance).
  4. Generate a string to uniquely identify that canonical instance.
  5. Generate a string to uniquely identify the current path element.
  6. Compare the two strings. If they are the same then the current path element is the canonical instance, otherwise it is a duplicate.
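
The same test is often written without generate-id, by counting the members of a union instead. The sketch below is an alternative formulation, not the one used in the rest of this article, and the output element name distinct-paths is hypothetical. It relies on the fact that a node set cannot contain the same node twice, so the union of the current element and the canonical instance has exactly one member only when they are the same node.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" indent="yes"/>
 <xsl:key name="paths" match="path" use="text()"/>

 <xsl:template match="/log">
  <!-- 'distinct-paths' is a hypothetical wrapper element for this illustration. -->
  <distinct-paths>
   <!-- key(...)[1] is the canonical instance for the current path's text.
        The union with '.' has one member only if '.' is that same node. -->
   <xsl:for-each select="//path[count(. | key('paths', text())[1]) = 1]">
    <path><xsl:value-of select="text()"/></path>
   </xsl:for-each>
  </distinct-paths>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the sample log, this lists /trunk/hello.c followed by /trunk/Makefile. The two formulations differ for a path element with no text content: it never appears in the index, so the union test would still select it whereas the generate-id comparison would not.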

Retrieve the set of elements corresponding to each key

Given the value of a key, the index can be trivially used to retrieve the set of elements corresponding to that key. This can then be processed in whatever manner is needed.
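
As a minimal illustration of such a lookup, the sketch below retrieves every path element indexed under one literal key and prints the revision each belongs to. The output element names revisions-of and revision are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" indent="yes"/>
 <xsl:key name="paths" match="path" use="text()"/>

 <xsl:template match="/">
  <!-- 'revisions-of' and 'revision' are hypothetical output names. -->
  <revisions-of path="/trunk/hello.c">
   <!-- key() returns all path elements whose text equals the second argument,
        in document order. -->
   <xsl:for-each select="key('paths', '/trunk/hello.c')">
    <revision><xsl:value-of select="ancestor::logentry/@revision"/></revision>
   </xsl:for-each>
  </revisions-of>
 </xsl:template>
</xsl:stylesheet>
```

For the sample log this yields revision 3 followed by revision 2, matching the order in which the entries appear in the document.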

Here is the complete stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:key name="paths" match="path" use="text()"/>

<xsl:template match="log">
<log>
  <xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">
   <pathentry>
    <path><xsl:value-of select="text()"/></path>
    <xsl:for-each select="key('paths',text())">
     <logentry>
      <xsl:attribute name="revision"><xsl:value-of select="ancestor::logentry/@revision"/></xsl:attribute>
      <xsl:attribute name="action"><xsl:value-of select="@action"/></xsl:attribute>
      <xsl:attribute name="date"><xsl:value-of select="ancestor::logentry/date/text()"/></xsl:attribute>
     </logentry>
    </xsl:for-each>
   </pathentry>
  </xsl:for-each>
</log>
</xsl:template>
</xsl:stylesheet>
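
The xsl:attribute elements can also be written more compactly with attribute value templates, where an XPath expression in curly braces is evaluated inside a literal attribute. The following is a sketch of the same transformation in that style. It is intended to produce the same result, and the xsl:output instruction is an addition made here for readable indentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" indent="yes"/>
 <xsl:key name="paths" match="path" use="text()"/>

 <xsl:template match="/log">
  <log>
   <!-- Visit only the canonical (first indexed) instance of each distinct path. -->
   <xsl:for-each select="//path[generate-id()=generate-id(key('paths',text())[1])]">
    <pathentry>
     <path><xsl:value-of select="text()"/></path>
     <!-- Emit one logentry per occurrence of this path in the log. -->
     <xsl:for-each select="key('paths',text())">
      <logentry revision="{ancestor::logentry/@revision}"
                action="{@action}"
                date="{ancestor::logentry/date}"/>
     </xsl:for-each>
    </pathentry>
   </xsl:for-each>
  </log>
 </xsl:template>
</xsl:stylesheet>
```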

 

@2016-12-19 23:10:25

Group XML elements by key using XSLT v2.0

XSLT v2.0 introduced direct support for grouping by means of the for-each-group element. This is more readable than the Muenchian method, and potentially more efficient too. The only significant drawback is that an XSLT v2.0-compatible processor is needed.

Within a for-each-group element, the functions current-grouping-key and current-group can be used to gain access to the current value of the key and the set of nodes that correspond to that key.

Here is a complete stylesheet to perform the task specified in the scenario (grouping Subversion log entries by path):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="/log">
<log>
  <xsl:for-each-group select="//path" group-by="text()">
   <pathentry>
    <path><xsl:value-of select="current-grouping-key()"/></path>
    <xsl:for-each select="current-group()">
     <logentry>
      <xsl:attribute name="revision"><xsl:value-of select="ancestor::logentry/@revision"/></xsl:attribute>
      <xsl:attribute name="action"><xsl:value-of select="@action"/></xsl:attribute>
      <xsl:attribute name="date"><xsl:value-of select="ancestor::logentry/date/text()"/></xsl:attribute>
     </logentry>
    </xsl:for-each>
   </pathentry>
  </xsl:for-each-group>
</log>
</xsl:template>
</xsl:stylesheet>
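
Because current-group returns an ordinary sequence of nodes, it can be passed directly to aggregate functions. The sketch below is a variation on the stylesheet above that summarises each path instead of listing its entries. The output names summary and file are hypothetical, and the use of min assumes that revision attributes always hold numbers.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
 <xsl:output method="xml" indent="yes"/>

 <xsl:template match="/log">
  <!-- 'summary' and 'file' are hypothetical output names. -->
  <summary>
   <xsl:for-each-group select="//path" group-by="text()">
    <!-- revisions:      how many path elements share this key
         first-revision: lowest revision number among their enclosing logentry
                         elements (untyped values are compared as numbers by min) -->
    <file path="{current-grouping-key()}"
          revisions="{count(current-group())}"
          first-revision="{min(current-group()/ancestor::logentry/@revision)}"/>
   </xsl:for-each-group>
  </summary>
 </xsl:template>
</xsl:stylesheet>
```

For the sample log, /trunk/hello.c would be reported with two revisions and /trunk/Makefile with one, both first appearing in revision 2.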

© 2024 Digcode.com