The Sysadmin Notebook  

XQuery

Processing XML with XQuery

XQuery is a W3C standard query language for XML, designed to retrieve and process large volumes of data stored in XML format. For very large data files XSLT is often a poor fit, because an XSLT processor typically loads the whole source document into memory. XQuery uses XPath 2.0 to select nodes. Inside element constructors, expressions to be evaluated are enclosed in curly braces. The saxon8q command can be used to run XQuery queries.
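
To illustrate the role of the curly braces, here is a minimal sketch that needs no input file. The element names results, literal and evaluated are made up for this illustration only.

```xquery
xquery version "1.0";
(: Text outside curly braces is copied as-is; text inside is evaluated :)
<results>
  <literal>1 + 2</literal>
  <evaluated>{1 + 2}</evaluated>
</results>
```

The literal element should contain the text "1 + 2" unchanged, while the evaluated element should contain 3.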

The examples in this section use the following XML data file as input:

<?xml version="1.0" ?>
<commands>
  <command type="2">
    <name>ls</name>
    <description>list directory contents</description>
  </command>
  <command type="2">
    <name>ps</name>
    <description>report a snapshot of current process</description>
  </command>
  <command type="2">
    <name>df</name>
    <description>report file system disk space usage</description>
  </command>
  <command type="1">
    <name>ipconfig</name>
    <description>report a network interface configuration</description>
  </command>
  <command type="2">
    <name>ifconfig</name>
    <description>configure a network interface</description>
  </command>
</commands>

Specifying Input

XQuery has two main functions to identify the location of input data for processing: doc() and collection(). With doc(), you pass a filename (or other URI) in the parentheses and append a suitable XPath location path to the call. Literal result elements can be included directly in the query.

xquery version "1.0";
<commands>
{doc("commands.xml")/commands/command}
</commands>
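
The path appended to doc() can be any XPath expression, including predicates. As a small sketch, assuming commands.xml sits in the same directory as the query, the following selects only the name of the command whose type attribute is 1. The wrapper element windows is a name chosen for this example.

```xquery
xquery version "1.0";
(: The predicate [@type = "1"] keeps only commands of type 1 :)
<windows>
{doc("commands.xml")/commands/command[@type = "1"]/name}
</windows>
```

With the sample data file this should return a single name element containing ipconfig.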

FLWOR Expressions

Much of XQuery's querying power comes from FLWOR expressions, named after their five clauses: for, let, where, order by and return.

The for clause has the syntax 'for...in...return...':

<commands>
{ for $i in (1,2,3,4,5) return <command>{$i}</command> }
</commands>
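
The keyword 'to' is not part of the for clause itself; it is a range operator that builds a sequence of consecutive integers. As a short sketch, the sequence (1,2,3,4,5) above can be written as a range:

```xquery
xquery version "1.0";
(: (1 to 5) is shorthand for the sequence (1, 2, 3, 4, 5) :)
<commands>
{ for $i in (1 to 5) return <command>{$i}</command> }
</commands>
```

Both forms should produce five command elements containing the numbers 1 through 5.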

The same construct can be used with nodes in place of a value sequence:

xquery version "1.0";
<commands>
{ for $i in (doc("commands.xml")/commands/command) 
	return $i }
</commands>

which produces:

<?xml version="1.0" encoding="UTF-8"?>
<commands>
   <command type="2">
      <name>ls</name>
      <description>list directory contents</description>
  </command>
   <command type="2">
      <name>ps</name>
      <description>report a snapshot of current process</description>
  </command>
   <command type="2">
      <name>df</name>
      <description>report file system disk space usage</description>
  </command>
   <command type="1">
      <name>ipconfig</name>
      <description>report a network interface configuration</description>
  </command>
   <command type="2">
      <name>ifconfig</name>
      <description>configure a network interface</description>
  </command>
</commands>
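
A let clause binds a variable to a whole sequence rather than iterating over it, which is handy when the same nodes are needed more than once. The sketch below assumes the same commands.xml input; the summary element and its total attribute are names invented for this example.

```xquery
xquery version "1.0";
(: let binds $commands once to the full sequence of command elements :)
let $commands := doc("commands.xml")/commands/command
return
  <summary total="{count($commands)}">
    { for $c in $commands return <name>{$c/name/text()}</name> }
  </summary>
```

With the sample data the total attribute should be 5, followed by one name element per command.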

Where clauses can be used to filter the output:

xquery version "1.0";
<commands>
{ for $command in (doc("commands.xml")/commands/command) 
	where $command/@type = '2'
	return element command {
		attribute name {$command/name/text()},
		element desc {$command/description/text()},
		element type {"Linux"}
	}
}
</commands>

which produces:

<?xml version="1.0" encoding="UTF-8"?>
<commands>
   <command name="ls">
      <desc>list directory contents</desc>
      <type>Linux</type>
   </command>
   <command name="ps">
      <desc>report a snapshot of current process</desc>
      <type>Linux</type>
   </command>
   <command name="df">
      <desc>report file system disk space usage</desc>
      <type>Linux</type>
   </command>
   <command name="ifconfig">
      <desc>configure a network interface</desc>
      <type>Linux</type>
   </command>
</commands>
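
Conditions in a where clause can be combined with 'and' and 'or', and can call standard functions such as contains(). As a sketch against the same input file, this keeps only the type 2 commands whose description mentions the word "report":

```xquery
xquery version "1.0";
(: Both conditions must hold for a command to reach the return clause :)
<commands>
{ for $command in doc("commands.xml")/commands/command
  where $command/@type = '2'
    and contains($command/description, 'report')
  return <name>{$command/name/text()}</name> }
</commands>
```

With the sample data this should return the names ps and df.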

Order by clauses are used for sorting:

xquery version "1.0";
<commands>
{ for $command in (doc("commands.xml")/commands/command) 
	let $name := $command/name/text()
	order by $name descending
	return element command {
		attribute name {$name},
		element desc {$command/description/text()}
	}
}
</commands>

which produces:

<?xml version="1.0" encoding="UTF-8"?>
<commands>
   <command name="ps">
      <desc>report a snapshot of current process</desc>
   </command>
   <command name="ls">
      <desc>list directory contents</desc>
   </command>
   <command name="ipconfig">
      <desc>report a network interface configuration</desc>
   </command>
   <command name="ifconfig">
      <desc>configure a network interface</desc>
   </command>
   <command name="df">
      <desc>report file system disk space usage</desc>
   </command>
</commands>
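
An order by clause can take several comma-separated keys, each with its own direction. The following sketch, again assuming commands.xml as input, sorts first by the type attribute and then by name within each type:

```xquery
xquery version "1.0";
(: The second key only matters for commands that share the same type :)
<commands>
{ for $command in doc("commands.xml")/commands/command
  order by $command/@type ascending, $command/name ascending
  return <command type="{$command/@type}">{$command/name/text()}</command> }
</commands>
```

With the sample data, ipconfig (type 1) should come first, followed by df, ifconfig, ls and ps (all type 2).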

FLWOR expressions let you iterate over a sequence. For conditional processing use 'if...then...else':

<commands>
{for $command in (doc("commands.xml")/commands/command)
  return if ($command/@type = 2)
    then element linuxcommand {$command/name/text()}
    else element windowscommand {$command/name/text()}
}
</commands>

producing:

<?xml version="1.0" encoding="UTF-8"?>
<commands>
   <linuxcommand>ls</linuxcommand>
   <linuxcommand>ps</linuxcommand>
   <linuxcommand>df</linuxcommand>
   <windowscommand>ipconfig</windowscommand>
   <linuxcommand>ifconfig</linuxcommand>
</commands>
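
Because 'if...then...else' is an expression, it can also appear inside an attribute value. The else branch is mandatory in XQuery 1.0; use 'else ()' when nothing should be returned. The sketch below assumes, as in the examples above, that type 1 means Windows and type 2 means Linux; the platform attribute is a name chosen for this illustration.

```xquery
xquery version "1.0";
(: The conditional is evaluated inside the attribute's curly braces :)
<commands>
{ for $command in doc("commands.xml")/commands/command
  return
    <command platform="{ if ($command/@type = '1') then 'Windows' else 'Linux' }">
      { $command/name/text() }
    </command> }
</commands>
```

With the sample data, ipconfig should be tagged platform="Windows" and the other four commands platform="Linux".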