The Sysadmin Notebook  

XPath Version 1.0 Notes

Using XPath 1.0 to Select XML Data

The newer version of the specification is XPath 2.0. The functions for XPath 2.0 are specified separately in XPath 2.0 functions.

XPath is the XML Path Language, used to select parts of an XML instance document for processing by other XML technologies such as XSLT, XPointer, XQuery and XForms. XPath models an XML instance document as a tree of nodes, with axes used to navigate from one node to another. A number of functions are available for processing the contents of nodes, and predicates can be used to further filter the selected nodes.

<!-- A single step location path -->
child::Phone[@type="Home"]
<!-- A two step location path -->
child::Person/child::Phone[@type="Home"]

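In XPath, the context consists of the node currently being processed (the context node), its position within the set of nodes being processed (the context position) and the number of nodes in that set (the context size). Relative location paths are evaluated in relation to the current context node.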
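As a rough illustration of how the context node affects a relative path, the sketch below uses Python's standard xml.etree.ElementTree module. ElementTree implements only a small subset of XPath and accepts the abbreviated syntax only (no 'child::' axis names), so the paths are written in their short form. The sample document and its element names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the names are assumptions for illustration.
XML = (
    "<People>"
    "<Person name='Ann'>"
    "<Phone type='Home'>555-0100</Phone>"
    "<Phone type='Work'>555-0101</Phone>"
    "</Person>"
    "<Person name='Bob'>"
    "<Phone type='Home'>555-0200</Phone>"
    "</Person>"
    "</People>"
)

root = ET.fromstring(XML)  # context node: the <People> element

# Two-step path: short form of child::Person/child::Phone[@type="Home"]
home_all = [p.text for p in root.findall("Person/Phone[@type='Home']")]

# Single-step path evaluated with one <Person> as the context node
ann = root.find("Person[@name='Ann']")
home_ann = [p.text for p in ann.findall("Phone[@type='Home']")]

# The same single step with <People> as context selects nothing,
# because <Phone> elements are not children of <People>.
home_from_root = root.findall("Phone[@type='Home']")

print(home_all)        # ['555-0100', '555-0200']
print(home_ann)        # ['555-0100']
print(home_from_root)  # []
```

The same path string gives different results depending on the node it is evaluated against, which is what "relative to the current context" means in practice.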

XPath Nodes

In XPath version 1.0 there are seven types of nodes:

Root Node
An XML document has one and only one root element, sometimes called the document element. The root element is a child of the root node. The root node may have other children consisting of comment nodes or processing instruction nodes. In XPath, the root node represents the document itself. The string-value of the root node is the concatenation of the string-values of all descendant text nodes.
Element Node
Each element in an XML instance document is represented as an element node. The expanded name of an element node consists of a namespace URI (if any) and the local part of its name; in the document itself the name is written as an optional namespace prefix and the local part, separated by a colon. The string-value of an element node is the concatenation of the string-values of all descendant text nodes.
Attribute Node
Each element attribute is represented as an attribute node. Although the element node to which it belongs is its parent node, the attribute node is not a child node of that element. An attribute node therefore cannot be accessed via the 'child' axis of its parent element; use the 'attribute' axis instead. The 'parent' axis can, however, be used to reach the parent element from an attribute node.
Text Node
The text content of an element node is represented as a text node.
Namespace Node
All in-scope namespaces of an element are represented as namespace nodes. For a namespace node, the name() function returns the namespace prefix, and the string-value of the node (for example 'string(.)') is the namespace URI.
Comment Node
Comment nodes represent comments in the XPath data model.
Processing Instruction Node
Processing instruction nodes represent processing instructions in the XPath data model.
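The node types above can be observed, approximately, with Python's standard xml.dom.minidom module. This is only a sketch: the DOM is a different data model from XPath's (for instance, it has no root node as such, only a Document object, and no namespace nodes), but its node kinds map closely enough to illustrate the points about attributes and string-values. The sample document and the string_value helper are assumptions made for this example.

```python
from xml.dom import minidom, Node

# Hypothetical sample document with a comment, a processing
# instruction, an attribute and mixed text content.
XML = (
    "<!-- address book -->"
    "<?render mode='compact'?>"
    "<Person id='p1'>Ann<Phone type='Home'>555-0100</Phone></Person>"
)

doc = minidom.parseString(XML)  # plays the role of the root node

KINDS = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
}

# Children of the document: comment, processing instruction, root element
top_level = [KINDS[n.nodeType] for n in doc.childNodes]

person = doc.documentElement
child_kinds = [KINDS[n.nodeType] for n in person.childNodes]

# The attribute belongs to <Person> but is not one of its children
attr = person.getAttributeNode("id")
attr_is_child = any(n is attr for n in person.childNodes)


def string_value(node):
    """Concatenate all descendant text, like an XPath string-value."""
    if node.nodeType == Node.TEXT_NODE:
        return node.data
    return "".join(
        string_value(c)
        for c in node.childNodes
        if c.nodeType in (Node.ELEMENT_NODE, Node.TEXT_NODE)
    )


print(top_level)          # ['comment', 'processing-instruction', 'element']
print(child_kinds)        # ['text', 'element']
print(attr_is_child)      # False
print(string_value(doc))  # Ann555-0100
```

In the DOM the owning element of an attribute is reached through ownerElement, which plays a role similar to the XPath 'parent' axis for attribute nodes.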

XPath Axes

XPath axes are used to navigate the node tree of the XPath data model. There are 13 axes available in XPath version 1.0:

child axis
The default axis in XPath. Selects the immediate child nodes of the context node. Because child is the default axis, location paths can be expressed as either 'child::itemname' or simply 'itemname'. 'child::*' or '*' returns all child nodes with a name (that is, elements only) of the current context node. To select all child nodes use 'child::node()' or simply 'node()'. To select text node children only use 'child::text()' or 'text()'.
attribute axis
Selects the attribute nodes associated with an element node. 'attribute::*' can be abbreviated to '@*'. To select a specific attribute only use 'attribute::attname' or '@attname'.
ancestor axis
Recursively selects all parent nodes for the current context node up to and including the root node.
ancestor-or-self axis
Returns all ancestor nodes plus the context node.
descendant axis
Returns all descendants of the context node: its children, their children, and so on. Attribute and namespace nodes are never included.
descendant-or-self axis
Returns all descendant nodes plus the current context node.
following axis
Returns all nodes that come after the context node in document order, excluding descendants of the context node and excluding all attribute and namespace nodes.
following-sibling axis
Returns all following nodes that share the same parent as the context node.
namespace axis
Returns all in-scope namespace nodes for the context node.
parent axis
Returns the parent node for the context node.
preceding axis
Returns all nodes that come before the context node in document order, excluding ancestor, attribute and namespace nodes.
preceding-sibling axis
Returns all preceding nodes that share the same parent as the context node.
self axis
Returns the context node. Can be specified as 'self::node()' or simply '.'.
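The definitions above can be checked against a small tree. Python's standard library has no full XPath 1.0 engine, so the sketch below does not evaluate axis expressions; it emulates several axes by hand over an ElementTree document, as an assumption-laden illustration only. It covers element nodes only, ignores the root node above the document element, and returns every result in document order. The tree and all helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Document order of the elements: a, b, c, d, e, f, g
root = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")
order = list(root.iter())
parent = {child: p for p in order for child in p}


def ancestors(n):
    out = []
    while n in parent:
        n = parent[n]
        out.append(n)
    return out  # nearest ancestor first


def descendants(n):
    return list(n.iter())[1:]  # iter() yields n itself first


def following(n):
    skip = set(descendants(n))
    return [x for x in order[order.index(n) + 1:] if x not in skip]


def preceding(n):
    skip = set(ancestors(n))
    return [x for x in order[:order.index(n)] if x not in skip]


def following_siblings(n):
    if n not in parent:
        return []
    sibs = list(parent[n])
    return sibs[sibs.index(n) + 1:]


def preceding_siblings(n):
    if n not in parent:
        return []
    sibs = list(parent[n])
    return sibs[:sibs.index(n)]


def names(nodes):
    return [x.tag for x in nodes]


e = root.find("e")
d = root.find("b/d")

print(names(ancestors(d)))           # ['b', 'a']
print(names(descendants(e)))         # ['f']
print(names(following(e)))           # ['g']
print(names(preceding(e)))           # ['b', 'c', 'd']
print(names(following_siblings(e)))  # ['g']
print(names(preceding_siblings(e)))  # ['b']
```

Note how 'following' from e skips its descendant f, and 'preceding' from e skips its ancestor a, matching the exclusions in the axis descriptions.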

Functions

A built-in function library exists as part of the XPath specification that can be used with predicates to add further filtering to an XPath expression.

Boolean Functions

boolean()
Tests argument and returns true or false
false()
Returns false
lang()
Returns true if context node language matches string argument
not()
Returns opposite boolean value of its argument
true()
Returns true

Node-Set Functions

count()
Returns number of nodes in node-set
id()
Returns node-set of elements whose unique ID (an attribute declared as type ID in the DTD) matches its argument
last()
Returns context size
local-name()
Returns localpart of the name of the node set argument, or of the context node if no argument given
name()
Returns qualified name of the node in prefix:localpart format, for the node-set argument or for the context node if no argument given
namespace-uri()
Returns namespace URI for node-set argument, or for context node if no argument provided
position()
Returns value equal to context position
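A few of these functions can be tried with Python's standard xml.etree.ElementTree, which supports position predicates and last() but not count() or the other node-set functions. In this sketch len() on the result list stands in for count(); that substitution, and the sample document, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    "<Person>"
    "<Phone type='Home'>555-0100</Phone>"
    "<Phone type='Work'>555-0101</Phone>"
    "<Phone type='Cell'>555-0102</Phone>"
    "</Person>"
)

# Phone[1] is short for Phone[position() = 1]
first = root.find("Phone[1]").get("type")

# Phone[last()] selects the node whose position equals the context size
final = root.find("Phone[last()]").get("type")
second_last = root.find("Phone[last()-1]").get("type")

# Stand-in for count(Phone): the size of the selected node-set
phone_count = len(root.findall("Phone"))

print(first, final, second_last, phone_count)  # Home Cell Work 3
```

Positions are counted from 1, not 0, and are relative to the node-set selected by the step, not to the whole document.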

Numeric Functions

ceiling()
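Returns smallest integer value that is not less than numeric argument
floor()
Returns largest integer value that is not greater than numeric argument
number()
Returns numeric value of argument
round()
Rounds its argument to the nearest integer; a value exactly halfway between two integers is rounded towards positive infinity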
sum()
Returns sum of its node-set argument's values
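The rounding rules are easy to get wrong, so here is a rough Python sketch of the behaviour described above. The function names are made up for this example, and special values (NaN, infinities, negative zero) are deliberately not handled.

```python
import math


def xpath_floor(x):
    # Largest integer not greater than x
    return math.floor(x)


def xpath_ceiling(x):
    # Smallest integer not less than x
    return math.ceil(x)


def xpath_round(x):
    # Nearest integer; halfway cases go towards positive infinity
    return math.floor(x + 0.5)


print(xpath_floor(2.9), xpath_floor(-2.1))      # 2 -3
print(xpath_ceiling(2.1), xpath_ceiling(2.0))   # 3 2
print(xpath_round(2.5), xpath_round(-2.5))      # 3 -2

# Python's built-in round() uses round-half-to-even instead,
# so it is not a drop-in equivalent of the XPath function.
print(round(2.5))                               # 2
```

Note that ceiling(2.0) is 2, not 3: the result only has to be "not less than" the argument.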

String Functions

concat()
Returns concatenation of string arguments
contains()
Returns true if first string argument contains the second string argument
normalize-space()
Strips leading and trailing space and replaces consecutive whitespace with a single space character
starts-with()
Returns true if first argument string starts with second argument string
string()
Returns string value of argument
string-length()
Returns length of string argument
substring()
Returns the part of the first argument that begins at the position given in the second argument (the first character is position 1) and, optionally, has the length given in the third argument
substring-after()
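Returns the part of the first argument that follows the first occurrence of the second string argument, or an empty string if it does not occur
substring-before()
Returns the part of the first argument that precedes the first occurrence of the second string argument, or an empty string if it does not occur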
translate()
Returns first string argument, with characters from second argument translated to corresponding characters in third argument
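Several of the string functions have details that differ from their usual counterparts in programming languages: substring() counts from 1 and takes a length rather than an end position, and translate() removes characters of the second argument that have no counterpart in the third. The Python sketch below mimics that behaviour. The function names are made up for this example, and NaN and infinite arguments to substring() are not handled.

```python
import math
import re


def xp_round(x):
    # XPath rounding: halfway cases go towards positive infinity
    return math.floor(x + 0.5)


def substring(s, start, length=None):
    # Keep characters whose 1-based position p satisfies
    # round(start) <= p < round(start) + round(length)
    first = xp_round(start)
    if length is None:
        return "".join(ch for p, ch in enumerate(s, 1) if p >= first)
    end = first + xp_round(length)
    return "".join(ch for p, ch in enumerate(s, 1) if first <= p < end)


def substring_before(s, sep):
    i = s.find(sep)
    return s[:i] if i >= 0 else ""


def substring_after(s, sep):
    i = s.find(sep)
    return s[i + len(sep):] if i >= 0 else ""


def translate(s, src, dst):
    # Characters in src beyond the length of dst are deleted;
    # the first occurrence of a character in src wins.
    table = {}
    for i, ch in enumerate(src):
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)


def normalize_space(s):
    # XML whitespace only: space, tab, carriage return, line feed
    return " ".join(part for part in re.split("[ \t\r\n]+", s) if part)


print(substring("12345", 2, 3))              # 234
print(substring("12345", 0, 3))              # 12
print(substring_before("1999/04/01", "/"))   # 1999
print(substring_after("1999/04/01", "/"))    # 04/01
print(translate("--aaa--", "abc-", "ABC"))   # AAA
print(normalize_space("  a \n b\t"))         # a b
```

The second substring() call shows the effect of 1-based positions: starting at 0 with length 3 covers positions 0, 1 and 2, of which only 1 and 2 exist, so two characters are returned.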