The Sysadmin Notebook  

XML Notes

All Things XML

XML is a standard for adding structure to data. An XML parser can process an XML document and present the data to an application.

XML Tags

An XML tag is text that begins with a < and ends with a >. Tags normally come in pairs: a start-tag and an end-tag that carries the same name but begins with </. Whitespace is allowed before the closing > of a tag, but not immediately after the opening < of a start-tag or the opening </ of an end-tag. Empty elements, which have no content, can be written as a single self-closing tag ending in />.

Data is placed between matching start- and end-tags and is called the 'element content'. Element content can be other elements, character data (PCDATA), or a mix of the two. Whitespace in PCDATA is retained, but whitespace between markup can be either extraneous whitespace or data, depending on the content defined for the containing element (if the element is allowed to contain PCDATA, the whitespace may be considered part of the data).

<tagname>Here's my 
	data</tagname>
<empty></empty>
<self-closing />
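
As a rough check of how a parser sees the tags above, the following Python sketch feeds them to the standard library's xml.etree.ElementTree. The wrapping doc element is an assumption added here only so that the snippet has a single root.

```python
import xml.etree.ElementTree as ET

# The <doc> wrapper is hypothetical; it only supplies a single root element.
SAMPLE = (
    "<doc>"
    "<tagname>Here's my \n\tdata</tagname>"
    "<empty></empty>"
    "<self-closing />"
    "</doc>"
)

root = ET.fromstring(SAMPLE)
tagname, empty, self_closing = list(root)

# Whitespace inside character data is kept, including the line break and tab.
print(repr(tagname.text))

# An empty element and a self-closing element look the same to the parser:
# neither has any text or child elements.
print(empty.text, len(empty))
print(self_closing.text, len(self_closing))
```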

Tags can be nested to give the data a hierarchical structure, and child tags must be closed before their parent tags. The top-level element is called the root element, and an XML document must have one and only one root element.

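Tag names are case sensitive. A name can begin with any letter or an underscore, but not with a digit, hyphen or full-stop, and must not contain any spaces. After the first character, digits, hyphens, underscores and full-stops are also allowed. A colon is reserved for separating a namespace prefix from the rest of the name. Names beginning with the letters 'xml' in upper, lower or mixed case are reserved and should not be used for your own tags.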
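
Some of the nesting and naming rules can be tried out against a real parser. The sketch below uses Python's xml.etree.ElementTree; the helper is_well_formed and the element names are made up for illustration, and the results reflect what this particular parser accepts.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<a><b></b></a>",         # child closed before its parent: accepted
    "<a><b></a></b>",         # child closed after its parent: rejected
    "<a></a><b></b>",         # two root elements: rejected
    "<Item></item>",          # names are case sensitive, so the tags do not match: rejected
    "<item-2.b></item-2.b>",  # digits, hyphens and full-stops after the first character: accepted
    "<2item></2item>",        # name starting with a digit: rejected
]

for text in samples:
    print(text, is_well_formed(text))
```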

Attributes are name/value pairs associated with an element, and can only appear in start-tags (or self-closing tags) after the tag name. Every attribute listed must be given a value in quotation marks, single or double, even if it is just an empty string (""). Attribute names must obey the same rules as element names, and each attribute can be used only once per element. New-line characters in attribute values are replaced by XML parsers with a single space. Attributes can be added to an element in any order.
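
The attribute rules can be illustrated with a short Python sketch using xml.etree.ElementTree. The item element, its id, colour and note attributes, and the accepted helper are all hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET


def accepted(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Attribute order and quote style make no difference to the parsed result.
first = ET.fromstring('<item id="7" colour="red" />')
second = ET.fromstring("<item colour='red' id='7' />")
print(first.attrib == second.attrib)

# An empty string is a valid value.
empty_note = ET.fromstring('<item note="" />').get("note")
print(repr(empty_note))

# A literal line break inside an attribute value arrives as a single space.
wrapped_note = ET.fromstring('<item note="two\nlines" />').get("note")
print(repr(wrapped_note))

# Missing values, unquoted values and repeated attributes are all rejected.
print(accepted("<item id />"))
print(accepted("<item id=7 />"))
print(accepted('<item id="1" id="2" />'))
```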

The characters < and & cannot appear literally in PCDATA. To use them, either write their entity references (&lt; and &amp;) or wrap the text in a CDATA section:

<![CDATA[ if $data < 6 && $data > 1 ]]>

Everything between the opening <![CDATA[ and the closing ]]> is passed through as character data without being interpreted as markup.
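
To see that entity references and a CDATA section give the application the same text, here is a minimal Python sketch using xml.etree.ElementTree. The test element name is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# The same expression written with entity references and inside a CDATA section.
with_entities = ET.fromstring("<test>if $data &lt; 6 &amp;&amp; $data > 1</test>")
with_cdata = ET.fromstring("<test><![CDATA[if $data < 6 && $data > 1]]></test>")

# Both forms reach the application as identical character data.
print(with_entities.text)
print(with_entities.text == with_cdata.text)

# Writing < directly in the character data is rejected by the parser.
try:
    ET.fromstring("<test>if $data < 6</test>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)
```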

XML Declarations

An XML declaration, if used, must start at the first character on the first line of the file. The declaration must begin with <?xml and end with ?>. It must identify the version number, and can optionally also specify values for the encoding and standalone attributes, in that order:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

The version attribute specifies which version of the XML specification the document follows. The encoding attribute names the character encoding used to represent characters in the document. The standalone attribute indicates whether or not the document depends on markup declarations held in external files, such as an external DTD.
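
One way to inspect a declaration is through the XmlDeclHandler callback of Python's xml.parsers.expat module, sketched below. The read_declaration helper and the root element are hypothetical; expat reports standalone as 1 for "yes", 0 for "no" and -1 when it is omitted.

```python
import xml.parsers.expat


def read_declaration(data: bytes) -> dict:
    """Parse a document and return the values found in its XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # standalone is 1 for "yes", 0 for "no" and -1 when omitted.
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)
    return found


full = read_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?><root/>'
)
minimal = read_declaration(b'<?xml version="1.0"?><root/>')
print(full)
print(minimal)

# A declaration that does not start at the very first character is an error.
try:
    read_declaration(b'\n<?xml version="1.0"?><root/>')
    misplaced_accepted = True
except xml.parsers.expat.ExpatError:
    misplaced_accepted = False
print(misplaced_accepted)
```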

Namespaces

A namespace allows you to localise elements: the same name can be used in different namespaces to identify different elements. A namespace is declared in a start-tag with an attribute of the form xmlns:prefix, which associates the namespace prefix with a URI. The namespace prefix can then be attached to element names:

<ord:orders xmlns:ord="http://www.myweb.com/orders">
  <ord:id>09BA3423</ord:id>
  <ord:item>Widget 5</ord:item>
</ord:orders>

A URI can be expressed as either a URL or a URN. URNs consist of three parts separated by colons: the literal prefix urn, a namespace identifier, and a namespace-specific string.

For example, a URN for a published book might be: urn:published-book:323324543.

Element names with a namespace prefix are referred to as QNames. More than one namespace can be declared in the same element:

<ord:orders xmlns:ord="http://www.myweb.com/orders"
	    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <ord:id>09BA3423</ord:id>
  <xhtml:p>This data is in the 'xhtml' namespace</xhtml:p>
  <ord:item>Widget 5</ord:item>
</ord:orders>
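
To show what a namespace-aware parser does with prefixed names, the following Python sketch parses an orders document with xml.etree.ElementTree. ElementTree's habit of writing names as {uri}localname is specific to that library, and the short prefix o used in the search is an arbitrary choice for the example.

```python
import xml.etree.ElementTree as ET

ORDERS = """<ord:orders xmlns:ord="http://www.myweb.com/orders"
            xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <ord:id>09BA3423</ord:id>
  <xhtml:p>This data is in the 'xhtml' namespace</xhtml:p>
  <ord:item>Widget 5</ord:item>
</ord:orders>"""

root = ET.fromstring(ORDERS)

# ElementTree replaces each prefix with its URI, written as {uri}localname.
print(root.tag)
child_tags = [child.tag for child in root]
print(child_tags)

# Searches use their own prefix-to-URI mapping; only the URI has to match.
ns = {"o": "http://www.myweb.com/orders"}
order_id = root.find("o:id", ns).text
print(order_id)
```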

When declaring multiple namespaces in an element, you can make one of them the default namespace by omitting the prefix from its declaration (plain xmlns instead of xmlns:prefix):

<orders xmlns="http://www.myweb.com/orders"
	    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <id>09BA3423</id>
  <xhtml:p>This data is in the 'xhtml' namespace</xhtml:p>
  <xhtml:p>Tags without a namespace prefix are in the default orders namespace</xhtml:p>
  <item>Widget 5</item>
</orders>
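
A default namespace is only a shorthand: the parser resolves unprefixed element names to the same URI that a prefix would give. The Python sketch below, again using xml.etree.ElementTree, compares a prefixed and a default-namespace version of a small orders document.

```python
import xml.etree.ElementTree as ET

# The same data written with an explicit prefix and with a default namespace.
prefixed = ET.fromstring(
    '<ord:orders xmlns:ord="http://www.myweb.com/orders">'
    "<ord:id>09BA3423</ord:id><ord:item>Widget 5</ord:item>"
    "</ord:orders>"
)
default = ET.fromstring(
    '<orders xmlns="http://www.myweb.com/orders">'
    "<id>09BA3423</id><item>Widget 5</item>"
    "</orders>"
)

prefixed_tags = [element.tag for element in prefixed.iter()]
default_tags = [element.tag for element in default.iter()]

# Both documents produce identical namespace-qualified element names.
print(default_tags)
print(prefixed_tags == default_tags)
```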

Namespaces declared on the root element can be used throughout the document. Namespaces declared lower in the hierarchy can only be applied to the declaring element and its descendants. Within a default namespace, you can create an element outside the namespace by setting the xmlns attribute to an empty string:

<orders xmlns="http://www.myweb.com/orders">
  <id>09BA3423</id>
  <comment xmlns="">This element is not in any namespace</comment>
  <comment>This element is back in the default orders namespace</comment>
  <item>Widget 5</item>
</orders>
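
The scoping rules can be checked with a small Python sketch using xml.etree.ElementTree. The second document, with its a, b, c and d elements and the urn:example:p namespace, is invented purely to show a prefix being used outside the element that declares it.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<orders xmlns="http://www.myweb.com/orders">'
    "<id>09BA3423</id>"
    '<comment xmlns="">This element is not in any namespace</comment>'
    "<comment>This element is back in the default namespace</comment>"
    "</orders>"
)

# Only the comment carrying xmlns="" loses the {uri} part of its name.
tags = [child.tag for child in root]
print(tags)

# A prefix declared on <b> is visible to its descendant <p:c>,
# but not to the sibling <p:d>, so the parser rejects the document.
try:
    ET.fromstring('<a><b xmlns:p="urn:example:p"><p:c/></b><p:d/></a>')
    out_of_scope_accepted = True
except ET.ParseError:
    out_of_scope_accepted = False
print(out_of_scope_accepted)
```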

Namespace prefixes may also be assigned to attributes of elements, which allows you to target attributes in a particular namespace for processing. Attributes without a prefix are not placed in the default namespace.
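
As a sketch of how prefixed attributes appear to an application, the Python snippet below parses two elements with xml.etree.ElementTree. The status and colour attributes are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

ORDERS_NS = "http://www.myweb.com/orders"

# One prefixed attribute and one plain attribute on the same element.
item = ET.fromstring(
    '<ord:item xmlns:ord="http://www.myweb.com/orders" '
    'ord:status="shipped" colour="red" />'
)
print(item.attrib)

# The namespaced attribute is looked up by its {uri}name form.
status = item.get("{%s}status" % ORDERS_NS)
print(status)

# A default namespace applies to the element name but not to its attributes.
plain = ET.fromstring('<item xmlns="http://www.myweb.com/orders" colour="red" />')
print(plain.tag, plain.attrib)
```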