The Sysadmin Notebook  

XML Schema Definition

Validating XML with XML Schemas

A schema is a document that provides a model of document structure. XML Schema is a W3C standard for describing XML documents; unlike DTDs, schemas are themselves written in XML syntax. XML Schemas are normally kept in files separate from the XML data, and the data files are often referred to as instance documents. An XML Schema validator should both confirm the validity of an instance document and provide a post-schema-validation infoset (PSVI) to the application. The PSVI contains all the information in the XML document, augmented with information taken from the schema, such as the type assigned to each element and attribute and any default values. A list of XML Schema tools is available at W3C Schema Tools.

XML Declaration

XML Schema documents are XML documents, so they normally begin with an XML declaration:

<?xml version="1.0" ?>

Root Element

The root element is the <schema> element, which is also where namespace information is declared. The targetNamespace attribute specifies the namespace that identifies the vocabulary you are defining. It is optional, but if it is included it should be accompanied by a namespace declaration with a matching value. In the example below, "http://www.w3.org/2001/XMLSchema" is the default namespace, and components declared in the targetNamespace (elements, attributes and types) are referenced using the "target" prefix.

<schema xmlns="http://www.w3.org/2001/XMLSchema"
	targetNamespace="http://home.clara.net/drdsl/order"
	xmlns:target="http://home.clara.net/drdsl/order">
  <!-- element, attribute and type declarations go here -->
</schema>

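Alternatively, the targetNamespace can be set as the default namespace, in which case a prefix (here "xsd") is used for the "http://www.w3.org/2001/XMLSchema" namespace, and built-in types must then be written with that prefix, for example type="xsd:string":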

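<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
	targetNamespace="http://home.clara.net/drdsl/order"
	xmlns="http://home.clara.net/drdsl/order">
  <!-- declarations go here; schema elements and built-in types take the xsd: prefix -->
</xsd:schema>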

Element Definitions

Element declarations are placed within the <schema> element. Each element declaration sets either the name attribute or the ref attribute (to reuse a global declaration), and either declares its content model inline or names a type in the type attribute. Elements with element content have their content model defined within a <complexType> definition, while <simpleType> definitions are used for text-only content. Any attribute declarations come after the content model:

<element name="order">
  <complexType>
    <sequence>
      <element name="orderid" type="string" />
      <element name="orderdate" type="string" />
      <element name="ordertotal" type="string" />
    </sequence>
    <attribute name="status" type="string" />
  </complexType>
</element>
<element name="Phone">
  <simpleType>
    <restriction base="string">
      <enumeration value="Home" />
      <enumeration value="Business" />
      <enumeration value="Mobile" />
      <enumeration value="Alternate" />
    </restriction>
  </simpleType>
</element>

Elements can be declared either globally or locally. Global declarations are made as direct children of the <schema> element and can therefore be referenced anywhere in the schema, although references must include the namespace prefix if one is defined for the targetNamespace. In the example below, the Order element is assigned the global type 'OrderType', which is defined in the 'target' namespace:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
	targetNamespace="http://home.clara.net/drdsl/order"
	xmlns:target="http://home.clara.net/drdsl/order"
	elementFormDefault="qualified">
  <complexType name="OrderType">
    <sequence>
      <element name="orderid" type="string" />
      <element name="orderdate" type="string" />
      <element name="ordertotal" type="string" />
    </sequence>
    <attribute name="status" type="string" />
  </complexType>
  <element name="Order" type="target:OrderType" />
</schema>
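As a sketch of an instance document for the schema above, assuming it is saved as order.xsd (a hypothetical file name) and using made-up values: because elementFormDefault is "qualified", the child elements must also be in the target namespace, which the default namespace declaration on Order takes care of, while the status attribute stays unqualified and takes no prefix. The xsi:schemaLocation attribute is not covered above; it is one common way to hint to a validator where the schema for a namespace can be found.

```xml
<?xml version="1.0" ?>
<!-- order.xsd is a hypothetical location for the schema above -->
<Order xmlns="http://home.clara.net/drdsl/order"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://home.clara.net/drdsl/order order.xsd"
       status="pending">
  <!-- qualified child elements, in the target namespace via the default namespace -->
  <orderid>1001</orderid>
  <orderdate>2024-01-15</orderdate>
  <ordertotal>42.50</ordertotal>
</Order>
```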

To reuse a global element declaration, add a 'ref' attribute whose value is the global element's qualified name (here prefixed with 'target'). The example below declares three global elements that are then used to define the content model of the global type 'OrderType':

<schema xmlns="http://www.w3.org/2001/XMLSchema"
	targetNamespace="http://home.clara.net/drdsl/order"
	xmlns:target="http://home.clara.net/drdsl/order"
	elementFormDefault="qualified">
  <element name="orderid" type="string" />
  <element name="orderdate" type="string" />
  <element name="ordertotal" type="string" />
  <complexType name="OrderType">
    <sequence>
      <element ref="target:orderid" />
      <element ref="target:orderdate" />
      <element ref="target:ordertotal" />
    </sequence>
    <attribute name="status" type="string" />
  </complexType>
  <element name="Order" type="target:OrderType" />
</schema>

Any global element declared in an XSD can be used as the root element of an instance document. Local element declarations (and element references) can have their cardinality set with the 'minOccurs' and 'maxOccurs' attributes: minOccurs takes a non-negative integer, and maxOccurs takes a non-negative integer or 'unbounded'. Both default to '1'. A default value for an element can be specified with the 'default' attribute; it is used when the element appears with empty content. An element can instead be given a constant value with the 'fixed' attribute. 'default' and 'fixed' cannot be used on the same declaration.
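A minimal sketch of cardinality, default and fixed values, assuming the <schema> element from the earlier examples (with the XML Schema namespace as the default). The type name LineItemType and the element names sku, note, quantity and currency are hypothetical.

```xml
<complexType name="LineItemType">
  <sequence>
    <!-- no minOccurs/maxOccurs: must appear exactly once -->
    <element name="sku" type="string" />
    <!-- optional and repeatable: zero or more notes -->
    <element name="note" type="string" minOccurs="0" maxOccurs="unbounded" />
    <!-- <quantity/> with empty content is treated as having the value 1 -->
    <element name="quantity" type="positiveInteger" default="1" />
    <!-- optional; if present it must be empty or contain exactly "GBP" -->
    <element name="currency" type="string" fixed="GBP" minOccurs="0" />
  </sequence>
</complexType>
```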

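Content model definitions may include an <any> declaration to specify that any element can occur at that location. The <any> declaration can be further qualified by setting its namespace attribute to one of:

##any: elements from any namespace (the default)
##other: elements from any namespace other than the targetNamespace (unqualified elements are also excluded)
##targetNamespace: elements from the targetNamespace
##local: unqualified elements, i.e. elements in no namespace
a whitespace-separated list of namespace URIs, which may also include ##targetNamespace and ##local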
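A sketch of an element wildcard used as an extension point, assuming the earlier <schema> element; the type name ExtensibleOrderType is hypothetical. The processContents attribute is not covered above: "lax" asks the validator to check matching elements only when it can find declarations for them, whereas the default ("strict") requires those declarations to be available.

```xml
<complexType name="ExtensibleOrderType">
  <sequence>
    <element name="orderid" type="string" />
    <!-- zero or more elements from any namespace other than the targetNamespace -->
    <any namespace="##other" minOccurs="0" maxOccurs="unbounded" processContents="lax" />
  </sequence>
</complexType>
```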

complexType definitions can be used to create mixed content models by setting the 'mixed' attribute to 'true'. This allows the element content to be a mixture of elements and text:

<element name="para">
  <complexType mixed="true">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="br" type="string" />
      <element name="em" type="string" />
      <element name="i" type="string" />
    </choice>
  </complexType>
</element>
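For illustration, an instance of the mixed-content para element above might look like the following, assuming a schema with no targetNamespace so that no namespace declaration is needed; the text itself is made up. Character data and the declared child elements can be interleaved in any order.

```xml
<!-- text interleaved with em, br and i child elements -->
<para>This is <em>mixed</em> content.<br />Text and <i>child elements</i> can be interleaved.</para>
```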

Empty content models can also be specified using an empty <complexType>:

<element name="para">
  <complexType />
</element>

Content Models

Content models can be declared as:

Sequence definitions (<sequence>) specify the elements that may occur and the order in which they must occur. Sequences may contain elements, element wildcards, other sequences, choices or group references.

Choice definitions (<choice>) specify a set of alternatives, only one of which appears for each occurrence of the choice.
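A sketch combining the two, assuming the earlier <schema> element; ContactType and its element names are hypothetical. The sequence fixes the order of name, the choice and notes, while the choice requires exactly one of its alternatives (its minOccurs and maxOccurs default to 1).

```xml
<complexType name="ContactType">
  <sequence>
    <element name="name" type="string" />
    <!-- exactly one of email or telephone must appear here -->
    <choice>
      <element name="email" type="string" />
      <element name="telephone" type="string" />
    </choice>
    <element name="notes" type="string" minOccurs="0" />
  </sequence>
</complexType>
```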

A <group> declaration allows you to define reusable groups of elements. Group references (<group ref="..."/>) allow you to refer to these global group definitions.
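A sketch of a named group and a reference to it, assuming the earlier <schema> element in which the "target" prefix is bound to the targetNamespace; AddressGroup, CustomerType and the element names are hypothetical. The group definition wraps its elements in a single sequence, choice or all.

```xml
<!-- global, reusable group of elements -->
<group name="AddressGroup">
  <sequence>
    <element name="street" type="string" />
    <element name="city" type="string" />
    <element name="postcode" type="string" />
  </sequence>
</group>

<complexType name="CustomerType">
  <sequence>
    <element name="name" type="string" />
    <!-- the group's elements are inserted at this point -->
    <group ref="target:AddressGroup" />
  </sequence>
</complexType>
```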

The <all> declaration specifies that the elements in the content model may appear in any order. An <all> group may contain only element declarations (no nested sequences, choices or group references), it must be the whole content model rather than part of another group, and each element may occur at most once in instances of the defined element.
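A sketch of an <all> content model, assuming the earlier <schema> element; PersonType and its element names are hypothetical. firstname and surname are required and may appear in either order, while minOccurs="0" makes middlename optional.

```xml
<complexType name="PersonType">
  <!-- each element at most once, in any order -->
  <all>
    <element name="firstname" type="string" />
    <element name="surname" type="string" />
    <!-- optional; maxOccurs cannot exceed 1 inside <all> -->
    <element name="middlename" type="string" minOccurs="0" />
  </all>
</complexType>
```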

Attribute Definitions

Attributes can be declared either globally or locally, and are restricted to simple types (text-only content). The 'use' attribute can have a value of required, optional or prohibited; the default is optional. Default and fixed values can also be set, as for element declarations. Attribute wildcards are declared with the <anyAttribute> declaration, which must appear within either a <complexType> or an <attributeGroup> declaration. To declare an element with text-only content and attributes, use a <simpleContent> declaration within a <complexType> declaration. An <extension> declaration is needed to indicate that you are extending the simple base type by adding attribute declarations:

<element name="bookid">
  <complexType>
    <simpleContent>
      <extension base="string">
        <attribute name="edition" type="string" default="first" />
      </extension>
    </simpleContent>
  </complexType>
</element>
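A sketch of the 'use', default and fixed settings together with an attribute wildcard, assuming the earlier <schema> element; BookType and the attribute names are hypothetical. As with <any>, processContents="lax" is an addition not covered above; it stops the validator from rejecting foreign attributes it has no declarations for.

```xml
<complexType name="BookType">
  <sequence>
    <element name="title" type="string" />
  </sequence>
  <!-- must be present on every book -->
  <attribute name="isbn" type="string" use="required" />
  <!-- optional; the value "en" is supplied when the attribute is absent -->
  <attribute name="lang" type="string" default="en" />
  <!-- if present, must have the value "1.0" -->
  <attribute name="version" type="string" fixed="1.0" />
  <!-- attribute wildcard: any attribute from another namespace; comes last -->
  <anyAttribute namespace="##other" processContents="lax" />
</complexType>
```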

Data Types

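For text-only elements and attributes you can specify one of a number of built-in datatypes, or define custom datatypes. The built-in datatypes are:

Primitive types: string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION
Derived types: normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, IDREFS, ENTITY, ENTITIES, integer, nonPositiveInteger, negativeInteger, long, int, short, byte, nonNegativeInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, positiveInteger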
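A sketch of built-in datatypes in use, assuming the earlier <schema> element with the XML Schema namespace as the default, so unprefixed type names resolve to the built-ins; the invoice element and its children are hypothetical.

```xml
<element name="invoice">
  <complexType>
    <sequence>
      <element name="issued" type="date" />      <!-- e.g. 2024-01-15 -->
      <element name="amount" type="decimal" />   <!-- e.g. 42.50 -->
      <element name="paid" type="boolean" />     <!-- true, false, 1 or 0 -->
      <element name="website" type="anyURI" />
    </sequence>
    <!-- attributes use the same built-in types -->
    <attribute name="lines" type="positiveInteger" />
  </complexType>
</element>
```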

Custom simple types are created by placing a <restriction> of a base type inside a <simpleType> declaration. A common restriction is a list of allowed values, given as <enumeration> facets:

<attribute name="language">
  <simpleType>
    <restriction base="string">
      <enumeration value="Perl" />
      <enumeration value="Python" />
      <enumeration value="PHP" />
      <enumeration value="Ruby" />
      <enumeration value="C" />
      <enumeration value="Java" />
      <enumeration value="C++" />
      <enumeration value="C#" />
      <enumeration value="Basic" />
      <enumeration value="Fortran" />
      <enumeration value="Pascal" />
    </restriction>
  </simpleType>
</attribute>

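An attribute can appear only once on an element, so the 'language' attribute above can hold only a single value. To allow several values, define a <list> type whose itemType attribute names a global <simpleType>, such as the enumeration below; the attribute's value is then a whitespace-separated list of items: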

<simpleType name="language">
  <restriction base="string">
    <enumeration value="Perl" />
    <enumeration value="Python" />
    <enumeration value="PHP" />
    <enumeration value="Ruby" />
    <enumeration value="C" />
    <enumeration value="Java" />
    <enumeration value="C++" />
    <enumeration value="C#" />
    <enumeration value="Basic" />
    <enumeration value="Fortran" />
    <enumeration value="Pascal" />
  </restriction>
</simpleType>
<simpleType name="languagesUsed">
  <list itemType="language" />
</simpleType>
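A sketch of the list type in use, assuming the schema's targetNamespace is "http://home.clara.net/drdsl/order" as in the earlier examples; the project element and languages attribute are hypothetical. The schema side declares one attribute of the list type, and the instance side supplies several whitespace-separated values in that single attribute.

```xml
<!-- schema side: an attribute whose value is a list of languages -->
<element name="project">
  <complexType>
    <attribute name="languages" type="target:languagesUsed" />
  </complexType>
</element>

<!-- instance side: one attribute, three list items -->
<project xmlns="http://home.clara.net/drdsl/order" languages="Perl Python C" />
```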

You can combine multiple simple types using a <union> declaration, setting its 'memberTypes' attribute to a whitespace-separated list of qualified names of global <simpleType> definitions or built-in datatypes. A value is valid if it is valid for any one of the member types.
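A sketch of a union, assuming the earlier <schema> element with the "target" prefix bound to the targetNamespace; the type names sizeName and size are hypothetical. A value of type size may be either a positive integer such as "12" or one of the named sizes such as "medium".

```xml
<!-- named sizes, declared in the targetNamespace -->
<simpleType name="sizeName">
  <restriction base="string">
    <enumeration value="small" />
    <enumeration value="medium" />
    <enumeration value="large" />
  </restriction>
</simpleType>
<!-- built-in positiveInteger (default namespace) combined with target:sizeName -->
<simpleType name="size">
  <union memberTypes="positiveInteger target:sizeName" />
</simpleType>
```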

Modular Schemas

XML Schema definitions can be stored in multiple documents and combined using either an <import> or an <include> declaration. The <import> declaration makes the global declarations of an XSD with a different targetNamespace available, identifying that namespace with its namespace attribute. The <include> declaration combines XSDs with the same targetNamespace; an included XSD with no targetNamespace takes on the targetNamespace of the including schema.
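A sketch of both declarations in one schema. The file names order-types.xsd and address.xsd, the namespace "http://example.com/address" and its global address element are all hypothetical. <include> and <import> must come before the other declarations in the <schema> element, and the schemaLocation on <import> is only a hint to the processor.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
	targetNamespace="http://home.clara.net/drdsl/order"
	xmlns:target="http://home.clara.net/drdsl/order"
	xmlns:addr="http://example.com/address"
	elementFormDefault="qualified">
  <!-- same targetNamespace: merge declarations from another file -->
  <include schemaLocation="order-types.xsd" />
  <!-- different namespace: make its global declarations available -->
  <import namespace="http://example.com/address" schemaLocation="address.xsd" />
  <element name="Order">
    <complexType>
      <sequence>
        <element name="orderid" type="string" />
        <!-- assumes address.xsd declares a global "address" element -->
        <element ref="addr:address" />
      </sequence>
    </complexType>
  </element>
</schema>
```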