XML Schema Definition
Validating XML with XML Schemas
A schema is a document that provides a model of structure. XML Schema is a W3C standard for describing XML documents and differs from DTDs by using XML syntax. XML Schemas are always defined in separate files from XML data, and the data files are often referred to as XML Schema instances. An XML Schema validator should both confirm the validity of an XML Schema instance and provide a PSVI (Post-Schema-Validation Infoset) to the application. The PSVI provides all the information in the XML document and a basic summary of the schema. A list of XML Schema tools is available at W3C Schema Tools.
XML Declaration
XML Schema documents begin with an XML declaration:
<?xml version="1.0" ?>
Root Element
The root element is the <schema> element, which allows you to declare namespace information. The targetNamespace attribute specifies the namespace that identifies the vocabulary you are defining. It is optional, but if included it should be accompanied by a namespace declaration with a matching value, so that the schema can refer to its own declarations. In the example below, "http://www.w3.org/2001/XMLSchema" is the default namespace, and components in the targetNamespace can be referenced using the "target" prefix.
<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://home.clara.net/drdsl/order" xmlns:target="http://home.clara.net/drdsl/order">
Alternatively, the targetNamespace can be set as the default namespace, in which case a prefix is used for the "http://www.w3.org/2001/XMLSchema" namespace:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://home.clara.net/drdsl/order" xmlns="http://home.clara.net/drdsl/order">
Element Definitions
The element declarations are placed within the <schema> element. Each declaration sets either the name or the ref attribute, and either declares a content model or sets the type attribute. Elements that have element content have their content model defined within a <complexType> definition; <simpleType> definitions are used for text-only content. Attribute declarations are placed after the content model:
<element name="order">
<complexType>
<sequence>
<element name="orderid" type="string" />
<element name="orderdate" type="string" />
<element name="ordertotal" type="string" />
</sequence>
<attribute name="status" type="string" />
</complexType>
</element>
<element name="Phone">
<simpleType>
<restriction base="string">
<enumeration value="Home" />
<enumeration value="Business" />
<enumeration value="Mobile" />
<enumeration value="Alternate" />
</restriction>
</simpleType>
</element>
Elements can be declared as either global or local. Global declarations are made as direct children of the <schema> element and can therefore be referenced anywhere beneath the <schema> element, although you will need to include the namespace prefix if one is defined. In the example below, the Order element is assigned the global type 'OrderType', which belongs to the targetNamespace and is referenced using the 'target' prefix:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://home.clara.net/drdsl/order"
xmlns:target="http://home.clara.net/drdsl/order"
elementFormDefault="qualified">
<complexType name="OrderType">
<sequence>
<element name="orderid" type="string" />
<element name="orderdate" type="string" />
<element name="ordertotal" type="string" />
</sequence>
<attribute name="status" type="string" />
</complexType>
<element name="Order" type="target:OrderType" />
</schema>
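For illustration, an instance document that should validate against the schema above might look like the following sketch. The values are invented, and using a default namespace declaration is only one of several equivalent ways to place the elements in the target namespace. Because elementFormDefault is "qualified", the local child elements must be in the target namespace as well, while the 'status' attribute remains unqualified.

```xml
<?xml version="1.0" ?>
<!-- Hypothetical instance document; all values are made up for illustration -->
<Order xmlns="http://home.clara.net/drdsl/order" status="shipped">
  <orderid>A1001</orderid>
  <orderdate>2004-05-17</orderdate>
  <ordertotal>42.50</ordertotal>
</Order>
```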
To reuse a global element declaration, include a 'ref' attribute with the value set to the global element's name. The example below declares three global elements that are used to define the content model for the global type 'OrderType':
<schema xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://home.clara.net/drdsl/order"
xmlns:target="http://home.clara.net/drdsl/order"
elementFormDefault="qualified">
<element name="orderid" type="string" />
<element name="orderdate" type="string" />
<element name="ordertotal" type="string" />
<complexType name="OrderType">
<sequence>
<element ref="target:orderid" />
<element ref="target:orderdate" />
<element ref="target:ordertotal" />
</sequence>
<attribute name="status" type="string" />
</complexType>
<element name="Order" type="target:OrderType" />
</schema>
Any global element in an XSD can be used as a root element in an instance document. Local elements can have their cardinality defined by setting the 'minOccurs' and 'maxOccurs' attributes. 'minOccurs' can be any non-negative integer; 'maxOccurs' can be any non-negative integer or 'unbounded'. The default value for both is '1'. A default value for an element can be specified by setting the 'default' attribute, and an element can be given a constant value by setting the 'fixed' attribute.
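The occurrence and value attributes can be sketched as follows. The element names ('note', 'item', 'currency', 'version') and their values are hypothetical and chosen only to show each attribute in use; the schema has no targetNamespace to keep the example short.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="order">
    <complexType>
      <sequence>
        <!-- exactly once: minOccurs and maxOccurs both default to 1 -->
        <element name="orderid" type="string" />
        <!-- optional: may be omitted or appear once -->
        <element name="note" type="string" minOccurs="0" />
        <!-- one or more occurrences -->
        <element name="item" type="string" maxOccurs="unbounded" />
        <!-- an empty currency element is treated as containing "GBP" -->
        <element name="currency" type="string" default="GBP" />
        <!-- if the element has content, that content must be "1.0" -->
        <element name="version" type="string" fixed="1.0" />
      </sequence>
    </complexType>
  </element>
</schema>
```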
Content model definitions may include an <any> declaration to specify that any element can occur at that location. The <any> declaration can be further qualified by setting the namespace attribute to:
- ##any - allows elements from all namespaces
- ##other - allows elements from namespaces other than the targetNamespace
- ##targetNamespace - allows elements from the targetNamespace only
- ##local - allows any non-qualified elements
- Any whitespace-separated list of namespace URIs
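As a rough sketch, an element wildcard restricted to foreign namespaces could be declared as shown here. The element name 'extensibleOrder' is hypothetical, and the processContents attribute (which accepts 'strict', 'lax' or 'skip') is an addition not discussed above; it controls how strictly the matched elements are validated.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://home.clara.net/drdsl/order"
        elementFormDefault="qualified">
  <element name="extensibleOrder">
    <complexType>
      <sequence>
        <element name="orderid" type="string" />
        <!-- any number of elements from namespaces other than the targetNamespace -->
        <any namespace="##other" processContents="lax"
             minOccurs="0" maxOccurs="unbounded" />
      </sequence>
    </complexType>
  </element>
</schema>
```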
complexType definitions can be used to create mixed content models by setting the 'mixed' attribute to 'true'. This allows the element content to be a mixture of elements and text:
<element name="para">
<complexType mixed="true">
<choice minOccurs="0" maxOccurs="unbounded">
<element name="br" type="string" />
<element name="em" type="string" />
<element name="i" type="string" />
</choice>
</complexType>
</element>
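Assuming the declaration above sits in a schema with no targetNamespace, an instance of the 'para' element could mix text and child elements along these lines (the text itself is invented):

```xml
<para>Text can appear <em>between</em> and around<br />child <i>elements</i>.</para>
```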
Empty content models can also be specified using a complexType:
<element name="para"> <complexType /> </element>
Content Models
Content models can be declared as:
- A <sequence>
- A <choice>
- A reference to a global <group>
- An <all> declaration
Sequence definitions specify the elements that may occur and the order in which they occur. Sequences may contain elements, element wildcards, other sequences, choices or group references.
Choice definitions specify a choice of elements that may occur.
A <group> declaration allows you to define reusable groups of elements. Group references allow you to refer to global <group> declarations.
The <all> declaration lets you declare that the elements in the content model may appear in any order. An <all> declaration may contain only element declarations, cannot be combined with any other content model, and each element may occur at most once in instances of the defined element.
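The content model options described above can be sketched together in one schema. All names here ('AddressGroup', 'customer', 'dimensions' and their children) are hypothetical and exist only for illustration.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://home.clara.net/drdsl/order"
        xmlns:target="http://home.clara.net/drdsl/order"
        elementFormDefault="qualified">
  <!-- a reusable global group -->
  <group name="AddressGroup">
    <sequence>
      <element name="street" type="string" />
      <element name="city" type="string" />
    </sequence>
  </group>
  <element name="customer">
    <complexType>
      <!-- name, then the address group, then either phone or email -->
      <sequence>
        <element name="name" type="string" />
        <group ref="target:AddressGroup" />
        <choice>
          <element name="phone" type="string" />
          <element name="email" type="string" />
        </choice>
      </sequence>
    </complexType>
  </element>
  <element name="dimensions">
    <complexType>
      <!-- width and height must both appear once, in either order -->
      <all>
        <element name="width" type="string" />
        <element name="height" type="string" />
      </all>
    </complexType>
  </element>
</schema>
```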
Attribute Definitions
Attributes can be declared either globally or locally, and are restricted to simple types (text-only content). The 'use' attribute can have a value of required, optional or prohibited; the default is optional. Default and fixed values can also be set, as for element declarations. Attribute wildcards are declared with the <anyAttribute> declaration, which must appear in either a <complexType> or an <attributeGroup> declaration. To declare an element with text-only content and attributes, use a <simpleContent> declaration within a <complexType> declaration. An <extension> declaration is needed to indicate that you are extending the simple type by adding attribute declarations:
<element name="bookid">
<complexType>
<simpleContent>
<extension base="string">
<attribute name="edition" type="string" default="first" />
</extension>
</simpleContent>
</complexType>
</element>
Data Types
For text-only elements a number of built-in datatypes can be specified, or you can define custom datatypes. The built-in datatypes are:
- string
- normalizedString - a string in which tabs, line feeds and carriage returns are replaced by spaces
- token - a normalizedString with no leading or trailing spaces and no runs of two or more consecutive spaces
- byte: -128 to 127
- unsignedByte: 0 to 255
- base64Binary: Base64 encoded binary
- hexBinary: hexadecimal encoded binary
- integer
- positiveInteger: integer greater than 0
- negativeInteger: integer less than 0
- nonNegativeInteger
- nonPositiveInteger
- int: 32-bit signed integer
- unsignedInt
- long: 64-bit signed integer
- unsignedLong
- short: 16-bit signed integer
- unsignedShort
- decimal: may or may not include fractional part
- float
- double
- boolean
- time: see ISO 8601
- dateTime
- duration
- date
- gMonth
- gYear
- gYearMonth
- gDay
- gMonthDay
- Name
- QName
- NCName
- anyURI: a valid URI
- language: as defined in RFC1766
- ID
- IDREF(S)
- ENTITY or ENTITIES
- NOTATION
- NMTOKEN(S)
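A few of the built-in datatypes above can be seen in use in the following sketch. The 'invoice' element, its children and the sample values in the comments are hypothetical.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="invoice">
    <complexType>
      <sequence>
        <!-- e.g. 2004-05-17 -->
        <element name="issued" type="date" />
        <!-- e.g. 3; zero and negative values are rejected -->
        <element name="quantity" type="positiveInteger" />
        <!-- e.g. 42.50 -->
        <element name="total" type="decimal" />
        <!-- true, false, 1 or 0 -->
        <element name="paid" type="boolean" />
        <element name="homepage" type="anyURI" />
      </sequence>
      <!-- ID values must be unique within the instance document -->
      <attribute name="id" type="ID" use="required" />
    </complexType>
  </element>
</schema>
```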
Custom types can be created by placing a restriction in a <simpleType> declaration. One way to do this is with a list of enumeration declarations:
<attribute name="language">
<simpleType>
<restriction base="string">
<enumeration value="Perl" />
<enumeration value="Python" />
<enumeration value="PHP" />
<enumeration value="Ruby" />
<enumeration value="C" />
<enumeration value="Java" />
<enumeration value="C++" />
<enumeration value="C#" />
<enumeration value="Basic" />
<enumeration value="Fortran" />
<enumeration value="Pascal" />
</restriction>
</simpleType>
</attribute>
The 'language' attribute above can hold only one of the enumerated values. To allow several values in a single attribute, use a <list> declaration with the itemType attribute set to the name of the enumerated type; the attribute value is then a whitespace-separated list of items:
<simpleType name="language">
<restriction base="string">
<enumeration value="Perl" />
<enumeration value="Python" />
<enumeration value="PHP" />
<enumeration value="Ruby" />
<enumeration value="C" />
<enumeration value="Java" />
<enumeration value="C++" />
<enumeration value="C#" />
<enumeration value="Basic" />
<enumeration value="Fortran" />
<enumeration value="Pascal" />
</restriction>
</simpleType>
<simpleType name="languagesUsed">
<list itemType="language" />
</simpleType>
You can combine multiple simple types using a <union> declaration, setting the 'memberTypes' attribute to a whitespace-separated list of names that refer to global <simpleType> definitions or built-in datatypes.
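A minimal sketch of a union follows, assuming hypothetical type names 'sizeName' and 'sizeType'. A value is accepted if it is valid against any one of the member types.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://home.clara.net/drdsl/order"
        xmlns:target="http://home.clara.net/drdsl/order">
  <simpleType name="sizeName">
    <restriction base="string">
      <enumeration value="small" />
      <enumeration value="medium" />
      <enumeration value="large" />
    </restriction>
  </simpleType>
  <!-- a size is either one of the names above or a positive integer -->
  <simpleType name="sizeType">
    <union memberTypes="target:sizeName positiveInteger" />
  </simpleType>
  <element name="size" type="target:sizeType" />
</schema>
```

With this declaration, both <size>medium</size> and <size>12</size> would be acceptable content for the element.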
Modular Schemas
XML Schema definitions can be stored in multiple documents and combined using either an <import> or an <include> declaration. The <import> declaration allows you to import global declarations from another XSD in a different targetNamespace. <include> declarations are normally used to combine XSDs with the same targetNamespace (or no targetNamespace).
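The two declarations might be used together as in this sketch. The file names 'order-types.xsd' and 'customer.xsd', the "http://example.com/customer" namespace and the 'CustomerType' type are all assumptions made for illustration; 'OrderType' is assumed to be defined in the included file. Both declarations must appear before any other declarations in the schema.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://home.clara.net/drdsl/order"
        xmlns:target="http://home.clara.net/drdsl/order"
        xmlns:cust="http://example.com/customer"
        elementFormDefault="qualified">
  <!-- same targetNamespace: hypothetical file that defines OrderType -->
  <include schemaLocation="order-types.xsd" />
  <!-- different targetNamespace: hypothetical file that defines CustomerType -->
  <import namespace="http://example.com/customer" schemaLocation="customer.xsd" />
  <element name="Order" type="target:OrderType" />
  <element name="Customer" type="cust:CustomerType" />
</schema>
```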
