The Sysadmin Notebook  

Document Type Definition

Validating XML with DTDs

A DTD (Document Type Definition) lets you define a valid vocabulary for an XML document, and the document can then be validated against it. The DOCTYPE declaration that attaches a DTD to a document should be placed immediately after the XML declaration, before the root element, and takes the form:

<!DOCTYPE root-element-name [ ]>

Between the square brackets you can place declarations for the elements, attributes and entities defined by the DTD: this is called the internal subset. Alternatively, the declarations can be placed in an external file, which is identified using either a system identifier or a public identifier. A system identifier uses the keyword 'SYSTEM' followed by a URI reference to the document's location. A public identifier uses the keyword 'PUBLIC' followed by an identifier for an entry in a catalogue. Typically Formal Public Identifiers (FPIs) are used. FPIs take the form: -//Owner//Class Description//Language//Version. A system identifier is added after the public identifier to give the parser a location for the file if it cannot resolve the public identifier; in this case the 'SYSTEM' keyword is omitted. SGML allows the system identifier to be left off, but XML requires it. Example DOCTYPE declarations:

<!DOCTYPE order SYSTEM "order.dtd" [ ]>
<!DOCTYPE order SYSTEM "file:///home/dr00/myweb/dtds/order.dtd" [ ]>
<!DOCTYPE order SYSTEM "http://home.clara.net/drdsl/dtds/order.dtd" [ ]>
<!-- PUBLIC identifier only: accepted by SGML, not well-formed in XML -->
<!DOCTYPE order PUBLIC "-//DRDSL//orders//EN//1">
<!-- PUBLIC identifier followed by a system identifier -->
<!DOCTYPE order PUBLIC "-//DRDSL//orders//EN//1" "order.dtd">
<!DOCTYPE html
	PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
	"http://www.w3.org/TR/html4/strict.dtd">

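To make the difference between internal and external declarations concrete, here is a minimal sketch. The 'order' and 'item' element names and the 'order.dtd' file name are illustrative assumptions, not part of any real vocabulary. First, a document that carries its declarations between the square brackets:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE order [
  <!-- internal subset: the declarations live inside the document itself -->
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>
]>
<order>
  <item>Widget</item>
</order>
```

The same two declarations could instead be saved in an external file (assumed here to be 'order.dtd' in the same directory), leaving only a reference in the document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE order SYSTEM "order.dtd">
<order>
  <item>Widget</item>
</order>
```

Note that the name after <!DOCTYPE must match the root element. If you have libxml2's xmllint installed, something like 'xmllint --valid --noout order.xml' should validate either form.
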
DTD declarations

DTD declarations consist of three parts: element declarations, attribute declarations and entity declarations.

Element Declarations

An element declaration begins with <!ELEMENT, followed by the element name and a definition of its content (the content model). The content model may contain child elements, text, a combination of both, or it may be empty. Element content can be specified as a sequence (children separated by commas, appearing in that order), a choice (alternatives separated by |), or a nesting of both:

<!ELEMENT order (elem1, elem2, elem3)>
<!ELEMENT order (elem1 | elem2 | elem3)>
<!ELEMENT order (elem1 | (elem2, elem3))>
<!ELEMENT order (elem1, (elem2 | elem3))>
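
As a sketch of what the last declaration accepts, the document below is valid because elem1 comes first and is followed by exactly one of elem2 or elem3. The #PCDATA content given to the three child elements is an assumption made only so that the example is complete:

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- elem1 must come first, followed by exactly one of elem2 or elem3 -->
  <!ELEMENT order (elem1, (elem2 | elem3))>
  <!ELEMENT elem1 (#PCDATA)>
  <!ELEMENT elem2 (#PCDATA)>
  <!ELEMENT elem3 (#PCDATA)>
]>
<order>
  <elem1>first</elem1>
  <elem3>second</elem3>
</order>
```

Swapping the two children, or supplying both elem2 and elem3, would make the document invalid against this content model.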

Text content is specified as #PCDATA, and mixed content is specified by combining #PCDATA and element content as choices in the content model. The #PCDATA content must be listed first:

<!ELEMENT order (#PCDATA)>
<!ELEMENT order (#PCDATA | elem1 | elem2 | elem3)*>

With mixed content you must also include the 'zero or more' cardinality indicator (*) after the closing bracket of the content model. The other cardinality indicators are 'zero or one' (?) and 'one or more' (+). If no cardinality indicator is given, the item must appear exactly once.
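
The indicators can be combined in a single content model. The sketch below uses made-up element names (customer, note, em, item) purely for illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- exactly one customer, an optional note, then one or more items -->
  <!ELEMENT order (customer, note?, item+)>
  <!ELEMENT customer (#PCDATA)>
  <!-- mixed content: text interleaved with zero or more em elements -->
  <!ELEMENT note (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
]>
<order>
  <customer>A. Customer</customer>
  <note>Deliver <em>before</em> noon.</note>
  <item>Widget</item>
  <item>Gadget</item>
</order>
```

Removing the note element keeps the document valid, whereas removing both item elements does not.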

The content model can also be specified with the keyword EMPTY (no content) or ANY (any mix of text and elements declared in the DTD, in any order):

<!ELEMENT br EMPTY>
<!ELEMENT comment ANY>

Attribute Declarations

Attribute declarations let you declare the list of allowed attributes for an element. They begin with <!ATTLIST, followed by the element name and one or more attribute definitions. Each attribute definition consists of the name of the attribute, the type of the attribute and an attribute value declaration. The attribute types are listed below:

Type             Description
CDATA            Character data (any text string).
ID               Unique identifier for an element. Must obey the rules for XML names.
IDREF            Reference to the ID value of another element in the document.
IDREFS           Whitespace-separated list of IDREF values.
ENTITY           Reference to an external unparsed entity (file) which has been declared in the DTD.
ENTITIES         Whitespace-separated list of ENTITY values.
NMTOKEN          Name token, made up of name characters such as letters, digits, '.', '-', '_' and ':'.
NMTOKENS         Whitespace-separated list of NMTOKEN values.
Enumerated list  List of allowed values separated by |, similar to an ENUM data field.

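Attribute value declarations appear at the end of the attribute definition, and can be declared as a quoted default value, #REQUIRED (the attribute must be supplied), #IMPLIED (the attribute is optional and has no default) or #FIXED followed by a quoted value (the attribute always has that value).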
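
A short sketch may help tie types and value declarations together. The element and attribute names (order, item, id, status, currency, sku, note) are invented for this example:

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>
  <!-- id must be supplied; status defaults to "open"; currency is always "GBP" -->
  <!ATTLIST order
    id       ID               #REQUIRED
    status   (open | shipped) "open"
    currency CDATA            #FIXED "GBP">
  <!-- sku must be supplied; note is optional with no default -->
  <!ATTLIST item
    sku      NMTOKEN          #REQUIRED
    note     CDATA            #IMPLIED>
]>
<order id="o1001" status="shipped">
  <item sku="W-42">Widget</item>
</order>
```

Here the currency attribute is left out of the document and is supplied by the parser from the #FIXED declaration, while leaving out id or sku would make the document invalid.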

Entity Declarations

Entities are mechanisms to refer to replacement text, XML markup or external files. There are four primary types:

Entity Type    Description

Built-in
    There are five built-in entities in XML, for the characters &, <, >, ' and ". To use one, begin with an & character, followed by the entity name (amp, lt, gt, apos or quot respectively), and end with a semicolon, for example &amp;

Character
    To use a Unicode character, begin with an ampersand, followed by a hash, the decimal Unicode value for the character, and finish with a semicolon, for example &#169; Alternatively, use the hexadecimal Unicode value: an ampersand, a hash, an 'x', the hexadecimal value and a semicolon, for example &#xA9; The values are listed in the Unicode code charts.

General
    Must be declared in the DTD before use. Generally used to specify reusable strings, and can be declared using internal declarations, external system declarations or external public declarations:
      • <!ENTITY my-string "Built with Template Toolkit">
      • <!ENTITY my-object SYSTEM "filename.txt">
    To refer to a general entity, precede the entity name with an ampersand and follow it with a semicolon: &my-string;

Parameter
    Unlike general entities, parameter entities are used only inside the DTD itself. Their replacement text can consist of DTD declarations, so they can be used to pull in other DTDs, allowing modular development of DTDs:
    <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">
    To refer to a parameter entity, precede the name with a percent sign and follow it with a semicolon: %HTMLlat1;
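
The first three entity types can be seen together in one small document. This is only a sketch: the page, title and footer elements are assumptions, and the my-string entity is the one declared above:

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!ELEMENT page (title, footer)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT footer (#PCDATA)>
  <!-- internal general entity: reusable replacement text -->
  <!ENTITY my-string "Built with Template Toolkit">
]>
<page>
  <!-- amp is built-in; 169 (decimal) and A9 (hex) name the same character -->
  <title>Tools &amp; Parts &#169; &#xA9;</title>
  <footer>&my-string;</footer>
</page>
```

Parameter entities belong in the DTD rather than in the document. A hypothetical order.dtd that pulls shared declarations from an assumed common.dtd (which would need to declare item) might look like this:

```xml
<!-- order.dtd: declare the parameter entity, then reference it -->
<!ENTITY % common SYSTEM "common.dtd">
%common;
<!ELEMENT order (item+)>
```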

Example DTDs
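
The following is an illustrative sketch rather than a DTD from a real project: the invoice vocabulary, the file name invoice.dtd and every element, attribute and entity name in it are assumptions. It combines element, attribute and entity declarations in one external DTD:

```xml
<!-- invoice.dtd (hypothetical file name) -->

<!-- parameter entity: a reusable fragment of an attribute list -->
<!ENTITY % money-attrs 'currency (GBP | EUR | USD) "GBP"'>

<!-- general entity: replacement text for use in documents -->
<!ENTITY terms "Payment due within 30 days">

<!ELEMENT invoice (customer, line+, notes?)>
<!ATTLIST invoice
  number ID #REQUIRED>

<!ELEMENT customer (#PCDATA)>

<!ELEMENT line (#PCDATA)>
<!ATTLIST line
  qty   NMTOKEN #REQUIRED
  price CDATA   #REQUIRED
  %money-attrs;>

<!ELEMENT notes (#PCDATA | em)*>
<!ELEMENT em (#PCDATA)>
```

A document written against it could then look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE invoice SYSTEM "invoice.dtd">
<invoice number="INV-001">
  <customer>A. Customer</customer>
  <line qty="2" price="9.99">Widget</line>
  <line qty="1" price="24.50" currency="EUR">Gadget</line>
  <notes>&terms;. <em>Thank you.</em></notes>
</invoice>
```

Because the terms entity is declared in the external file, a parser that does not read external DTDs will not be able to expand it. Referencing a parameter entity inside another declaration, as %money-attrs; is here, is only allowed in an external DTD, not in the internal subset.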
