Document Type Definition
Validating XML with DTDs
A DTD allows you to define a valid vocabulary for an XML document. The XML document can then be validated against the DTD. The DTD is introduced by a DOCTYPE declaration, which must come after the XML declaration and before the root element, and takes the form:
<!DOCTYPE root-element-name [ ]>
Between the square brackets you can place declarations for the elements, attributes and entities defined by the DTD; this is called the internal subset. Alternatively, the declarations can be placed in an external file, which is identified using either a system identifier or a public identifier. A system identifier uses the keyword SYSTEM followed by a URI reference to the document's location. A public identifier uses the keyword PUBLIC followed by an identifier that refers to an entry in a catalogue. Typically Formal Public Identifiers (FPIs) are used, which take the form: -//Owner//Class Description//Language//Version. In XML, a public identifier must be followed by a system identifier that gives a fallback location for the file, and the SYSTEM keyword is omitted in that case; SGML, and therefore HTML 4, allows the system identifier to be left out. Example DOCTYPE declarations:
<!DOCTYPE order SYSTEM "order.dtd" [ ]>
<!DOCTYPE order SYSTEM "file:///home/dr00/myweb/dtds/order.dtd" [ ]>
<!DOCTYPE order SYSTEM "http://home.clara.net/drdsl/dtds/order.dtd" [ ]>
<!-- PUBLIC identifier on its own (SGML only; XML also requires a system identifier) -->
<!DOCTYPE order PUBLIC "-//DRDSL//orders//EN//1">
<!-- PUBLIC identifier followed by a system identifier -->
<!DOCTYPE order PUBLIC "-//DRDSL//orders//EN//1" "order.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
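
To show where the DOCTYPE declaration sits in a whole document, here is a minimal sketch that carries its DTD in the internal subset. The child elements `customer` and `item` and their content are made up for illustration; only the `order` root name comes from the examples above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The DOCTYPE comes after the XML declaration and before the root element -->
<!DOCTYPE order [
  <!-- Internal subset: the declarations live between the square brackets -->
  <!ELEMENT order (customer, item)>
  <!ELEMENT customer (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
]>
<order>
  <customer>A. Customer</customer>
  <item>Widget</item>
</order>
```

If libxml2's xmllint is available, saving this under a name of your choosing (say `order.xml`) and running `xmllint --valid --noout order.xml` is one way to check the document against its DTD.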
DTD declarations
DTD declarations consist of three parts:
- Element declarations
- Attribute declarations
- Entity declarations
Element Declarations
An element declaration begins with <!ELEMENT, followed by the element name and a definition of its content (the content model). The content model may contain child elements, text, a combination of both, or it may be empty. Element content can be specified as a sequence (children separated by commas, appearing in that order), a choice group (alternatives separated by |), or a nesting of the two:
<!ELEMENT order (elem1, elem2, elem3)>
<!ELEMENT order (elem1 | elem2 | elem3)>
<!ELEMENT order (elem1 | (elem2, elem3))>
<!ELEMENT order (elem1, (elem2 | elem3))>
Text content is specified as #PCDATA, and mixed content is specified by combining #PCDATA and element content as choices in the content model. The #PCDATA content must be listed first:
<!ELEMENT order (#PCDATA)>
<!ELEMENT order (#PCDATA | elem1 | elem2 | elem3)*>
With mixed content you must also include a 'cardinality indicator' of 'zero or more' (*) after the closing bracket of the content model. The other cardinality indicators are 'zero or once' (?) and 'once or more' (+). If no cardinality indicator is given, the item must appear once and only once.
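
The cardinality indicators can be seen working together in the sketch below. The vocabulary (`customer`, `item`, `note`, `em`) is hypothetical and chosen only to exercise each indicator once.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- exactly one customer, then one or more items, then an optional note -->
  <!ELEMENT order (customer, item+, note?)>
  <!ELEMENT customer (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
  <!-- mixed content: #PCDATA listed first, '*' after the closing bracket -->
  <!ELEMENT note (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
]>
<order>
  <customer>A. Customer</customer>
  <item>Widget</item>
  <item>Gadget</item>
  <note>Deliver <em>before</em> noon.</note>
</order>
```

Removing every `item` element, or adding a second `note`, would make the document invalid against these declarations, while dropping the `note` altogether would not.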
The content model can also be specified as EMPTY (no content) or ANY (any mix of text and elements declared in the DTD, in any order):
<!ELEMENT br EMPTY>
<!ELEMENT comment ANY>
Attribute Declarations
Attribute declarations let you declare the list of allowed attributes for an element. They begin with <!ATTLIST, followed by the element name and one or more attribute definitions. Each attribute definition begins with the name of the attribute, followed by its type and an attribute value declaration. Attribute types are listed below:
| Type | Description |
|---|---|
| CDATA | character data (an arbitrary text string) |
| ID | unique identifier for an element. Must obey the rules for XML names. |
| IDREF | reference to the ID value of another element in the document. |
| IDREFS | list of IDREF values |
| ENTITY | reference to an external unparsed entity (file) which has been previously declared in the DTD. |
| ENTITIES | list of ENTITY values |
| NMTOKEN | name token: a string made up only of XML name characters (unlike a name, it may begin with a digit) |
| NMTOKENS | list of NMTOKEN values |
| Enumerated List | array of possible values, similar to an ENUM data field. |
Attribute value declarations appear at the end of the attribute declaration, and can be declared as:
- a default value: simply quote the value
- a fixed value: #FIXED "value"
- a required value: #REQUIRED
- an optional value: #IMPLIED
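
As an illustration of the attribute types and value declarations described above, the following sketch declares attributes on two elements. All attribute names and values (`id`, `currency`, `status`, `sku`, `replaces`) are invented for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (item+)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST order
    id       ID               #REQUIRED
    currency CDATA            #FIXED "GBP"
    status   (open | shipped) "open">
  <!ATTLIST item
    sku      NMTOKEN          #REQUIRED
    replaces IDREF            #IMPLIED>
]>
<!-- currency and status are not written out: the fixed and default values apply -->
<order id="o1001">
  <item sku="4711">Widget</item>
  <!-- replaces must match an ID value that exists somewhere in the document -->
  <item sku="4712" replaces="o1001">Gadget</item>
</order>
```

Here `id` must be supplied, `currency` can only ever be "GBP", `status` falls back to "open" when omitted, and `replaces` may be left out entirely.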
Entity Declarations
Entities are mechanisms to refer to replacement text, XML markup or external files. There are four primary types:
| Entity Type | Description |
|---|---|
| Built-in | There are five built-in entities in XML, standing for the characters &, <, >, ' and ". To use one, begin with an & character, followed by the entity name (amp, lt, gt, apos or quot), and end with a semicolon. |
| Character | To use a Unicode character, begin with an ampersand, followed by a hash, the decimal Unicode value for the character, and finish with a semicolon. Alternatively, use the hexadecimal Unicode value by beginning with an ampersand, followed by a hash, an 'x', the hexadecimal value, and a semicolon. The values are listed in the Unicode charts. |
| General | Must be declared in the DTD before use. Generally used to specify reusable strings, and can be declared using internal declarations, external system declarations or external public declarations. |
| Parameter | Unlike general entities, parameter entities can be made from DTD declarations, and thus can be used to refer to other DTDs, allowing modular development of DTDs: <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"> To refer to a parameter entity, precede the name with a percent sign and follow it with a semicolon (%HTMLlat1;). Parameter entities can only be referenced inside the DTD. |
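
The entity types in the table can be sketched in one small document. Everything named here is hypothetical: the entities `supplier`, `terms`, `legal` and `noteDecl`, the files `terms.xml` and `legal.xml`, and the public identifier. The two external entities are declared but never referenced, so a parser has no need to fetch those files.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!-- General entity, internal declaration: replacement text given inline -->
  <!ENTITY supplier "Example Widgets Ltd">
  <!-- General entity, external system declaration (hypothetical file) -->
  <!ENTITY terms SYSTEM "terms.xml">
  <!-- General entity, external public declaration (hypothetical identifier and file) -->
  <!ENTITY legal PUBLIC "-//Example//TEXT Legal Notice//EN" "legal.xml">
  <!-- Parameter entity holding a complete declaration; usable only inside the DTD -->
  <!ENTITY % noteDecl "<!ELEMENT note (#PCDATA)>">
  %noteDecl;
  <!ELEMENT order (note)>
]>
<order>
  <!-- character reference, general entity reference, then a built-in entity -->
  <note>&#169; &supplier; &amp; partners</note>
</order>
```

The parameter entity is referenced between declarations rather than inside one, because within the internal subset XML does not allow a parameter-entity reference to appear in the middle of a markup declaration. That restriction does not apply in an external DTD file.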
