| Introduction to DTD |
|
| Introduction to XML DTD |
|
| A Document Type Definition (DTD) is a language that describes the contents of an SGML document. DTDs are also used with XML, and the DTD declarations may be embedded within an XML document or kept in a separate file. |
|
| DTDs cannot be used to define XML namespaces, and they are written in their own syntax rather than in XML syntax. As a result, schema languages such as W3C XML Schema and RELAX NG are often used instead to define the content of an XML document. |
|
| Document Type Definition (DTD), defined slightly differently within the XML and SGML (the language XML was derived from) specifications, is one of several SGML and XML schema languages, and is also the term used to describe a document or portion thereof that is authored in the DTD language. |
|
| A DTD is primarily used for the expression of a schema via a set of declarations that conform to a particular markup syntax and that describe a class, or type, of SGML or XML documents, in terms of constraints on the structure of those documents. A DTD may also declare constructs that are not always required to establish document structure, but that may affect the interpretation of some documents. |
|
| DTD is native to the SGML and XML specifications, and since its introduction other specification languages such as XML Schema and RELAX NG have been released with additional functionality. |
|
| As an expression of a schema, a DTD specifies, in effect, the syntax of an "application" of SGML or XML, such as the derivative language HTML or XHTML. This syntax is usually a less general form of the syntax of SGML or XML. |
|
| In a DTD, the structure of a class of documents is described via element and attribute-list declarations. Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element. |
|
| Attribute-list declarations name the allowable set of attributes for each declared element, including the type of each attribute value or, for an enumerated attribute, the explicit set of allowed values. |
|
| DTD Building Blocks |
|
| Elements, Attributes & Entities |
|
| Both XML and HTML documents are made up of the following building blocks: |
|
• Elements
• Attributes
• Entities
• PCDATA
• CDATA |
|
| Elements |
|
| Elements are the main building blocks of both XML and HTML documents. |
|
Examples of HTML elements are "body" and "table".
Examples of XML elements could be "note" and "message".
Elements can contain other elements (child elements), text, or be empty.
Examples of empty HTML elements are "img", "br" and "hr". |
|
| An element is made up of a start and end tag with data in between. The tags describe the data. The data is called the value of the element. |
|
| Attributes |
|
Attributes provide extra information about elements.
Attributes are always placed inside the start tag of an element, and they always come in name/value pairs. The following "img" element has additional information about a source file: |
|
| <img src="flower.jpg" /> |
|
| The name of the element is "img". The name of the attribute is "src". The value of the attribute is "flower.jpg". Because the element is empty, it has no separate end tag; instead its start tag is closed with " />". |
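| A parser reports exactly these three parts. As a minimal sketch (Python, using only the standard-library xml.etree.ElementTree module; any XML parser gives the same information), parsing the element returns its name, its attributes as name/value pairs, and no children: |

```python
import xml.etree.ElementTree as ET

# The empty "img" element from the text above.
img = ET.fromstring('<img src="flower.jpg" />')

print(img.tag)     # the element name: img
print(img.attrib)  # attribute name/value pairs: {'src': 'flower.jpg'}
print(len(img))    # number of child elements: 0, because the element is empty
```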
|
| Entities |
|
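| Some characters have a special meaning in XML; for example, < starts a tag and & starts an entity reference. To use such a character as plain text, write it as an entity reference, which starts with & and ends with ;. |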
|
| The following entities are predefined in XML: |
|
| Entity Reference | Character |
| &lt; | < |
| &gt; | > |
| &amp; | & |
| &quot; | " |
| &apos; | ' |
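| The replacement happens while the document is parsed. A minimal sketch in Python (standard-library xml.etree.ElementTree; the element name expr is only an example) writes each special character as its entity reference and reads the element's text back: |

```python
import xml.etree.ElementTree as ET

# Every special character is written as a predefined entity reference.
doc = "<expr>a &lt; b &amp;&amp; c &gt; d, &quot;yes&quot; or &apos;no&apos;</expr>"

# The parser replaces each reference with the character it stands for.
text = ET.fromstring(doc).text
print(text)  # a < b && c > d, "yes" or 'no'
```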
|
|
| PCDATA |
|
PCDATA means parsed character data.
This is text that a parser will examine: tags inside it are treated as markup (child elements that are handled separately), and entity references inside it are replaced by their values. |
|
| CDATA |
|
CDATA means character data.
In element content, text placed in a CDATA section (<![CDATA[ ... ]]>) is treated as a single item and is not parsed by whatever application is processing your marked-up document: entity references in it are not replaced with their values, and markup in it is not treated as markup. In a DTD, CDATA is also the attribute type for plain string values. |
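| A CDATA section can be seen in action with a short Python sketch (standard-library xml.etree.ElementTree; the element name code is only an example). The text inside the section comes back unchanged, including a &lt; that would be replaced anywhere else: |

```python
import xml.etree.ElementTree as ET

# Inside <![CDATA[ ... ]]> the characters < and & are plain text, not markup.
doc = "<code><![CDATA[if (a < b && c) { show('&lt;') }]]></code>"

text = ET.fromstring(doc).text
print(text)  # if (a < b && c) { show('&lt;') }   (the &lt; is NOT replaced)
```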
|
| DTD Elements |
|
| Declaring an Element |
|
| In the DTD, XML elements are declared with an element declaration. An element declaration has the following syntax: |
|
| <!ELEMENT element-name (element-content)> |
|
| Empty elements |
|
| Empty elements are declared with the keyword EMPTY, written without parentheses: |
|
<!ELEMENT element-name EMPTY>
example:
<!ELEMENT img EMPTY> |
|
| Elements with data |
|
| Elements that contain only text are declared with #PCDATA inside parentheses; elements that may contain anything are declared with the keyword ANY, without parentheses: |
|
<!ELEMENT element-name (#PCDATA)>
or
<!ELEMENT element-name ANY>
example:
<!ELEMENT note (#PCDATA)> |
|
#PCDATA means that the element contains parsed character data: a parser reads the text and replaces any entity references in it.
XML DTDs have no #CDATA keyword for element content, so a declaration such as <!ELEMENT element-name (#CDATA)> is a syntax error; text that must not be parsed goes in a CDATA section instead.
The keyword ANY declares an element that may contain any combination of text and declared elements. Elements that appear inside an ANY element must still be declared. |
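| One way to see how a parser reads these declarations is Python's standard-library xml.parsers.expat module, which passes each ELEMENT declaration to an ElementDeclHandler as a (type, quantifier, name, children) tuple. The sketch below is non-validating: it only shows which kind of content each declaration describes (the names note, img and body are examples), and that a (#CDATA) declaration is refused as a syntax error. |

```python
from xml.parsers import expat
from xml.parsers.expat import model

DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note ANY>
<!ELEMENT img EMPTY>
<!ELEMENT body (#PCDATA)>
]>
<note><body>Hello</body><img/></note>
"""

declared = {}  # element name -> content-model tuple (type, quantifier, name, children)

def on_element_decl(name, content_model):
    declared[name] = content_model

parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)

# Map expat's content-type constants back to the DTD keywords.
KIND = {model.XML_CTYPE_EMPTY: "EMPTY", model.XML_CTYPE_ANY: "ANY",
        model.XML_CTYPE_MIXED: "(#PCDATA)"}
for name, content_model in declared.items():
    print(name, KIND[content_model[0]])

# There is no #CDATA keyword, so expat rejects this declaration.
try:
    expat.ParserCreate().Parse("<!DOCTYPE x [<!ELEMENT x (#CDATA)>]><x/>", True)
    cdata_rejected = False
except expat.ExpatError:
    cdata_rejected = True
print("(#CDATA) rejected:", cdata_rejected)
```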
|
| Elements with children (sequences) |
|
| Elements with one or more children are declared with the names of the child elements inside the parentheses: |
|
<!ELEMENT element-name (child-element-name)>
or
<!ELEMENT element-name (child-element-name,child-element-name,.....)>
example:
<!ELEMENT note (to,from,heading,body)> |
|
| When children are declared in a sequence separated by commas, the children must appear in the same order in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the note element is: |
|
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)> |
|
| Wrapping |
|
| If the DTD is to be included in your XML source file, it should be wrapped in a DOCTYPE declaration with the following syntax: |
|
<!DOCTYPE root-element [element-declarations]>
example:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
|
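| Expat, the non-validating parser in Python's standard library, does not check a document against its DTD, but it does report the DOCTYPE name, each ELEMENT declaration and each element it meets. A minimal sketch built on that (the function check_declarations is hypothetical, not part of any library) confirms that the root element matches the DOCTYPE name and that every element used in the note document has been declared. It does not check the order or number of children. |

```python
from xml.parsers import expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
"""

def check_declarations(xml_text):
    """Return (DOCTYPE name, root element name, elements used but never declared)."""
    doctype, declared, used = [], set(), []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        doctype.append(name)

    def on_element_decl(name, content_model):
        declared.add(name)

    def on_start(name, attrs):
        used.append(name)

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return doctype[0], used[0], sorted(set(used) - declared)

print(check_declarations(DOC))  # ('note', 'note', [])
```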
|
| Declaring only one occurrence of the same element |
|
<!ELEMENT element-name (child-name)>
example
<!ELEMENT note (message)> |
|
| The example declaration above declares that the child element message must occur exactly once inside the note element. |
|
| Declaring one or more occurrences of the same element |
|
<!ELEMENT element-name (child-name+)>
example
<!ELEMENT note (message+)> |
|
| The + sign in the example above declares that the child element message must occur one or more times inside the note element. |
|
| Declaring zero or more occurrences of the same element |
|
<!ELEMENT element-name (child-name*)>
example
<!ELEMENT note (message*)> |
|
| The * sign in the example above declares that the child element message can occur zero or more times inside the note element. |
|
| Declaring zero or one occurrences of the same element |
|
<!ELEMENT element-name (child-name?)>
example
<!ELEMENT note (message?)> |
|
| The ? sign in the example above declares that the child element message can occur zero or one times inside the note element. |
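| The occurrence operators behave like the quantifiers of a regular expression, and the comma means "followed by". A minimal Python sketch under that view (model_to_regex and children_ok are hypothetical helper names) translates an element-only content model into a regex and tests a list of child element names against it. It does not handle #PCDATA, and it does not check that the content model itself is well formed. |

```python
import re

def model_to_regex(content_model):
    """Translate an element-only DTD content model, e.g. "(to+,from,message*)", into a regex.

    Each child name becomes "(?:name )", so a run of children can be matched against the
    string "name name ... ". Parentheses and the operators ? * + | already mean the same
    thing in regex syntax; the comma (sequence) becomes plain concatenation.
    """
    compact = re.sub(r"\s+", "", content_model)
    with_names = re.sub(r"[A-Za-z_][\w.-]*",
                        lambda m: "(?:" + re.escape(m.group(0)) + " )",
                        compact)
    return with_names.replace(",", "")


def children_ok(content_model, children):
    """True if the ordered list of child element names satisfies the content model."""
    text = "".join(name + " " for name in children)
    return re.fullmatch(model_to_regex(content_model), text) is not None


print(children_ok("(message)", ["message"]))              # True
print(children_ok("(message+)", []))                      # False: at least one needed
print(children_ok("(message*)", []))                      # True
print(children_ok("(message?)", ["message", "message"]))  # False: at most one allowed
print(children_ok("(to,from,heading,body)",
                  ["from", "to", "heading", "body"]))     # False: wrong order
```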
|
| Declaring mixed content |
|
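example:
<!ELEMENT note (#PCDATA|to|from|header|message)*> |
|
| In mixed content, #PCDATA must come first, the element names are separated by the | character rather than by commas, and the group must end with )*. The child elements may then appear in any order and any number of times, mixed with text; a DTD cannot require a fixed order of child elements when text is also allowed. |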
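| Python's standard-library xml.parsers.expat module accepts a mixed declaration of the form (#PCDATA|to|from|header|message)* and reports it as a repeated MIXED group listing the allowed child names, while it refuses a comma sequence that ends in #PCDATA as a syntax error. A minimal sketch using the element names from this example: |

```python
from xml.parsers import expat
from xml.parsers.expat import model

DOC = """<!DOCTYPE note [
<!ELEMENT note (#PCDATA|to|from|header|message)*>
]>
<note>Text <to/> more text <message/></note>
"""

declared = {}  # element name -> content-model tuple

def on_element_decl(name, content_model):
    declared[name] = content_model

parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOC, True)

ctype, quant, _name, children = declared["note"]
print(ctype == model.XML_CTYPE_MIXED, quant == model.XML_CQUANT_REP)  # True True
print([child[2] for child in children])  # ['to', 'from', 'header', 'message']

# A sequence with #PCDATA at the end is not legal XML, so expat raises an error.
try:
    expat.ParserCreate().Parse(
        "<!DOCTYPE note [<!ELEMENT note (to+,from,header,message*,#PCDATA)>]><note/>",
        True)
    sequence_rejected = False
except expat.ExpatError:
    sequence_rejected = True
print(sequence_rejected)  # True
```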
|
| DTD Attributes |
|
| In the DTD, XML element attributes are declared with an ATTLIST declaration. |
|
| An attribute declaration has the following syntax: |
|
| <!ATTLIST element-name attribute-name attribute-type default-value> |
|
| In this syntax, element-name is the element that can have the attribute, attribute-name is the name of the attribute, attribute-type is the type of the attribute, and default-value is either the attribute's default value or one of the keywords listed below. |
|
| The attribute-type can have the following values: |
|
| Value | Explanation |
| ID | The value is a unique id |
| IDREF | The value is the id of another element |
| IDREFS | The value is a list of other ids |
| CDATA | The value is character data |
| NMTOKEN | The value is a valid XML name token |
| NMTOKENS | The value is a list of valid XML name tokens |
| ENTITY | The value is an entity |
| ENTITIES | The value is a list of entities |
| NOTATION | The value is the name of a notation |
| (eval|eval|..) | The value must be one from an enumerated list |
|
| Attribute names beginning with xml:, such as xml:lang and xml:space, are predefined by the XML specification; in a DTD they are declared with one of the types above rather than with a type of their own. |
|
|
| The default-value can have the following values: |
|
| Value | Explanation |
| "value" | The attribute's default value, written in quotes |
| #REQUIRED | The attribute value must be included in the element |
| #IMPLIED | The attribute does not have to be included |
| #FIXED "value" | The attribute value is fixed |
|
|
| Attribute declaration example: |
|
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE email [
<!ELEMENT email EMPTY>
<!ATTLIST email priority CDATA "normal">
]>
<email priority='high'>
</email>
|
|
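| In the above example the element email is declared to be an empty element with the attribute priority of type CDATA. The priority attribute has a default value of "normal": if a document omits priority, a parser supplies "normal", while this instance sets it explicitly to 'high'. |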
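| Even a non-validating parser reads ATTLIST defaults from the internal DTD subset. A short sketch using Python's standard-library xml.parsers.expat module (the helper root_attributes is hypothetical) shows the parser supplying priority="normal" when the attribute is missing and keeping the author's value when it is present: |

```python
from xml.parsers import expat

DTD = """<!DOCTYPE email [
<!ELEMENT email EMPTY>
<!ATTLIST email priority CDATA "normal">
]>
"""

def root_attributes(instance):
    """Parse the DTD plus an instance and return the root element's attributes."""
    roots = []

    def on_start(name, attrs):
        roots.append(attrs)

    parser = expat.ParserCreate()
    # specified_attributes is False by default, so attributes filled in
    # from the ATTLIST default are reported along with the written ones.
    parser.StartElementHandler = on_start
    parser.Parse(DTD + instance, True)
    return roots[0]

print(root_attributes("<email/>"))                  # {'priority': 'normal'}
print(root_attributes("<email priority='high'/>"))  # {'priority': 'high'}
```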
|
| Enumerated attribute values |
|
Syntax:
<!ATTLIST element-name
attribute-name (eval|eval|..) default-value>
DTD example:
<!ATTLIST payment mode (check|cash) "cash">
XML example:
<payment mode="check">
or
<payment mode="cash">
|
|
| Use enumerated attribute values when you want the attribute values to be one of a fixed set of legal values. |
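| A non-validating parser such as expat does not reject a value outside the list, so the check is left to a validating parser or to the application. A minimal sketch in Python on top of the standard-library xml.parsers.expat module (the function enum_problems is hypothetical; it relies on expat reporting an enumerated type as a string such as "(check|cash)"): |

```python
from xml.parsers import expat

DTD = """<!DOCTYPE payment [
<!ELEMENT payment EMPTY>
<!ATTLIST payment mode (check|cash) "cash">
]>
"""

def enum_problems(xml_text):
    """Report attribute values that are not in their declared enumeration."""
    allowed = {}   # (element, attribute) -> set of legal values
    problems = []

    def on_attlist(elname, attname, att_type, default, required):
        # expat passes an enumerated type as a string such as "(check|cash)".
        if att_type.startswith("("):
            allowed[(elname, attname)] = set(att_type.strip("()").split("|"))

    def on_start(name, attrs):
        for att, value in attrs.items():
            legal = allowed.get((name, att))
            if legal is not None and value not in legal:
                problems.append(f"{name}/@{att}={value!r} not in {sorted(legal)}")

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return problems

print(enum_problems(DTD + '<payment mode="check"/>'))  # []
print(enum_problems(DTD + '<payment/>'))               # [] (the default "cash" is supplied)
print(enum_problems(DTD + '<payment mode="card"/>'))   # ["payment/@mode='card' not in ['cash', 'check']"]
```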
|
| Fixed attribute value |
|
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ebiz [
<!ELEMENT ebiz (associate_details,
emp_details,sponser_details)>
<!ATTLIST ebiz companysite
CDATA #FIXED "http://www.ebizel.com">
]>
<ebiz>
</ebiz>
|
|
| Use a fixed attribute value when you want an attribute to have a fixed value without allowing the author to change it. If an author includes another value, a validating XML parser will report an error. |
|
| Required attribute |
|
<!ELEMENT payment (amount+,duration+)>
<!ATTLIST payment mode CDATA #REQUIRED>
XML example:
<payment mode="cash"></payment> |
|
| Use a required attribute if you don't have an option for a default value, but still want to force the attribute to be present. |
|
| Implied attribute |
|
<!ELEMENT addressdetails (address+,city?,state,pin)>
<!ATTLIST addressdetails
communication_mode CDATA #IMPLIED>
<addressdetails communication_mode="post">
</addressdetails>
|
|
| Use an implied attribute if you don't want to force the author to include an attribute and you don't have an option for a default value either. |
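| The three keywords can be checked with a few lines of Python on top of the standard-library xml.parsers.expat module, which reports each attribute declaration with its default value and a required flag (the flag is also set for #FIXED). The sketch below is a partial validator under those assumptions: the function keyword_problems and the combined payment declaration (note is a hypothetical attribute; companysite is borrowed from the fixed-value example) are for illustration only, and only #REQUIRED and #FIXED are checked. |

```python
from xml.parsers import expat

# Hypothetical element combining #REQUIRED, #IMPLIED and #FIXED.
DTD = """<!DOCTYPE payment [
<!ELEMENT payment EMPTY>
<!ATTLIST payment
    mode        CDATA #REQUIRED
    note        CDATA #IMPLIED
    companysite CDATA #FIXED "http://www.ebizel.com">
]>
"""

def keyword_problems(xml_text):
    """Check the #REQUIRED and #FIXED rules for every element (a small part of validation)."""
    decls = {}     # (element, attribute) -> (default, required) as reported by expat
    problems = []

    def on_attlist(elname, attname, att_type, default, required):
        decls[(elname, attname)] = (default, required)

    def on_start(name, attrs):
        for (elname, attname), (default, required) in decls.items():
            if elname != name or not required:
                continue  # #IMPLIED and plain defaults never fail this check
            value = attrs.get(attname)
            if default is None and value is None:
                # #REQUIRED: there is no default and the author left the attribute out.
                problems.append(f"{name}: missing #REQUIRED attribute {attname}")
            elif default is not None and value != default:
                # #FIXED: expat reports required=True together with the fixed value,
                # and supplies that value itself when the attribute is omitted.
                problems.append(f"{name}: {attname} must be {default!r}")

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return problems

print(keyword_problems(DTD + '<payment mode="cash"/>'))  # []
print(keyword_problems(DTD + '<payment/>'))              # ['payment: missing #REQUIRED attribute mode']
```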
|
| DTD Entities |
|
| Entities |
|
• Entities are variables used to define shortcuts to common text.
• Entity references are references to entities.
• Entities can be declared internally.
• Entities can be declared externally. |
|
| Declaring an Internal Entity |
|
Syntax:
<!ENTITY entity-name "entity-value">
DTD Example:
<!ENTITY developer "Ravish@ebizel.com">
<!ENTITY copyright " Copyright 2007
eBIZ.com Pvt. Ltd. All Rights Reserved ">
XML example:
<author>&developer;&copyright;</author>
|
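| The substitution happens inside the parser. In this Python sketch, using the standard-library xml.parsers.expat module, the application only ever sees the replacement text of &developer; and &copyright; (the entity values are the ones declared above, written on one line): |

```python
from xml.parsers import expat

DOC = """<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY developer "Ravish@ebizel.com">
<!ENTITY copyright "Copyright 2007 eBIZ.com Pvt. Ltd. All Rights Reserved">
]>
<author>&developer; / &copyright;</author>
"""

chunks = []
parser = expat.ParserCreate()
# With no DefaultHandler set, expat expands internal entities itself and
# passes the replacement text to the character-data handler.
parser.CharacterDataHandler = chunks.append
parser.Parse(DOC, True)

text = "".join(chunks)
print(text)  # Ravish@ebizel.com / Copyright 2007 eBIZ.com Pvt. Ltd. All Rights Reserved
```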
|
| Declaring an External Entity |
|
Syntax:
<!ENTITY entity-name SYSTEM "URI/URL">
DTD Example:
<!ENTITY developer SYSTEM "http://education.ebizel.com/html/xml/entity.xml">
<!ENTITY copyright SYSTEM "http://education.ebizel.com/html/xml/entity.xml">
XML example:
<author>&developer;&copyright;</author> |
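| A parser only reads the file behind an external entity if the application lets it. The Python sketch below uses the standard-library xml.parsers.expat module, whose ExternalEntityRefHandler is called with the SYSTEM identifier. To stay self-contained it never fetches the URL: the dictionary FAKE_FILES is a hypothetical stand-in, and its content is assumed rather than taken from the real file. |

```python
from xml.parsers import expat

URL = "http://education.ebizel.com/html/xml/entity.xml"

# Hypothetical stand-in for the network: the replacement text is looked up here
# instead of being downloaded from URL.
FAKE_FILES = {URL: "Ravish@ebizel.com"}

DOC = f"""<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY developer SYSTEM "{URL}">
]>
<author>&developer;</author>
"""

chunks, requested = [], []

def on_external_entity(context, base, system_id, public_id):
    requested.append(system_id)
    # The child parser copies the parent's handlers, so its text lands in chunks too.
    child = parser.ExternalEntityParserCreate(context)
    child.Parse(FAKE_FILES[system_id], True)
    return 1  # non-zero tells expat the entity was handled

parser = expat.ParserCreate()
parser.CharacterDataHandler = chunks.append
parser.ExternalEntityRefHandler = on_external_entity
parser.Parse(DOC, True)

print(requested)        # ['http://education.ebizel.com/html/xml/entity.xml']
print("".join(chunks))  # Ravish@ebizel.com
```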
|
| Complete example |
|
<?xml version="1.0" encoding="ISO-8859-1"?>
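<!DOCTYPE ebiz [
<!ELEMENT ebiz (tutorial+)>
<!ELEMENT tutorial (category,title,author,url,launch_date,info+)>
<!ATTLIST tutorial category (cc|ll|ul) #REQUIRED>
<!ELEMENT category (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ELEMENT info (#PCDATA)>
<!ELEMENT launch_date (date,month,year)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ENTITY developer "info@ebizel.com">
<!ENTITY copyright "eBIZ.com, All Rights Reserved, 2007">
]>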
<ebiz>
<tutorial category="cc">
<category>Scripting Language</category>
<title>XML</title>
<author></author>
<url>http://education.ebizel.com/html/xml/</url>
<launch_date>
<date>25</date>
<month>April</month>
<year>2007</year>
</launch_date>
<info>Author: &developer; </info>
<info>&copyright; </info>
</tutorial>
</ebiz>
|
| DTD ATTLIST Declaration |
|
| The ATTLIST declaration defines the attribute list for an element, that is, the attributes the element can have. ATTLIST syntax: |
|
| <!ATTLIST ElementName AttName AttType DefaultValue> |
|
| The ElementName is the name of the element for which the list of attributes is being defined. The parts of the attribute declaration include: |
|
• AttName - The name of the attribute being defined.
• AttType - Defines the type of data that may be used for the value, such as CDATA (character data). The attribute type values may include: |
|
o CDATA - Used to specify a string type. The HTML 4 transitional DTD uses the CDATA keyword as follows:
<!ATTLIST FONT
  %coreattrs;                     -- id, class, style, title --
  %i18n;                          -- lang, dir --
  size        CDATA     #IMPLIED  -- [+|-]nn e.g. size="+1", size="4" --
  color       %Color;   #IMPLIED  -- text color --
  face        CDATA     #IMPLIED  -- comma-separated list of font names --
  >
o ENTITY - The value is the name of an unparsed external entity declared in the DTD, such as a graphic file to be imported as an image. A tokenized type.
o ENTITIES - A list of several such entity names. A tokenized type.
o ID - A unique identifier: no two elements in a document may share the same ID value, and an element may have at most one ID attribute. A tokenized type.
o IDREF - A reference to another element, given as the value of that element's ID attribute. A tokenized type.
o IDREFS - A reference to more than one element, given as a space-separated list of ID values. A tokenized type. The attributes may be declared as follows (in the DTD): |
|
| <!ATTLIST Object Unique ID #REQUIRED Reference IDREFS #IMPLIED> |
|
| The XML element: |
|
| <Object Unique="N03" Reference="N01 N02"> |
|
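o NMTOKEN - A name token. It is made of the same characters as an XML name but may start with any of them (for example a digit), and unlike an ID it does not have to be unique. A tokenized type.
o NMTOKENS - A list of multiple name tokens separated by spaces. A tokenized type.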
o NOTATION - Used to identify a program to process a type of data format or identify the type of data format. This must match a notation declaration as declared by a NOTATION keyword in the DTD (See the NOTATION section). An example:
o <!ATTLIST MYDOC texttype NOTATION ( tex | pl ) #REQUIRED>
o (name1 | name2) - Used for an enumerated type, this is a list with name tokens as shown in the HTML DTD for the HR element below: |
|
| align (left|center|right) #IMPLIED |
|
| • Default Value - Specifies the value the attribute takes when the document does not give one, or states that no default exists. The possible values are: |
|
o A default value for the attribute, written as a quoted string.
o #FIXED - Followed by a quoted value; the attribute always has that value, and a document may not give it a different one.
o #IMPLIED - The attribute is optional and has no default value.
o #REQUIRED - The attribute must be given a value in every element of this type. |
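| ID and IDREF values can only be matched up once the whole document has been read, because a reference may point to an element that appears later. A minimal Python sketch on top of the standard-library xml.parsers.expat module (the function dangling_idrefs and the root element objects are hypothetical; Object, Unique and Reference come from the IDREFS example) lists references that match no ID. It does not check that ID values are unique. |

```python
from xml.parsers import expat

DOC = """<!DOCTYPE objects [
<!ELEMENT objects (Object*)>
<!ELEMENT Object EMPTY>
<!ATTLIST Object Unique ID #REQUIRED Reference IDREFS #IMPLIED>
]>
<objects>
<Object Unique="N01"/>
<Object Unique="N02"/>
<Object Unique="N03" Reference="N01 N02"/>
</objects>
"""

def dangling_idrefs(xml_text):
    """Return IDREF/IDREFS tokens that match no ID value in the document."""
    att_types = {}      # (element, attribute) -> declared type, e.g. "ID" or "IDREFS"
    ids, refs = set(), []

    def on_attlist(elname, attname, att_type, default, required):
        att_types[(elname, attname)] = att_type

    def on_start(name, attrs):
        for att, value in attrs.items():
            kind = att_types.get((name, att))
            if kind == "ID":
                ids.add(value)
            elif kind in ("IDREF", "IDREFS"):
                refs.extend(value.split())  # IDREFS holds space-separated ids

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    # Compare only after the whole document is read: a reference may point forward.
    return [ref for ref in refs if ref not in ids]

print(dangling_idrefs(DOC))                                    # []
print(dangling_idrefs(DOC.replace('"N01 N02"', '"N01 N09"')))  # ['N09']
```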
|
| The AttName, AttType, and DefaultValue may be repeated in the attribute list in order to define multiple attributes for the element. Here's an example of an attribute list for the HR element from the HTML 4 transitional DTD: |
|
<!ATTLIST HR
%attrs; -- %coreattrs, %i18n, %events --
align (left|center|right) #IMPLIED
noshade (noshade) #IMPLIED
size %Pixels; #IMPLIED
width %Length; #IMPLIED
>
|
|
| This example uses several parameter entities (%attrs;, %Pixels; and %Length;), which are named pieces of DTD text. The comment: |
|
| -- %coreattrs, %i18n, %events -- |
|
| indicates that the entities %coreattrs, %i18n and %events are contained within the %attrs entity, which is defined earlier by the line: |
|
| <!ENTITY % attrs "%coreattrs; %i18n; %events;"> |
|
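| Besides #REQUIRED, #IMPLIED and #FIXED, SGML DTDs (but not XML DTDs) allow one more default keyword in ATTLIST declarations: |
|
| • #CURRENT - The first time an element of this type appears, the attribute value must be specified; later occurrences that omit the attribute reuse the most recently specified value. |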
|
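| Other Keywords |
|
| The attribute types NAME, NAMES, NUMBER, NUMBERS, NUTOKEN and NUTOKENS exist only in SGML; XML DTDs do not support them. |
|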
• NAME - "string of 1-8 characters (8 is a "default" limit); starting with a-z or A-Z, followed by a-z or A-Z, hyphen, period"
• NAMES - "list of NAME values; each string separated by one or more spaces, tabs or returns ("separators")"
• NUMBER - "a string of 1-8 characters consisting of the digits 0-9"
• NUMBERS - "list of NUMBER values; each separated by a separator+."
• NUTOKEN - "string of 1-8 characters beginning with 0-9 followed by a-z, A-Z, 0-9, hyphen, period."
• NUTOKENS - "list of NUTOKEN values; each separated by a separator+"
• "SYSTEM associates instructions or meaning with the new notation"
|
| DTD NOTATION, SHORTREF |
|
| The keywords in this section are used less often and are less well documented than the others; none of them is used in the HTML 4.0 DTD, and SHORTREF and USEMAP exist only in SGML, not in XML. Consult other sources if you want detailed information. |
|
| NOTATION |
|
| The NOTATION declaration names a data format that the system should recognize, optionally pointing to a program that can process it. By itself it does nothing more than that, and it may be used with other keywords such as ATTLIST or ENTITY. If used as in the example below, the NOTATION may be referenced when specifying the ATTLIST for an element (see the ATTLIST section). The NOTATION syntax is: |
|
| <!NOTATION Name SYSTEM "Location"> |
|
| The Location is a uniform resource identifier (URI), written in quotes, which may be a local file path or a complete address on the internet such as: |
|
| "http://ctdp.tripod.com/independent/publishing/guide/file.xml". |
|
| The keyword SYSTEM in the example below associates a set of instructions or program name with the notation (pl). |
|
| <!NOTATION pl SYSTEM "/usr/bin/perl" > |
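| Python's standard-library xml.parsers.expat module reports NOTATION declarations through its NotationDeclHandler. The sketch below assumes a hypothetical second notation tex (with a hypothetical path) and a small MYDOC document, and looks up the program associated with the notation named in the texttype attribute; it only reports the program and does not run it. |

```python
from xml.parsers import expat

# "pl" comes from the text above; "tex" and its path are hypothetical.
DOC = """<!DOCTYPE MYDOC [
<!ELEMENT MYDOC (#PCDATA)>
<!NOTATION pl SYSTEM "/usr/bin/perl">
<!NOTATION tex SYSTEM "/usr/bin/tex">
<!ATTLIST MYDOC texttype NOTATION (tex|pl) #REQUIRED>
]>
<MYDOC texttype="pl">print "hello";</MYDOC>
"""

notations = {}         # notation name -> system identifier
notation_atts = set()  # (element, attribute) pairs declared with type NOTATION
handlers = []          # (element, notation, program) found in the document

def on_notation(name, base, system_id, public_id):
    notations[name] = system_id

def on_attlist(elname, attname, att_type, default, required):
    # expat reports a NOTATION attribute type as a string beginning with "NOTATION".
    if att_type.startswith("NOTATION"):
        notation_atts.add((elname, attname))

def on_start(name, attrs):
    for att, value in attrs.items():
        if (name, att) in notation_atts:
            handlers.append((name, value, notations.get(value)))

parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.AttlistDeclHandler = on_attlist
parser.StartElementHandler = on_start
parser.Parse(DOC, True)
print(handlers)  # [('MYDOC', 'pl', '/usr/bin/perl')]
```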
|
| SHORTREF |
|
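| SHORTREF defines a short reference map, which associates short strings (such as a tab character or a blank line) with entities, so that those strings can stand in for markup. |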
|
| USEMAP |
|
| USEMAP activates a short reference map, either for a given element type or at a point within the document. This keyword is not commonly used, even in SGML. |