
XML - PART I


XML
The eXtensible Markup Language (XML) is a general-purpose markup language. Its primary purpose is to facilitate the sharing of data across different information systems, particularly via the Internet. It is a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible.
By adding semantic constraints, application languages can be implemented in XML. These include XHTML, RSS, MathML, GraphML, Scalable Vector Graphics (SVG), MusicXML, and thousands of others.
Moreover, XML is sometimes used as the specification language for such application languages.
What makes XML truly powerful is its broad acceptance and the hard work done by people who work with databases, programming, office applications, and so on.
It is because of this work that tools exist to convert data from almost any platform into standardized XML, or to convert XML into a format used by that platform.


Markup Languages
History Markup Languages
What is a Markup language?
A markup language is a set of labels embedded within text to distinguish individual elements or groups of elements for display or identification purposes. The labels are typically known as "tags".
A markup language combines text and extra information about the text. The best-known markup language in modern use is HTML (HyperText Markup Language), one of the foundations of the World Wide Web. The extra information, for example about the text's structure or presentation, is expressed using markup, which is intermingled with the primary text. Originally markup was used in the publishing industry in the communication of printed work between authors, editors, and printers.
For content identification, markup languages turn a text document into the equivalent of a database record in which individual data elements can be located for processing. For rendering, markup languages indicate where font and other layout changes start and stop.
History of Markup languages
GenCode: the first markup language
SGML was the first widely used markup language, but GenCode came before it. This surprises many people, who believe that SGML was the first markup language.
GenCode
The idea of "markup languages" was apparently first presented in 1967 by William W. Tunnicliffe at a conference. Tunnicliffe would later lead the development of a standard called GenCode for the publishing industry. Book designer Stanley Rice also published ideas along similar lines around 1970. Brian Reid, in his 1980 thesis at Carnegie Mellon University, developed the theory and a working implementation of descriptive (or expressive) markup in actual use. However, IBM researcher Charles Goldfarb is more commonly seen today as the "father" of markup languages, because of his work on IBM GML and then as chair of the International Organization for Standardization committee that developed SGML, the first widely used descriptive markup system.
Some early examples of markup languages available outside the publishing industry can be found in typesetting tools on Unix systems such as troff and nroff. In these systems, formatting commands were inserted into the document text so that typesetting software could format the text according to the editor's specifications.
TeX
Another major publishing standard is TeX, created and continuously refined by Donald Knuth in the 1970s and 80s. TeX concentrated on detailed layout of text and font descriptions in order to typeset mathematical books in professional quality. This required Knuth to spend considerable time investigating the art of typesetting. However, TeX requires considerable skill from the user, so it is mainly used in academia, where it is a de facto standard in many scientific disciplines. A TeX macro package known as LaTeX provides a descriptive markup system on top of TeX, and is widely used.
SGML
Standard Generalized Markup Language
The first language to make a clear and clean distinction between structure and presentation was Scribe, which was developed by Brian Reid and described in his doctoral thesis in 1980. Scribe influenced the development of the Generalized Markup Language (GML), later SGML, and is a direct ancestor of HTML and LaTeX.
SGML specified a syntax for including the markup in documents, as well as one for separately describing what tags were allowed, and where (the Document Type Definition (DTD) or schema). This allowed authors to create and use any markup they wished, selecting tags that made the most sense to them and were named in their own natural languages. Thus, SGML is properly a meta-language, and many particular markup languages are derived from it. From the late 80s on, most substantial new markup languages have been based on the SGML system, including for example TEI and DocBook. SGML was promulgated as an International Standard by the International Organization for Standardization, as ISO 8879, in 1986.
SGML found wide acceptance and use in fields with very large-scale documentation requirements. However, it was generally found to be cumbersome and difficult to learn, a side effect of attempting to do too much and be too flexible.
Types of Markup
There are mainly three types of markup language: presentational markup, procedural markup, and descriptive markup.
Presentational Markup
Presentational markup is an attempt to deduce document structure from cues in the encoding. For example, in a text file, the title of a document might be preceded by several newlines and/or spaces, thus suggesting leading spacing and centering. Word-processing and desktop publishing products sometimes attempt to infer structure from such conventions but, as the enormous variety of Wiki plain-text conventions shows, this is still an unresolved problem.
Procedural markup
Procedural markup has been widely used in professional publishing applications, where professional typographers can be expected to learn the languages required. Procedural markup is typically also focused on the presentation of text, but is usually visible to the user editing the text file, and is expected to be interpreted by software in the order in which it appears. To format a title, a succession of formatting directives would be inserted into the file immediately before the title's text, instructing software to switch into centered display mode, then enlarge and embolden the typeface. The title text would be followed by directives to reverse these effects; in more advanced systems, macros or a stack model make this less tedious. In most cases, the procedural markup capabilities comprise a Turing-complete programming language. Examples of procedural-markup systems include nroff, troff, TeX, and Lout.
Descriptive markup
Descriptive or semantic markup applies labels to fragments of text without necessarily mandating any particular display or other processing semantics. For example, the Atom syndication language provides markup to label the "updated" time-stamp, which is an assertion from the publisher as to when some item of information was last changed.
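To make the Atom example concrete, here is a minimal sketch in Python of how a program might read that "updated" label. The sample entry, its title, and the helper name read_updated are made up for illustration; only the Atom namespace URI and the element name "updated" come from the Atom format itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace used by Atom documents.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample entry, invented for this sketch.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Sample item</title>
  <updated>2007-03-01T12:00:00+00:00</updated>
</entry>"""


def read_updated(entry_xml):
    """Return the publisher's 'updated' time-stamp as a datetime."""
    root = ET.fromstring(entry_xml)
    text = root.findtext("atom:updated", namespaces=ATOM_NS)
    return datetime.fromisoformat(text)


print(read_updated(ENTRY))
```

The markup only says what the value is (a last-changed time); what to do with it, such as sorting or displaying it, is left entirely to the program.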

Introduction to XML
The eXtensible Markup Language (XML) is a general-purpose markup language. Its primary purpose is to facilitate the sharing of data across different information systems, particularly via the Internet. Keep in mind that XML is meant primarily for data exchange, not for data storage.
XML is a simplified subset of the Standard Generalized Markup Language (SGML), and is designed to be relatively human-legible. By adding semantic constraints, application languages can be implemented in XML. These include XHTML, RSS, MathML, GraphML, Scalable Vector Graphics (SVG), MusicXML, and thousands of others. Moreover, XML is sometimes used as the specification language for such application languages.
XML does nothing by itself; it is more of a "common ground" standard. The main benefit of XML is that you can take data from a database such as Microsoft SQL Server or MySQL, convert it into XML, and then share that XML with a slew of other programs and platforms. Each receiving platform can then convert the XML into the structure it normally uses, and presto! You have just communicated between two potentially very different platforms.
What makes XML truly powerful is its broad acceptance and the hard work done by people who work with databases, programming, office applications, and so on. It is because of this work that tools exist to convert data from almost any platform into standardized XML, or to convert XML into a format used by that platform.
In the past, attempts at creating a standardized format for data that could be interpreted by many different platforms (i.e. different applications) failed miserably where XML has largely succeeded.
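The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for MSSQL or MySQL, and the table name, column names, and both helper functions are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_rows_as_xml(conn):
    """Sending side: turn database rows into an XML string."""
    root = ET.Element("associates")
    query = "SELECT name, level FROM associate ORDER BY name"
    for name, level in conn.execute(query):
        item = ET.SubElement(root, "associate", {"type": level})
        item.text = name
    return ET.tostring(root, encoding="unicode")


def import_xml_as_dicts(xml_text):
    """Receiving side: turn the XML into that platform's own structure."""
    root = ET.fromstring(xml_text)
    return [{"name": e.text, "type": e.get("type")} for e in root.iter("associate")]


# Hypothetical sample data standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE associate (name TEXT, level TEXT)")
conn.executemany(
    "INSERT INTO associate VALUES (?, ?)",
    [("Dharmendra Das", "silver"), ("XYZ Kumar", "gold")],
)

xml_text = export_rows_as_xml(conn)
records = import_xml_as_dicts(xml_text)
print(xml_text)
print(records)
```

The receiving function never touches the database; the XML text is the only thing the two sides have to agree on.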
What is XML?
• XML stands for eXtensible Markup Language.
• XML is a markup language much like HTML.
• XML was designed to describe data.
• XML tags are not predefined in XML. You must define your own tags.
• XML is self-describing.
• XML can use a DTD (Document Type Definition) to formally describe the data.
XML is extensible
XML allows the author to define his own tags and his own document structure. The tags used to markup HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard.
It is important to understand that XML is not a replacement for HTML. In the future development of the Web it is most likely that XML will be used to structure and describe the Web data, while HTML will be used to format and display the same data.
XML usage
Although countless applications use XML, here are a few examples of current platforms and applications that make use of this technology.
Cell Phones
XML data is sent to some cell phones, which is then formatted by the specification of the cell phone software designer to display text, images and even play sounds!
File Converters
Many applications have been written to convert existing documents into the XML standard. An example is a PDF to XML converter.
VoiceXML
VoiceXML is an XML-based language for describing voice dialogues, so that content marked up in XML can be spoken to the user and listened to rather than read.
And many more.
"XML is the future for all data transmission and data manipulation over the Web."

What is XML?
XML and SGML
1. Differences between XML and SGML
XML allows only documents that use one fixed SGML declaration, the SGML declaration for XML. This declares all of the following SGML features as NO:
• DATATAG
• OMITTAG
• RANK
• LINK (SIMPLE, IMPLICIT and EXPLICIT)
• CONCUR
• SUBDOC
• FORMAL
Note that it differs from the reference concrete syntax in a number of ways:
• It also declares no short reference delimiters; it follows that SHORTREF and USEMAP declarations cannot occur in XML
• The PIC (processing instruction close) delimiter is ?>  
• Quantities and capacities are effectively unlimited  
• Names are case sensitive (NAMECASE GENERAL is NO)  
• Underscore and colon are allowed in names  
• Names can use Unicode characters and are not restricted to ASCII 
The following constructs which are permitted in SGML when SHORTTAG is YES are not allowed in XML:
• Unclosed start-tags
• Unclosed end-tags
• Empty start-tags
• Empty end-tags
• Attribute values in attribute specifications entered directly rather than as literals
• Attribute specifications that omit the attribute name
NET delimiters can be used only to close an empty element. In SGML without the Web SGML Adaptations Annex, the NET delimiter is declared as />. With this approach, XML does not allow null end-tags and allows net-enabling start-tags only for elements with no end-tag. In SGML with the Web SGML Adaptations Annex, there is a separate NESTC (net-enabling start tag close) delimiter. This allows the XML <e/> syntax to be handled as a combination of a net-enabling start-tag <e/ and a null end-tag >. With this approach, XML allows a net-enabling start-tag only when it is immediately followed by a null end-tag.
XML imposes the following restrictions not in SGML:
Entity references
• General entity references in content are required to be synchronous
• External entity references in attribute values are not allowed
Character references
• Named character references are not allowed
• Numeric character references to non-SGML characters are not allowed
Entity declarations
• A #DEFAULT entity cannot be declared
• External SDATA entities are not allowed
• External CDATA entities are not allowed
• Internal SDATA entities are not allowed
• Internal CDATA entities are not allowed
• An ampersand in a parameter literal must be followed by a syntactically valid entity reference or numeric character reference
Attribute definition list declarations
• Associated element type in attribute definition list declarations cannot be a name group
• Attributes cannot be declared for a notation
• A name token group must use the OR connector (|)
• Attribute values specified as defaults in attribute definition list declarations must be literals
Element type declarations
• Associated element type in element type declaration cannot be a name group
• In an element declaration, a generic identifier cannot be specified as a rank stem and rank suffix
• Minimization parameters in element declarations are not allowed
• RCDATA declared content is not allowed
• CDATA declared content is not allowed
• Content models cannot use the AND connector (&)
• Content models for mixed content have a restricted form
Comments
• A parameter separator cannot contain comments; this means that markup declarations (other than comment declarations) cannot contain comments
• Empty comment declarations (<!> in the reference concrete syntax) are not allowed
• A comment declaration cannot contain more than one comment
Processing instructions
• Processing instructions must start with a name (the PI target)
• A processing instruction whose PI target is xml can only occur at the beginning of an external entity, and must be an XML declaration if it occurs in the document entity and otherwise a text declaration
Marked sections
• In marked section declarations, TEMP status keyword is not allowed
• In a marked section declaration, a status keyword specification that contains no status keywords is not allowed
• In a marked section declaration, a status keyword specification cannot contain more than one status keyword
• Marked sections are not allowed in the internal subset
• Parameter separators are not allowed in status keyword specifications in the document instance; in particular, parameter entity references are not allowed 
Other
• Names beginning with [Xx][Mm][Ll] are reserved
• The SGML declaration must be implied and cannot be explicitly present in the document entity 
• When < and & occur as data, they must be entered as &lt; and &amp;
• A parameter separator required by the formal syntax must always be present and cannot be omitted when it is adjacent to a delimiter
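The rule about < and & in the list above is the one most XML authors run into in practice. As a small illustration, this Python sketch uses the standard library's escape function to enter them as &lt; and &amp;, and then shows that a parser hands the original characters back. The element name "condition" and the sample text are made up.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical data that contains both special characters.
raw = "if a < b && b < c"

# escape() replaces & with &amp; and < with &lt; (and > with &gt;).
escaped = escape(raw)
element_xml = "<condition>" + escaped + "</condition>"

# The parser turns the references back into the original characters.
round_tripped = ET.fromstring(element_xml).text

# Using the raw text directly produces a document that is not well-formed.
try:
    ET.fromstring("<condition>" + raw + "</condition>")
    raw_is_accepted = True
except ET.ParseError:
    raw_is_accepted = False

print(escaped)
print(round_tripped)
```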
XML predefines the semantics of the attributes xml:space and xml:lang. It also reserves all attribute, element type and notation names beginning with [Xx][Mm][Ll].
XML requires that an SGML parser use an entity manager that behaves as follows:
• Lines are terminated by newline (Unicode code #X000A) rather than being delimited by RS and RE as with a typical SGML entity manager
• System identifiers are treated as URLs
• The entity manager must support entities encoded in UTF-16 and UTF-8, and must be able automatically to detect which encoding an entity uses based on the presence of the byte order mark
• The entity manager should be able to recognize the encoding declaration in the XML declaration and encoding PI and use it to determine the encoding of the entity
XML imposes requirements on the information that a parser must make available to an application.

XML and HTML
XML was conceived as a way to extend the power of online delivery of information. Although both XML and HTML are derived from SGML (Standard Generalized Markup Language), there are important differences between the two markup languages. For one, HTML provides a fixed set of tags, while XML enables users to define the tags that they need. For another, HTML primarily describes how information should be rendered; it does not lay out what the information is or how it is structured. HTML is about information display or presentation, whereas XML concentrates mainly on the structure of the information.
In contrast, XML provides information about structure and syntax. (Directions for formatting are included in a separate stylesheet attached to the document.) Whereas HTML is meant only to be used for presentation of documents in Web browsers, XML has wider applications. Not only can it be used for the web, but also as a storage format for word processors, as a data interchange format, and as a preservation format that is readable by human users.
Let's make this more concrete by looking at an example, in this case a tutorial record:
HTML Version
XYZ Kumar
Anatomy of XML
XML guide for Developers
eBIZ.com Pvt. Ltd.
The HTML code looks like this:
<HTML>
<BODY>
<B>XYZ Kumar</B><BR>
<I> Anatomy of XML </I> <BR>
XML guide for Developers <BR>
<b> Company : eBIZ.com</b>
</BODY>
</HTML>
XML Version
In contrast, XML enables you to describe the specific components of a document or database:
<?xml version="1.0" encoding="ISO-8859-1"?>
<tutorial>
<author>XYZ Kumar</author>
<title>Anatomy of XML</title>
<subtitle>XML guide for Developers</subtitle>
<company>eBIZ.com Pvt. Ltd.</company>
</tutorial>
To achieve the same online presentation, the XML document would be associated with an XSL stylesheet like the following:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="tutorial">
<html><body>
<xsl:apply-templates/>
</body></html>
</xsl:template>
<xsl:template match="author">
<p><b><xsl:apply-templates/></b></p>
</xsl:template>
<xsl:template match="title">
<p><i><xsl:apply-templates/></i></p>
</xsl:template>
<xsl:template match="subtitle">
<p><i><xsl:apply-templates/></i></p>
</xsl:template>
<xsl:template match="company">
<p><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
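Because the XML version names each component, a program can pick out individual fields by name, which the HTML version does not allow. The following is a rough sketch in Python using the standard library parser; the tutorial record is the one from the example above, while the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# The tutorial record from the XML version above.
TUTORIAL = """<?xml version="1.0"?>
<tutorial>
  <author>XYZ Kumar</author>
  <title>Anatomy of XML</title>
  <subtitle>XML guide for Developers</subtitle>
  <company>eBIZ.com Pvt. Ltd.</company>
</tutorial>"""

root = ET.fromstring(TUTORIAL)

# Map each element name to its text, e.g. "author" -> "XYZ Kumar".
fields = {child.tag: child.text.strip() for child in root}

print(fields["author"])
print(fields["title"])
```

With the HTML version, the same program would only see bold and italic text and would have to guess which piece is the author.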
The Main Difference Between XML and HTML
XML is not a replacement for HTML; it is a complement to HTML.
XML and HTML were designed with different goals:
XML was designed to describe data and to focus on what data is. That is, structure of data.
HTML was designed to display data and to focus on how data looks. That is, presentation of data.
HTML is about presenting information, while XML is about describing information.
XML was designed to carry data, and HTML was designed to display it in the user's browser window. HTML knows nothing about the data and its structure, and XML knows nothing about how to present the data.

Why XML?
In order to appreciate XML, it is important to understand why it was created. XML was created so that richly structured documents could be used over the web. The only viable alternatives, HTML and SGML, are not practical for this purpose. HTML was developed for a different purpose and is not suited to the exchange and storage of structured data.
HTML, as we've already discussed, comes bound with a set of semantics and does not provide arbitrary structure. To do anything with HTML, you have to use the fixed set of tags provided by the language.
SGML provides arbitrary structure, but is too difficult to implement just for a web browser. Full SGML systems solve large, complex problems that justify their expense. Viewing structured documents sent over the web rarely carries such justification.
This is not to say that XML can be expected to completely replace SGML. While XML was designed to deliver structured content over the web, some of the very features it lacks in order to make this practical make SGML a more satisfactory solution for the creation and long-term storage of complex documents. In many organizations, filtering SGML to XML will be the standard procedure for web delivery.
XML, or eXtensible Markup Language, is a meta-language, a language used to define and create other languages. XML provides rules for describing the structure of a document, thus facilitating the exchange and publication of information.
XML has the following features:
• Extensibility
The author may define tags and how they are to be processed. It is extensible because author can extend its functionality by adding new tags.
• Document structuring:
Unlike HTML, XML describes the structure of the document.
• Validation:
The document can be checked to ensure that it conforms to the required syntax and that it contains all of the required parts.
• Convertibility:
XML provides a basic data format that can be output to different devices, attached to different stylesheets to enable a range of presentation styles, and delivered in whole or in pieces.
• Flexibility
XML is a very flexible language, yet it is easy to learn and use, unlike its predecessor SGML, which was difficult to learn. All one needs is to remember the basic XML guidelines; there are no other rules in XML.
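The validation feature in the list above can be illustrated with a small, hand-rolled check in Python. Note the assumptions: this is not DTD validation (the standard library parser used here does not perform it); it only checks that the text is well-formed and that a made-up list of required elements is present. The function name and REQUIRED_PARTS are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: elements every tutorial record must contain.
REQUIRED_PARTS = ("author", "title", "company")


def check_tutorial(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "tutorial":
        problems.append(f"unexpected root element: {root.tag}")
    for name in REQUIRED_PARTS:
        if root.find(name) is None:
            problems.append(f"missing required element: {name}")
    return problems


complete = (
    "<tutorial><author>XYZ Kumar</author><title>Anatomy of XML</title>"
    "<company>eBIZ.com Pvt. Ltd.</company></tutorial>"
)
incomplete = "<tutorial><author>XYZ Kumar</author></tutorial>"

print(check_tutorial(complete))
print(check_tutorial(incomplete))
```

A real project would more likely describe these rules in a DTD or schema and let a validating parser enforce them.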
Advantages of XML
• Separates structure from presentation
As a result, it is much easier to manipulate XML and put it to various uses.
• Flexibility and extensibility
XML enables users to create their own domain-specific markup languages. XML can accommodate a wide range of communities, from musicians to chemists to students.
• Sustainability
XML is self-describing; humans can, in general, discern the meaning of XML tags, and there's a lot of clear XML documentation to help this process. Whereas proprietary data formats are difficult to preserve, XML can be created in ASCII, which is stable and likely to be readable for a long time.
• Exchange of data
XML can easily be transformed according to the user's needs. For instance, businesses can receive data from another company's system and translate it for their own. XML is non-proprietary, hardware- and software-independent, and fairly simple to author, so it is a natural choice for the exchange of information among different applications.
• Enables more powerful targeted searches
XML allows you to search within different categories of data, or fields. With an HTML document, search software wouldn't be able to distinguish an author's name from anything else, but XML defines specific categories and structures. For instance, if you are looking for "Rice" the street name rather than the grain or the university, you might search within a specific address field.
• Enables reuse and provides different views of the data to different users
Because XML defines the structure of the document and the relationship among its parts, it can be used for many different purposes. A single data source can be used in different ways for different media. If you need to pull out and present a subset of information from an XML file (say, for instance, a list of classes offered at a particular time), XML enables that selective presentation. Since the source has a clear structure, the conversion can be automated. Links, for example, can be emphasized in print only through formatting (e.g. italics), whereas they could be hyperlinks on a CD-ROM. Thus, independence from hardware and software is ensured: if, for example, the software used for printing changes, only the conversion to that output format is adapted, and the data source remains unchanged.
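The "Rice" search mentioned above can be sketched as follows. The directory data, the element names (entry, name, street), and the function name are all invented for this example; the point is only that the search looks inside one named field instead of the whole text.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: "Rice" appears as a university, a street and a grain.
DIRECTORY = """<directory>
  <entry><name>Rice University</name><street>Main Street</street></entry>
  <entry><name>Corner Grocery</name><street>Rice Boulevard</street></entry>
  <entry><name>Rice and Grain Shop</name><street>Oak Avenue</street></entry>
</directory>"""


def names_with_street_containing(xml_text, word):
    """Search only the <street> field and return the matching entry names."""
    root = ET.fromstring(xml_text)
    return [
        entry.findtext("name")
        for entry in root.iter("entry")
        if word in entry.findtext("street", default="")
    ]


print(names_with_street_containing(DIRECTORY, "Rice"))
```

A plain text search for "Rice" would match all three entries; the field-based search returns only the one whose street is named Rice.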
XML in Future Web Development
XML is going to be everywhere
We have been participating in XML development since its creation. It has been amazing to see how quickly the XML standard has been developed and how quickly a large number of software vendors have adopted the standard.
We strongly believe that XML will be as important to the future of the Web as HTML has been to the foundation of the Web and that XML will be the most common tool for all data manipulation and data transmission. 

XML Basic
XML how to
How to create an XML document?
In any markup language, the first element to appear is called the "root element", which defines what kind of document the file will be. In an HTML file, the <html> tag is the root element. An HTML file will always have the HTML element as the root element, while in an XML file it can be anything.
In an XML file there can only be one root element. The root element must encapsulate all other elements, meaning these other elements must show up after the opening root tag and before the closing root tag. Given below is an example of an XML document with the root element "email".
With XML we can define the structure of our document.
For example, if we want to create an email, its structure will be as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
<email>
<sender>
      <sender_name>XYZ Kumar</sender_name>
      <sender_email_id>xyz.kumar@gmail.com</sender_email_id>
</sender>
<recipient>
      <recipient_name>ABC</recipient_name>
      <recipient_email_id>visu.k@gmail.com</recipient_email_id>
</recipient>
<subject>Hello! how r U</subject>
<message>
      <message_header>Hello dear</message_header>
      <message_body_text>Hello this is Ravish, how are you dear</message_body_text>
      <message_footer>bye!, take care</message_footer>
</message>
</email>
The XML document above defines the structure of an email document.
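To see that structure the way a program sees it, the sketch below parses a shortened copy of the email document and prints the element names as an indented outline. The helper name outline is hypothetical, and the document is trimmed to keep the example short.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the email document above.
EMAIL = """<email>
  <sender>
    <sender_name>XYZ Kumar</sender_name>
  </sender>
  <recipient>
    <recipient_name>ABC</recipient_name>
  </recipient>
  <subject>Hello! how r U</subject>
</email>"""


def outline(element, depth=0):
    """Return the element names as lines indented by nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(EMAIL)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The root element "email" sits at the top, and every other element appears somewhere beneath it.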
Another rule for XML files is that only one root element per file is allowed; no XML file can have two root elements. Our previous example followed this rule, but the example below does not, because it has two root elements. What are the two root elements?
<?xml version="1.0" encoding="ISO-8859-1"?>
<email>
<sender>
       <sender_name> </sender_name>
      <sender_email_id> </sender_email_id>
</sender>
<recipient>
      <recipient_name> </recipient_name>
      <recipient_email_id> </recipient_email_id>
</recipient>
<subject> </subject>
<message>
      <message_header> </message_header>
<message_body_text> </message_body_text>
<message_footer> </message_footer>
</message>
</email>
<mail>
<text>
</text>
</mail>
If you said "email" and "mail" were the two root elements, then you got it right! "email" is the first element to appear in this file, so it is automatically the root element. After the email element is closed, no other elements should follow, because then there would be another root element. "mail" is the element that breaks the one-root-element rule, so the file is no longer well-formed and a browser displays an error when you try to view it. Always remember, no XML file can have more than one root element.
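You can check this rule without a browser. The following minimal Python sketch feeds a one-root document and a two-root document to the standard library parser; the helper name is_well_formed and the two shortened sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


one_root = "<email><subject>Hi</subject></email>"
two_roots = "<email><subject>Hi</subject></email><mail><text/></mail>"

print(is_well_formed(one_root))
print(is_well_formed(two_roots))
```

The parser rejects the second document as soon as it meets the "mail" element after the root element has been closed.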
What do we need to get started?
To create an XML document, all we need is a plain text editor such as Notepad. There are some expensive XML editing tools available, but unless we are doing complex work where rapid development is needed, we don't need them. Notepad is a capable editor for XML. You can also try Notepad++, which is an extremely powerful plain text editor that supports more than 20 languages.
Open Notepad, type in the XML code for the email document shown earlier, and save the file with the ".xml" extension.
How to view XML document?
To view the XML file you created, open it in your browser, for example Internet Explorer or Mozilla Firefox.
[Screenshots: the XML file viewed in Internet Explorer and in Mozilla Firefox]

XML Syntax and Elements
XML syntax
XML is used for structured, descriptive markup. Each XML element can contain text, other XML elements that are known as its children, or nothing. In designing an XML document, we should think about what will contain what. You can think of the structure as a family tree with many branches, or as containers within containers. The root element is the top-level element that is the parent of all of the other elements--that is, it contains everything else that appears in the document.
Basic XML Rules
• Elements that contain data must have start tags and ending tags
<email> [...] </email>.
• Empty tags must be closed
If an element contains no data (e.g., a page or line break or an image), either embed the closing slash within the tag itself: <br />
or provide a closing tag immediately after the opening tag: <br></br>
• Nest tags properly
Elements should not overlap.
Bad Nesting
<sender> <sender-email-id> </sender> </sender-email-id>
Good Nesting
<sender> <sender-email-id> </sender-email-id> </sender>
• All attribute values must be wrapped in quotation marks
For instance, you should use:
<a href="products.php">
rather than
<a href=products.php>
• A declaration should appear at the top of an XML document to signify what it is:
XML Declaration, e.g.
<?xml version="1.0"?>
• Use a consistent case
Whereas in HTML you can use upper and lower case with abandon, it is good form to use a consistent case in XML, generally lower case. XML is case sensitive, so <TAG> and <Tag> will be treated differently. It is recommended that you use lowercase for tag names.
• One root element per file
Another rule for XML files is only one root element per file is allowed.
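The rules above can be tried out with a parser. In this Python sketch each rule gets one snippet that follows it and, where it applies, one that breaks it; the helper name is_well_formed and the snippets themselves are only illustrations based on the examples in the list.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


cases = {
    "good nesting": "<sender><sender-email-id></sender-email-id></sender>",
    "bad nesting": "<sender><sender-email-id></sender></sender-email-id>",
    "quoted attribute": '<a href="products.php"></a>',
    "unquoted attribute": "<a href=products.php></a>",
    "closed empty tag": "<p>line one<br />line two</p>",
    "unclosed empty tag": "<p>line one<br>line two</p>",
    "mismatched case": "<Tag>text</tag>",
}

results = {label: is_well_formed(text) for label, text in cases.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```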
XML Elements
Elements serve as the building blocks of XML--as the basic units of description.
An element is comprised of an opening and closing tag as well as the content within:
<associate>Dharmendra Das </associate>
XML tags typically come in pairs: opening tags and closing tags. Each tag is wrapped in angle brackets; end tags have a forward slash (/) before the element name.
XML permits users to invent element names, so long as the names begin with a letter or an underscore (_) and do not include white spaces.
Some elements are empty--that is, they do not wrap around any content and have no separate closing tag. For instance, a page or a line break would be an empty element, as would the insertion of an image.
XML Declaration
XML documents should begin with an XML declaration, which identifies the version of XML in use and can also name the character encoding.
XML 1.0 is by far the most widely used version (XML 1.1 exists but is rarely needed), so the declaration would be:
<?xml version="1.0"?>
XML Root Element
Immediately following the XML declaration is the root element, which will close at the very end of the document.
For instance, if we were writing an XML document whose root element is "associate_details", it would look like:
<?xml version="1.0"?>
<associate_details>
[everything else]
</associate_details>
Always remember, only one root element per XML file is allowed.

XML Attributes
Unlike HTML, XML requires that every attribute have a value. This means all attributes have to be equal to something! The examples below show the correct way to create an attribute in XML.
If elements are nouns defining what something is, attributes are adjectives modifying the nouns. In general, data itself should be stored in elements, while information about the elements (or data) should be included in the attributes, as in:
<associate type="silver">Dharmendra Das</associate>
Here "type" is the attribute and "silver" is its value. In the attribute/value pair, the value should always be enclosed in quotation marks. The attribute provides additional information about the element--in this example, enabling you to distinguish between silver, gold, and diamond associates.
Remember, all attribute values must be wrapped in quotation marks.
For instance, you should use:
<associate type="silver">
rather than
<associate type=silver>
The type of quotes that you use around your attribute values is entirely up to you. Both the single quote ' and the double quote " are perfectly fine to use in your XML documents.
The only time you might prefer one kind of quotation mark over the other is when the attribute value contains a quotation mark or apostrophe of its own. Below is an example.
<jre version='jre1.5."Update 11"'>
<vendor>SUN</vendor>
</jre>
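The quoting rules above can be tried out with a short Python sketch. It assumes Python's standard xml.etree.ElementTree module as the parser (the original text does not name one); the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Double quotes around the attribute value.
associate = ET.fromstring('<associate type="silver">Dharmendra Das</associate>')
print(associate.get("type"))

# Single quotes on the outside let the value contain double quotes.
jre = ET.fromstring("<jre version='jre1.5.\"Update 11\"'><vendor>SUN</vendor></jre>")
print(jre.get("version"))

# An unquoted attribute value is rejected by the parser.
try:
    ET.fromstring("<associate type=silver>Dharmendra Das</associate>")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False
print(unquoted_ok)
```

The first two documents parse and give back the attribute values unchanged, while the unquoted form raises a parse error.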

XML Basic
XML Entity References
An entity is a symbolic representation of information.
What does that mean?

Well, let's imagine for a moment that we wanted to create an introduction that is included in every single letter that we write. It would be monotonous to have to type out a three sentence introduction for every letter, but not to worry, XML entities can help us out.
With symbolic representation of information, a lot of text, such as "Best view resolution 800 x 600", can be represented by an entity symbol such as &view;.
The format of an entity in XML is an ampersand (&), followed by the name of the symbol, and concluded with a semicolon.
• Generic Entity - &name;
HTML is another markup language that supports entities. Below are some example entities and the information they represent. Of these, only &lt; and &gt; are predefined in XML (together with &amp;, &apos; and &quot;); &copy; and &reg; are HTML entities and must be declared in a DTD before an XML document can use them.
• &copy; = ©
• &lt; = <
• &gt; = >
• &reg; = ®
Creating an XML Entity
An entity must be created in the Document Type Definition (DTD). When you know where to place the entity, the rest is easy. Here is the syntax for creating your own XML entities.
<!ENTITY entityName "The text to appear when the entity is used">
Below we have created an entity for the copyright notice we want to include in all of our documents.
XML Code:
<!ENTITY copy "Copyright 2007 eBIZ.com Pvt. Ltd. All Rights Reserved">
Using Entity
After the entity has been created in the DTD it can then be referenced. An example XML document that uses such an entity would look like:
<!DOCTYPE associate_details [
<!ENTITY copy "Copyright 2007 eBIZ.com Pvt. Ltd. All Rights Reserved">
]>
<associate_details>
<associate type="silver">Dharmendra Das</associate>
<info>&copy;</info>
</associate_details>
Entities are great for many situations, especially if we:
• use something a lot. If we have a default introduction, signature, or something else that is commonly used, we should use an entity.
• change something often. If we have a relatively static document with one or two pieces of information that are used throughout the document and change frequently, replace them with entities. You only need to change the value of the entity to change hundreds or maybe even thousands of references in your XML document.
• use special characters that don't occur on the keyboard: © and ® are easy when you use entities.
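To see an entity being replaced by its text, here is a minimal Python sketch. It assumes the standard xml.etree.ElementTree module, whose underlying parser expands entities declared in the internal DTD; the document string is an illustrative variant of the associate example, not a file from the original site.

```python
import xml.etree.ElementTree as ET

# The entity is declared once, inside the DOCTYPE, and referenced as &copy;.
document = """<!DOCTYPE associate_details [
  <!ENTITY copy "Copyright 2007 eBIZ.com Pvt. Ltd. All Rights Reserved">
]>
<associate_details>
  <associate type="silver">Dharmendra Das</associate>
  <info>&copy;</info>
</associate_details>"""

root = ET.fromstring(document)
info_text = root.find("info").text
print(info_text)
```

Changing the text inside the ENTITY declaration changes every place where &copy; is referenced, which is the main benefit described above.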

XML Basic
Comments
XML comments have exactly the same syntax as HTML comments. A comment is used to temporarily edit out a portion of XML code or to leave a note. Although XML is supposed to be self-describing data, you may still come across some instances where an XML comment is necessary. Below is an example of a notation comment that should be used when you need to leave a note to yourself or to someone who may be viewing your XML.
Example:
<?xml version="1.0" encoding="ISO-8859-15"?>
<!--Associate commission details are updated bi-weekly -->
<associate_details>
  <associate>
    <name>XYZ Kumar</name>
    <commission>50K</commission>
  </associate>
  <associate>
    <name>Dharmendra Das</name>
    <commission>52K</commission>
  </associate>
  <associate>
    <name>ABC</name>
    <commission>92K</commission>
  </associate>
</associate_details>
<!-- End of associate commission details-->
For many different reasons, sometimes you might want to temporarily remove some XML code from your XML document. XML comments let you do this easily, as the example below shows.
Example:
<?xml version="1.0" encoding="ISO-8859-15"?>
<!--Associate commission details are updated bi-weekly -->
<associate_details>
<!--
<associate>
    <name>XYZ Kumar</name>
    <commission>50K</commission>
</associate>
-->
<associate>
    <name>Dharmendra Das</name>
    <commission>52K</commission>
</associate>
<associate>
    <name>ABC</name>
    <commission>92K</commission>
</associate>
</associate_details>
<!-- End of associate commission details-->
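The effect of commenting out an element can be checked with a small Python sketch. It assumes the standard xml.etree.ElementTree module, which skips comments by default; the document string mirrors the commission example with the first associate commented out.

```python
import xml.etree.ElementTree as ET

document = """<associate_details>
<!--
<associate>
    <name>XYZ Kumar</name>
    <commission>50K</commission>
</associate>
-->
<associate>
    <name>Dharmendra Das</name>
    <commission>52K</commission>
</associate>
<associate>
    <name>ABC</name>
    <commission>92K</commission>
</associate>
</associate_details>"""

root = ET.fromstring(document)
# Only elements outside the comment are visible to the application.
names = [associate.findtext("name") for associate in root.findall("associate")]
print(names)
```

Only the two associates outside the comment are returned; the commented-out record is invisible to the program reading the file.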

Valid XML
What is a Document?
XML Document
Remember, unlike HTML, XML does not have any predefined tags (elements). Instead, XML authors "invent" their own tags to create an XML document. That is the liberty of XML: it gives us the power and flexibility to create our own tags to define and describe the structure of our document. With HTML we have no choice other than to use the already defined tags, because the meaning of each tag is fixed. Moreover, HTML does not know anything about the data itself; it only handles how the data is presented.
With XML we can define the structure of our document.
For example, if we want to create an email, its structure will be as follows:
<?xml version="1.0" encoding="ISO-8859-1"?>
<email>
<sender>
       <sender_name>XYZ Kumar</sender_name>
      <sender_email_id>xyz.kumar@gmail.com</sender_email_id>
</sender>
<recipient>
      <recipient_name>ABC</recipient_name>
      <recipient_email_id>visu.k@gmail.com</recipient_email_id>
</recipient>
<subject>Hello! how r U</subject>
<message>
      <message_header>Hello dear</message_header>
      <message_body_text>Hello this is Ravish, how are you dear</message_body_text>
      <message_footer>bye!, take care</message_footer>
</message>
</email>
The XML document above defines the structure of an email document.
An XML document is a set of user-defined elements (tags) that are used to define and describe the structure of a document. Using these elements we can define "what contains what".
 
Valid XML
Well-Formed XML Document
A "Well Formed" XML document is a document that conforms to the XML syntax rules, including the rules for naming elements and attributes, i.e.:
• XML tags are case sensitive.
• XML documents must have a root element; only one root element is allowed per file.
• XML elements must be closed, which means every opening tag must have a matching closing tag.
• XML elements must be properly nested.
• XML attribute values must always be quoted.
The following document is well formed:
<?xml version="1.0" encoding="ISO-8859-1"?>
<email>
<sender>
      <sender_name>XYZ Kumar</sender_name>
      <sender_email_id>xyz.kumar@gmail.com</sender_email_id>
</sender>
<recipient>
      <recipient_name>ABC</recipient_name>
      <recipient_email_id>visu.k@gmail.com</recipient_email_id>
</recipient>
      <subject>Hello! how r U</subject>
<message>
      <message_header>Hello dear</message_header>
      <message_body_text>Hello this is Ravish, how are you dear</message_body_text>
      <message_footer>bye!, take care</message_footer>
</message>
</email>
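The well-formedness rules can be checked mechanically. The following Python sketch assumes the standard xml.etree.ElementTree module as the parser; the helper name is_well_formed and the sample strings are made up for illustration, one broken sample per rule.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One good document, then one violation of each rule.
samples = {
    "well formed": "<email><subject>Hello</subject></email>",
    "case mismatch": "<email><Subject>Hello</subject></email>",
    "two root elements": "<email/><email/>",
    "missing closing tag": "<email><subject>Hello</email>",
    "improper nesting": "<email><sender><name>XYZ</sender></name></email>",
    "unquoted attribute": "<email priority=high/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample is accepted. Note that this checks well-formedness only; it says nothing about whether the document follows a DTD.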

Valid XML
Valid XML Document
An XML document that complies with a particular schema, in addition to being well-formed, is said to be valid.
A "Valid" XML document is a "Well Formed" XML document that also conforms to the rules of a Document Type Definition (DTD):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE email SYSTEM "emailStructure.dtd">
<email>
<sender>
      <sender_name>XYZ Kumar</sender_name>
      <sender_email_id>xyz.kumar@gmail.com</sender_email_id>
</sender>
<recipient>
      <recipient_name>ABC</recipient_name>
      <recipient_email_id>visu.k@gmail.com</recipient_email_id>
</recipient>
<subject>Hello! how r U</subject>
<message>
      <message_header>Hello dear</message_header>
      <message_body_text>Hello this is Ravish, how are you dear</message_body_text>
      <message_footer>bye!, take care</message_footer>
</message>
</email>

XML DTD
Introduction to DTD
Document Type Definition (DTD), defined to some extent differently within the XML and SGML specifications, is one of several SGML and XML schema languages, and is the term used to describe a document or portion thereof that is authored in the DTD language. A DTD is primarily used for the expression of a schema via a set of declarations that conform to a particular markup syntax and that describe a class, or type, of SGML or XML documents, in terms of constraints on the structure of those documents.
Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference.
Types of DTD
Internal DTD
This is an XML document with a Document Type Definition:
<?xml version="1.0" encoding="ISO-8859-1"?>
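<!DOCTYPE email [
    <!ELEMENT email (sender,recipient,subject,message)>
    <!ELEMENT sender (sender_name,sender_email_id)>
    <!ELEMENT recipient (recipient_name,recipient_email_id)>
    <!ELEMENT subject (#PCDATA)>
    <!ELEMENT message (message_header,message_body_text,message_footer)>
    <!ELEMENT sender_name (#PCDATA)>
    <!ELEMENT sender_email_id (#PCDATA)>
    <!ELEMENT recipient_name (#PCDATA)>
    <!ELEMENT recipient_email_id (#PCDATA)>
    <!ELEMENT message_header (#PCDATA)>
    <!ELEMENT message_body_text (#PCDATA)>
    <!ELEMENT message_footer (#PCDATA)>
]>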
    <email>
        <sender>
            <sender_name>XYZ Kumar</sender_name>
            <sender_email_id>xyz.kumar@gmail.com</sender_email_id>
        </sender>
        <recipient>
            <recipient_name>ABC</recipient_name>
            <recipient_email_id>visu.k@gmail.com</recipient_email_id>
        </recipient>
        <subject>Hello! how r U</subject>
        <message>
            <message_header>Hello dear</message_header>
            <message_body_text>Hello this is xyz, how are you dear</message_body_text>
            <message_footer>bye!, take care</message_footer>
        </message>
</email> 
The DTD is interpreted like this:
!ELEMENT email (the first declaration inside the DOCTYPE) defines the element "email" as having four child elements: "sender, recipient, subject, message", and so on for the other declarations.
External DTD
This is the same XML document with an external DTD:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE email SYSTEM "emailStructure.dtd">
<email>
  <sender>
    <sender_name>XYZ Kumar</sender_name>
    <sender_email_id>xyz.kumar@gmail.com</sender_email_id>
  </sender>
  <recipient>
    <recipient_name>ABC</recipient_name>
    <recipient_email_id>visu.k@gmail.com</recipient_email_id>
  </recipient>
  <subject>Hello! how r U</subject>
  <message>
    <message_header>Hello dear</message_header>
    <message_body_text>Hello this is xyz, how are you dear</message_body_text>
    <message_footer>bye!, take care</message_footer>
  </message>
</email>
Content of the file "emailStructure.dtd" containing the Document Type Definition (declarations for the leaf elements, such as <!ELEMENT sender_name (#PCDATA)>, are not shown):
<?xml version="1.0" encoding="ISO-8859-1"?>
<!ELEMENT email (sender,recipient,subject,message)>
<!ELEMENT sender (sender_name,sender_email_id)>
<!ELEMENT recipient (recipient_name,recipient_email_id)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (message_header,message_body_text,message_footer)>

XML DTD
DTD Elements
The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be declared inline in your XML document, or as an external reference. XML provides an application-independent way of sharing data. With a DTD, independent groups of users can agree to use a common DTD for interchanging data. Your application can use a standard DTD to verify that data that you receive from the outside world is valid. You can also use a DTD to verify your own data.
Declaring an Element
In a DTD, XML elements are declared with an element declaration.
An element declaration has the following syntax:
<!ELEMENT element-name (element-content)>
Element with data
<!ELEMENT element-name (#PCDATA)>
or
<!ELEMENT element-name ANY>
#PCDATA
means that the element contains parsed character data: text that the parser examines for entity references and markup.
ANY
The keyword ANY, written without parentheses, declares an element with any content.
Note that #CDATA is not a valid content type in an element declaration. CDATA is an attribute type; text that should not be parsed is written inside a CDATA section (<![CDATA[ ... ]]>) in the document itself.
If an element allows child elements alongside #PCDATA, these elements must also be declared.
example:
<!ELEMENT subject (#PCDATA)>
Empty element
Empty elements are declared with the keyword EMPTY, written without parentheses:
<!ELEMENT element-name EMPTY>
example:
<!ELEMENT break EMPTY>
Elements with sub-element (sequences)
Elements with one or more children are defined with the name of the children elements inside the parentheses:
<!ELEMENT element-name (child-element-name)>
or
<!ELEMENT element-name (child-element-name,child-element-name,.....)>
example:
<!ELEMENT email (sender,recipient,subject,message)>
In a full declaration, the children must also be declared, and the children can also have children. When children are declared in a sequence separated by commas, they must appear in the document in the same sequence in which they are declared.
The full declaration of the email document (leaving out the leaf elements, each of which would be declared in the form <!ELEMENT sender_name (#PCDATA)>) will be:
<!ELEMENT email (sender,recipient,subject,message)>
<!ELEMENT sender (sender_name,sender_email_id)>
<!ELEMENT recipient (recipient_name,recipient_email_id)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT message (message_header,message_body_text,message_footer)>
Declaring only one occurrence of the same element
<!ELEMENT element-name (child-name)>
example:
<!ELEMENT email (message_text)>
The example declaration above declares that the child element message_text can only occur one time inside the email element.
Declaring minimum one occurrence of the same element
<!ELEMENT element-name (child-name+)>
example:
<!ELEMENT email (message_text+)>
The + sign in the example above declares that the child element message_text must occur one or more times inside the email element.
Zero or more occurrences of the same element
<!ELEMENT element-name (child-name*)>
example:
<!ELEMENT email (message_text*)>
The * sign in the example above declares that the child element message_text can occur zero or more times inside the email element.
Zero or one occurrences of the same element
<!ELEMENT element-name (child-name?)>
example:
<!ELEMENT email (message_text?)>
The ? sign in the example above declares that the child element message_text can occur zero or one times inside the email element.
Combining occurrence indicators
example:
<!ELEMENT email (sender,recipient+,subject?,message*)>
The example above declares that the element email must contain exactly one sender child element, at least one recipient, zero or one subject, and zero or more message elements, in that order. Because the content model does not include #PCDATA, no character data is allowed directly inside email.
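The occurrence indicators +, * and ? behave much like the same symbols in regular expressions. The Python sketch below uses that analogy to check the order and count of child elements. It is not a DTD validator: the helpers sequence_to_regex and children_match are hypothetical names introduced here, and only flat, comma-separated sequences are handled.

```python
import re
import xml.etree.ElementTree as ET

def sequence_to_regex(model: str) -> re.Pattern:
    """Translate a flat DTD sequence such as '(a,b+,c?,d*)' into a regex.

    Only comma-separated names with an optional +, * or ? suffix are handled;
    choices (|), nested groups and #PCDATA are outside this sketch.
    """
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "+*?" else ""
        name = item[:-1] if suffix else item
        # Each child name is followed by one space in the string being matched.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))

def children_match(element: ET.Element, model: str) -> bool:
    """Check the element's direct children against the sequence model."""
    child_names = "".join(child.tag + " " for child in element)
    return sequence_to_regex(model).fullmatch(child_names) is not None

MODEL = "(sender,recipient+,subject?,message*)"
ok = ET.fromstring("<email><sender/><recipient/><recipient/><message/></email>")
no_recipient = ET.fromstring("<email><sender/><subject/></email>")
wrong_order = ET.fromstring("<email><recipient/><sender/></email>")

print(children_match(ok, MODEL))
print(children_match(no_recipient, MODEL))
print(children_match(wrong_order, MODEL))
```

The first document satisfies the model; the second lacks the required recipient and the third has the children in the wrong order, so both are rejected.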

XML DTD
DTD Attributes
In a DTD, XML element attributes are defined with an ATTLIST declaration.
An attribute declaration has the following syntax:
<!ATTLIST element-name attribute-name attribute-type default-value>
The ATTLIST declaration names the element that can have the attribute (element-name), the name of the attribute (attribute-name), the type of the attribute (attribute-type), and the default attribute value (default-value).
The attribute-type can have the following values:
Value            Explanation
xml:             The value is predefined
ID               The value is a unique id
IDREF            The value is the id of another element
IDREFS           The value is a list of other ids
CDATA            The value is character data
NMTOKEN          The value is a valid XML name
NMTOKENS         The value is a list of valid XML names
ENTITY           The value is an entity
ENTITIES         The value is a list of entities
NOTATION         The value is a name of a notation
(eval|eval|..)   The value must be an enumerated value
The default-value can have the following values:
Value            Explanation
"value"          The attribute has a default value, written as a quoted string
#REQUIRED        The attribute value must be included in the element
#IMPLIED         The attribute does not have to be included
#FIXED "value"   The attribute value is fixed
Attribute declaration example:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE email [
  <!ELEMENT email EMPTY>
  <!ATTLIST email priority CDATA "normal">
]>
<email priority='high'/>
In the above example, the element email is defined to be an empty element with the attribute priority of type CDATA. The priority attribute has a default value of "normal", which applies whenever the attribute is omitted.
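A default attribute value can be observed from a program. The Python sketch below assumes the standard xml.etree.ElementTree module, whose underlying parser applies defaults declared in the internal DTD; the DTD string declares email as EMPTY purely to keep the illustration short.

```python
import xml.etree.ElementTree as ET

# Internal DTD: priority defaults to "normal" when it is not written out.
DTD = """<!DOCTYPE email [
  <!ELEMENT email EMPTY>
  <!ATTLIST email priority CDATA "normal">
]>
"""

explicit = ET.fromstring(DTD + '<email priority="high"/>')
defaulted = ET.fromstring(DTD + "<email/>")

print(explicit.get("priority"))
print(defaulted.get("priority"))
```

The first element reports the value written in the document, and the second reports the default taken from the ATTLIST declaration. Parsers are not obliged to read external DTD files, so defaults declared there may not be applied.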
Enumerated attribute values
Syntax:
<!ATTLIST element-name attribute-name (eval|eval|..) default-value>
DTD example:
<!ATTLIST payment mode (check|cash) "cash">  
XML example:
<payment mode="check">
or
<payment mode="cash"> 
Use enumerated attribute values when you want the attribute values to be one of a fixed set of legal values.
Fixed attribute value
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ebiz [
   <!ELEMENT ebiz (associate_details,emp_details,sponser_details)>
   <!ATTLIST ebiz companysite CDATA #FIXED "http://www.ebizel.com">
]>
<ebiz>
</ebiz>
Use a fixed attribute value when you want an attribute to have a fixed value that the author cannot change. If an author includes another value, a validating XML parser will return an error.
Required attribute
<!ELEMENT payment (amount+,duration+)>
<!ATTLIST payment mode CDATA #REQUIRED>
XML example:
<payment mode="cash"></payment>
Use a required attribute if you don't have an option for a default value, but still want to force the attribute to be present. The required (#REQUIRED) keyword forces the document author to use the attribute.
Implied attribute
<!ELEMENT addressdetails (address+,city?,state,pin)>
<!ATTLIST addressdetails communication_mode CDATA #IMPLIED>
XML example:
<addressdetails communication_mode="post">
</addressdetails>
Use an implied attribute if you don't want to force the author to include an attribute and you don't have an option for a default value either.

XML DTD
DTD Entities
XML Entities
• Entity references are references to entities.
• Entities are variables used to define shortcuts to common text.
• Entities can be declared internally.
• Entities can be declared externally.
Declaring an Internal Entity
Syntax:
<!ENTITY entity-name "entity-value">
DTD Example:
<!ENTITY developer "Ravish@ebizel.com">
<!ENTITY copyright "Copyright 2007 eBIZ.com Pvt. Ltd. All Rights Reserved">
XML example:
<author>&developer;&copyright;</author>
Declaring an External Entity
<!ENTITY entity-name SYSTEM "URI/URL">
DTD Example:
<!ENTITY developer SYSTEM "http://education.ebizel.com/html/xml/entity.xml">
<!ENTITY copyright SYSTEM "http://education.ebizel.com/html/xml/entity.xml">
XML example:
<author>&developer;&copyright;</author>
