PC TRAINING: XML Schema

XML Schema

The definition of an XML document, which includes the XML tags and their interrelationships. Residing within the document itself, an XML schema may be used to verify the integrity of the content.

Various recommendations for an XML schema were submitted to the W3C, and a standard was approved in May 2001 that included the ability to define data by type (date, integer, etc.).

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntax constraints imposed by XML itself.


An XML schema provides a view of the document type at a relatively high level of abstraction.
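To make the idea of "constraints on structure and content" concrete, here is a minimal sketch that checks a document against a schema using the javax.xml.validation package that ships with the JDK. The one-element schema, the class name SchemaCheck and the method isValid are assumptions made up for this illustration; they are not part of this tutorial's examples.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {
    // Hypothetical schema: a single <dob> element whose content must be an xs:date.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='dob' type='xs:date'/>"
      + "</xs:schema>";

    // Returns true if the XML text is well-formed and satisfies the schema.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or it breaks a schema constraint
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<dob>1984-10-12</dob>")); // year-month-day form
        System.out.println(isValid("<dob>12-10-1984</dob>")); // day-month-year form
    }
}
```

Both documents are well-formed XML, but only the first one satisfies the type constraint, because xs:date expects the year-month-day form. That extra check is what a schema adds on top of the basic syntax rules.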

XML Schema and Java

Introduction to SAX

Simple API for XML (SAX) is a programming interface (API) for accessing the contents of an XML document. SAX does not provide random-access lookup of the document's contents. It scans the document sequentially and presents each item to the application only once.

If the application does not save the data, it is no longer available. In contrast, the Document Object Model (DOM) converts the document's contents into a node tree that can be traversed back and forth via the programming interface (API). Both SAX and DOM are popular APIs for manipulating XML documents.

What is SAX?

The Simple API for XML (SAX) is a serial access parser API for XML. SAX provides a mechanism for reading data from an XML document. It is a popular alternative to the Document Object Model (DOM).

The Simple API for XML, SAX, was invented in late 1997/early 1998 when Peter Murray-Rust and several authors of XML parsers written in Java decided there wasn’t much point to maintaining multiple similar yet incompatible APIs to do exactly the same thing. Murray-Rust was the first to suggest what he called “YAXPAPI”.

The reason Murray-Rust wanted Yet Another XML Parser API was that he was thoroughly sick of supporting multiple, incompatible XML parsers for his parser-client application JUMBO. Instead, he wanted a standard API everyone could agree on.

Parser authors Tim Bray and David Megginson quickly signed on to the project, and work began in public on the xml-dev mailing list where many people participated. Megginson wrote the initial draft of SAX. After a short beta period, SAX 1.0 was released on May 11, 1998.

SAX was designed around abstract interfaces rather than concrete classes so it could be layered on top of parsers’ existing native APIs. SAX is not the most sophisticated XML API imaginable, but that’s part of its beauty.

The ease with which SAX could be implemented by many parser vendors with very different architectures contributed to its success and rapid standardization.

XML Processing with SAX

A parser that implements SAX (i.e., a SAX parser) functions as a stream parser with an event-driven API. The user defines a number of callback methods that will be called when events occur during parsing.

The SAX events include:

* XML Text nodes

* XML Element nodes

* XML PIs (processing instructions) and XML comments

Events are fired when each of these XML features is encountered, and again when its end is encountered. XML attributes are provided as part of the data passed to element events.

SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.
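The callback model described above can be sketched in a few lines. This is only an illustration: the class name SaxEventTrace and the method trace are invented here, and the parser is obtained through the JDK's javax.xml.parsers.SAXParserFactory rather than any particular vendor class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace {
    // Parses the XML text and returns the events in the order the parser reported them.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Attributes arrive with the start-element event, not as events of their own.
                StringBuilder sb = new StringBuilder("start " + qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                events.add(sb.toString());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may be delivered in more than one chunk; whitespace-only chunks are skipped.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text " + text);
                }
            }

            @Override
            public void processingInstruction(String target, String data) {
                events.add("pi " + target);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse", e);
        }
        return events;
    }

    public static void main(String[] args) {
        trace("<note id='1'><?app hint?><to>Ann</to></note>").forEach(System.out::println);
    }
}
```

Each callback sees its piece of the document exactly once. If the handler did not copy the data into the events list, it would be gone by the time the parse finished.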

Definition

Unlike DOM, there is no formal specification for SAX. The Java implementation of SAX is considered to be normative, and implementations in other languages attempt to follow the rules laid down in that implementation, adjusting for the differences in language where necessary.

Benefits

SAX parsers have certain benefits over DOM-style parsers. The quantity of memory that a SAX parser must use in order to function is typically much smaller than that of a DOM parser.

DOM parsers must have the entire tree in memory before any processing can begin, so the amount of memory used by a DOM parser depends entirely on the size of the input data.

The memory footprint of a SAX parser, by contrast, is based only on the maximum depth of the XML file (the maximum depth of the XML tree) and the maximum data stored in XML attributes on a single XML element. Both of these are always smaller than the size of the parsed tree itself.

Because of the event-driven nature of SAX, processing documents can often be faster than DOM-style parsers. Memory allocation takes time, so the larger memory footprint of the DOM is also a performance issue.

Due to the nature of SAX, streamed reading from disk is possible. Processing XML documents that could never fit into memory is only possible through the use of a SAX parser (or another kind of stream XML parser).

Drawbacks

The event-driven model of SAX is useful for XML parsing, but it does have certain drawbacks.

Certain kinds of XML validation require access to the document in full. For example, a DTD IDREF attribute requires that there be an element in the document that uses the given string as a DTD ID attribute.

To validate this in a SAX parser, one would need to keep track of every previously encountered ID attribute and every previously encountered IDREF attribute, to see if any matches are made.
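The bookkeeping described above might look roughly like the following sketch. It makes a simplifying assumption: instead of reading attribute types from a DTD, it treats any attribute named id as an ID and any attribute named ref as an IDREF. The class and method names are invented for the illustration.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IdRefCheck {
    // Returns the references that never matched an ID anywhere in the document.
    static Set<String> danglingRefs(String xml) {
        Set<String> ids = new HashSet<>();
        Set<String> refs = new HashSet<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Assumption: "id" plays the role of the ID attribute, "ref" of the IDREF.
                String id = atts.getValue("id");
                String ref = atts.getValue("ref");
                if (id != null) {
                    ids.add(id);
                }
                if (ref != null) {
                    refs.add(ref);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse", e);
        }
        // Only now, after the whole document has been read, can the check be made,
        // because an ID may appear later in the document than a reference to it.
        refs.removeAll(ids);
        return refs;
    }
}
```

Note that both sets have to be kept for the whole parse, so the memory used grows with the number of IDs and references rather than with the depth of the document.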

Furthermore, if an IDREF does not match an ID, the user only discovers this after the document has been parsed; if this linkage was important to building functioning output, then time has been wasted in processing the entire document only to throw it away.

Additionally, some kinds of XML processing simply require having access to the entire document. XSLT and XPath, for example, need to be able to access any node at any time in the parsed XML tree. While a SAX parser could be used to construct such a tree, the DOM already does so by design.

Although SAX is very much a de facto standard, it has not gone through any formal standardization process. Its development was open to anyone interested. All you had to do was join the xml-dev mailing list and participate in the discussions.

The end result was explicitly placed in the public domain. It is free to be implemented or extended by anyone for any purpose without permission from anybody. It is not copyrighted or trademarked. As far as is known, no parts of it are patented by anyone either.


Getting the Java XML Parser

List of XML Parsers and Parser Tools

Saxon

The SAXON package is a collection of tools for processing XML documents. You can use SAXON by writing XSL stylesheets, by writing Java applications, or by any combination of the two.

The output format may be XML, or HTML, or some other format such as comma separated values, EDI messages, or data in a relational database. SAXON is maintained by Michael Kay and can be relied upon to support the latest standards.

Download link: http://saxon.sourceforge.net/

Instant Saxon

Instant Saxon contains identical functionality to the full product, but packaged as a Windows executable for ease of installation and running.

This package includes only basic documentation, and no source code or sample applications.

Home Page: http://saxon.sourceforge.net/saxon6.5.2/instant.php

Download Link: http://saxon.sourceforge.net/

Xalan

Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. Xalan-Java version 1.2.2 is a complete and robust implementation of the W3C Recommendations for XSL Transformations (XSLT) and the XML Path Language (XPath).

Xalan can be used from the command line, in an applet or a servlet, or as a module in another program. By default, it uses the Xerces XML parser, but it can interface to any XML parser that conforms to the DOM level 2 or SAX level 1 specification.

Xalan Home page: http://xml.apache.org/xalan-j/

Download Link : http://xml.apache.org/xalan-j/downloads.php

Xerces [My favorite and the best]

Xerces (named after the Xerces Blue butterfly) provides world-class XML parsing and generation. Fully-validating parsers are available for both Java and C++, implementing the W3C XML and DOM (Level 1 and 2) standards, as well as the de facto SAX (version 2) standard.

The parsers are highly modular and configurable. Initial support for XML Schema (draft W3C standard) is also provided. A Perl wrapper is provided for the C++ version of Xerces, which allows access to a fully validating DOM XML parser from Perl.

It also provides for full access to Unicode strings, since Unicode is a key part of the XML standard. A COM wrapper (also for Xerces-C) provides compatibility with the Microsoft MSXML parser.

Xerces Home Page: http://xerces.apache.org/

Download Link: http://xerces.apache.org/xerces2-j/download.cgi

Oracle XSL

Oracle provides a set of XML parsers for Java, C, C++, and PL/SQL. Each of these parsers is a stand-alone XML component that parses an XML document (or a standalone DTD) so that it can be processed by an application.

The parsers support the DOM (Document Object Model) and SAX (Simple API for XML) interfaces, XML Namespaces, validating and non-validating modes, and XSL transformations. The parsers are available on all Oracle platforms.

Sablotron

Sablotron is a fast, compact and portable XSLT processor. Sablotron is an open project; other users and developers are encouraged to use it or to help test and improve it.

The goal of this project is to create a reliable and fast XSLT processor conforming to the W3C specification, which is available to the public and can be used as a base for multi-platform XML applications.

Download Link: http://search.cpan.org/~pavelh/XML-Sablotron-1.01/Sablotron.pm

XT

XT is an implementation in Java of XSL Transformations.

Download Link: http://www.blnz.com/xt

XP

XP is an XML 1.0 parser written in Java. It is fully conforming: it detects all non well-formed documents. It is currently not a validating XML processor.

However it can parse all external entities: external DTD subsets, external parameter entities and external general entities.

Download Link: http://www.jclark.com/xml/xp/index.php

Unicorn

Unicorn XML Toolkit is a developer product implementing various XML-enabling technologies. The Toolkit implements two sets of APIs: one for C++ and one for ECMAScript.

Download Link: http://www.unicorn-enterprises.com/

4XSLT (4Suite)

4Suite is a collection of Python tools for XML processing and object database management. It provides support for XML parsing, several transient and persistent DOM implementations, XPath expressions, XPointer, XSLT transforms, XLink, RDF and ODMG object databases.

The quickest path to trying 4Suite out, especially for non-Python users, is to follow the 4Suite Installation HOW TO, which is available for UNIX and Windows users.

Download Link: http://cvs.4suite.org/viewcvs/4Suite/Ft/Xml/

Napa

Napa is a high-performance, progressive, C++ XSLT processor. There are now three distributions available: Windows, FreeBSD and Linux. All of them provide only a command line interface at the moment.


Choosing the suitable XML API

There are two major standard APIs for processing XML documents with Java, the Simple API for XML (SAX) and the Document Object Model (DOM), each of which comes in several versions.

In addition there are a host of other, somewhat idiosyncratic APIs including JDOM, dom4j, ElectricXML, and XMLPULL. Finally each specific parser generally has a native API that it exposes below the level of the standard APIs.

For instance, the Xerces parser has the Xerces Native Interface (XNI).

However, picking such an API limits your choice of parser, and indeed may even tie you to one particular version of the parser since parser vendors tend not to worry a great deal about maintaining native compatibility between releases. Each of these APIs has its own strengths and weaknesses.

SAX

SAX, the Simple API for XML, is the gold standard of XML APIs. It is the most complete and correct by far. Given a fully validating parser that supports all its optional features, there is very little you can’t do with it.

It has one or two holes, but they're really off in the weeds of the XML specifications, and you have to look pretty hard to find them. SAX is an event-driven API. The SAX classes and interfaces model the parser, the stream from which the document is read, and the client application receiving data from the parser.

However, no class models the XML document itself. Instead the parser feeds content to the client application through a callback interface, much like the ones used in Swing and the AWT.

This makes SAX very fast and very memory efficient (since it doesn’t have to store the entire document in memory). However, SAX programs can be harder to design and code because you normally need to develop your own data structures to hold the content from the document.

SAX works best when your processing is fairly local; that is, when all the information you need to use is close together in the document. For example, you might process one element at a time.

Applications that require access to the entire document at once in order to take useful action would be better served by one of the tree-based APIs like DOM or JDOM.

Finally, because SAX is so efficient, it’s the only real choice for truly huge XML documents. Of course, “truly huge” has to be defined relative to available memory. However, if the documents you're processing range from hundreds of megabytes into the gigabytes, you really have no choice but to use SAX.

DOM

DOM, the Document Object Model, is a fairly complex API that models an XML document as a tree. Unlike SAX, DOM is a read-write API. It can both parse existing XML documents and create new ones. Each XML document is represented as a Document object.

Documents are searched, queried, and updated by invoking methods on this Document object and the objects it contains. This makes DOM much more convenient when random access to widely separated parts of the original document is required.

However, it is quite memory intensive compared to SAX, and not nearly as well suited to streaming applications.
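As a rough illustration of the read-write, random-access style, the sketch below parses a small document, updates one node, creates another, and then jumps back to the first node. The person vocabulary and the DomReadWrite class are assumptions invented for this example; the parser comes from the JDK's DocumentBuilderFactory.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomReadWrite {
    // Builds the whole tree in memory before anything else can happen.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse", e);
        }
    }

    // Reads, updates and extends the tree, then summarizes the result.
    static String demo(String xml) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();

        // Update the last <person> in place.
        NodeList people = root.getElementsByTagName("person");
        Element last = (Element) people.item(people.getLength() - 1);
        last.setAttribute("name", "Bea");

        // Create a brand-new node and attach it to the tree.
        Element added = doc.createElement("person");
        added.setAttribute("name", "Zed");
        root.appendChild(added);

        // Jump back to the first <person>: any node is reachable at any time.
        Element first = (Element) root.getElementsByTagName("person").item(0);
        int count = root.getElementsByTagName("person").getLength();
        return first.getAttribute("name") + "," + last.getAttribute("name") + "," + count;
    }

    public static void main(String[] args) {
        System.out.println(demo("<book><person name='Ann'/><person name='Bob'/></book>"));
    }
}
```

Nothing here would be possible with SAX alone, since SAX neither keeps earlier nodes around nor creates new ones.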

JAXP

JAXP, the Java API for XML Processing, bundles SAX and DOM together along with some factory classes and the TrAX XSLT API. (TrAX is not a general purpose XML API like SAX and DOM. I'll get to it in Chapter 17.) It is a standard part of Java 1.4 and later. However, it is not really a different API.

When starting a new program, you ask yourself whether you should choose SAX or DOM. You don’t ask yourself whether you should use SAX or JAXP, or DOM or JAXP. SAX and DOM are part of JAXP.
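A small sketch may help show that JAXP is just the doorway to both APIs. The class below obtains a SAX parser and a DOM parser from the two JAXP factory classes and uses each to count elements with a given tag name. The class name, method names and the sample vocabulary are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class JaxpBothWays {
    // SAX route: count matching start tags as the events stream past.
    static int countWithSax(String xml, String tag) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse", e);
        }
        return count[0];
    }

    // DOM route: build the whole tree first, then query it.
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<staff><employee/><employee/><manager/></staff>";
        System.out.println(countWithSax(xml, "employee"));
        System.out.println(countWithDom(xml, "employee"));
    }
}
```

Neither method names a parser vendor. The factories pick whichever implementation is installed, which is the main thing JAXP adds on top of SAX and DOM.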

JDOM

JDOM is a Java-native tree-based API that attempts to remove a lot of DOM’s ugliness. The JDOM mission statement is, “There is no compelling reason for a Java API to manipulate XML to be complex, tricky, unintuitive, or a pain in the neck,” and for the most part JDOM delivers.

Like DOM, JDOM reads the entire document into memory before it begins to work on it; and the broad outline of JDOM programs tends to be the same as for DOM programs. However, the low-level code is a lot less tricky and ugly than the DOM equivalent.

JDOM uses concrete classes and constructors rather than interfaces and factory methods. It uses standard Java coding conventions, methods, and classes throughout. JDOM programs often flow a lot more naturally than the equivalent DOM program.

I think JDOM often does make the easy problems easier; but in my experience JDOM also makes the hard problems harder. Its design shows a very solid understanding of Java, but the XML side of the equation feels much rougher.

It’s missing some crucial pieces like a common node interface or superclass for navigation. JDOM works well (and much better than DOM) on fairly simple documents with no recursion, limited mixed content, and a well-known vocabulary.

It begins to show some weakness when asked to process arbitrary XML. When I need to write programs that operate on any XML document, I tend to find DOM simpler despite its ugliness.

dom4j

dom4j was forked from the JDOM project fairly early on. Like JDOM, it is a Java-native, tree-based, read-write API for processing generic XML. However, it uses interfaces and factory methods rather than concrete classes and constructors.

This gives you the ability to plug in your own node classes that put XML veneers on other forms of data such as objects or database records. (In theory, you could do this with DOM interfaces too; but in practice most DOM implementations are too tightly coupled to interoperate with each other’s classes.) It does have a generic node type that can be used for navigation.

SAX vs. DOM

Why they were both built

SAX (Simple API for XML) and DOM (Document Object Model) were both designed to allow programmers to access their information without having to write a parser in their programming language of choice.

By keeping the information in XML 1.0 format, and by using either SAX or DOM APIs your program is free to use whatever parser it wishes. This can happen because parser writers must implement the SAX and DOM APIs using their favorite programming language.

SAX and DOM APIs are both available for multiple languages (Java, C++, Perl, Python, etc.).

So both SAX and DOM were created to serve the same purpose, which is giving you access to the information stored in XML documents using any programming language (and a parser for that language).

However, both of them take very different approaches to giving you access to your information.

What is DOM?

DOM gives you access to the information stored in your XML document as a hierarchical object model.

DOM creates a tree of nodes (based on the structure and information in your XML document) and you can access your information by interacting with this tree of nodes.

The textual information in your XML document gets turned into a bunch of tree nodes.

Regardless of the kind of information in your XML document (whether it is tabular data, or a list of items, or just a document), DOM creates a tree of nodes when you create a Document object given the XML document.

Thus DOM forces you to use a tree model (just like a Swing TreeModel) to access the information in your XML document. This works out really well because XML is hierarchical in nature. This is why DOM can put all your information in a tree (even if the information is actually tabular or a simple list).

In DOM, each element node actually contains a list of other nodes as its children. These children nodes might contain text values or they might be other element nodes.

At first glance, it might seem unnecessary to access the value of an element node (e.g.: in "<name> XYZ</name>", XYZ is the value) by looking through a list of children nodes inside of it. If each element only had one value then this would truly be unnecessary.

However, elements may contain text data and other elements; this is why you have to do extra work in DOM just to get the value of an element node. Usually when pure data is contained in your XML document, it might be appropriate to "lump" all your data in one String and have DOM return that String as the value of a given element node.
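The extra work mentioned above is easy to see in code. The sketch below walks an element's child list and joins the text nodes it finds there, then compares the result with the standard getTextContent() method, which also descends into child elements. The class and method names are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomElementValue {
    // Parses the XML text and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse", e);
        }
    }

    // Walks the child list and joins only the text nodes found directly inside the element.
    static String directText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Pure data: one text child, so the "value" is simply that child.
        System.out.println("[" + directText(root("<name> XYZ</name>")) + "]");

        // Mixed content: text children are interleaved with a child element.
        Element mixed = root("<p>Hello <b>big</b> world</p>");
        System.out.println("[" + directText(mixed) + "]");
        System.out.println("[" + mixed.getTextContent() + "]");
    }
}
```

For the name element the two approaches agree. For the mixed-content paragraph they differ, which is why DOM cannot simply hand back one string as "the value" of an element.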

This does not work so well if the data stored in your XML document is a document (like a Word or Framemaker document). In documents, the sequence of elements is very important. For pure data (like a database table) the sequence of elements does not matter. So DOM preserves the sequence of the elements that it reads from XML documents, because it treats everything as if it were a document. Hence the name DOCUMENT object model.

If you plan to use DOM as the Java object model for the information stored in your XML document then you really don't need to worry about SAX. However, if you find that DOM is not a good object model to use for the information stored in your XML document then you might want to take a look at SAX.

It is very natural to use SAX in cases where you have to create your own CUSTOM object models. To make matters a little more confusing, you can also create your object model(s) on top of DOM. OOP is a wonderful thing.

What is SAX?

SAX chooses to give you access to the information in your XML document, not as a tree of nodes, but as a sequence of events! You ask, how is this useful? The answer is that SAX chooses not to create a default Java object model on top of your XML document (like DOM does). This makes SAX faster, and also necessitates the following things:

* creation of your own custom object model

* creation of a class that listens to SAX events and properly creates your object model.

Note that these steps are not necessary with DOM, because DOM already creates an object model for you (which represents your information as a tree of nodes).

In the case of DOM, the parser does almost everything: it reads the XML document in, creates a Java object model on top of it, and then gives you a reference to this object model (a Document object) so that you can manipulate it.

SAX is not called the Simple API for XML for nothing; it is really simple. SAX doesn't expect the parser to do much. All SAX requires is that the parser read in the XML document and fire a bunch of events depending on what tags it encounters in the XML document.

You are responsible for interpreting these events by writing an XML document handler class, which is responsible for making sense of all the tag events and creating objects in your own object model. So you have to write:

* your custom object model to "hold" all the information in your XML document

* a document handler that listens to SAX events (which are generated by the SAX parser as it's reading your XML document) and makes sense of these events to create objects in your custom object model.

SAX can be really fast at runtime if your object model is simple. In this case, it is faster than DOM, because it bypasses the creation of a tree based object model of your information. On the other hand, you do have to write a SAX document handler to interpret all the SAX events (which can be a lot of work).

What kinds of SAX events are fired by the SAX parser? These events are really very simple. SAX will fire an event for every open tag, and every close tag. It also fires events for #PCDATA and CDATA sections.

Your document handler (which is a listener for these events) has to interpret these events in some meaningful way and create your custom object model based on them.

The sequence in which these events are fired is very important. SAX also fires events for processing instructions, DTDs, comments, etc. But the idea is still the same: your handler has to interpret these events (and the sequence of the events) and make sense out of them.

When to use DOM

If your XML documents contain document data (e.g., Framemaker documents stored in XML format), then DOM is a completely natural fit for your solution. If you are creating some sort of document information management system, then you will probably have to deal with a lot of document data.

An example of this is the Datachannel RIO product, which can index and organize information that comes from all kinds of document sources (like Word and Excel files). In this case, DOM is well suited to allow programs access to information stored in these documents.

However, if you are dealing mostly with structured data (the equivalent of serialized Java objects in XML) DOM is not the best choice. That is when SAX might be a better fit.

When to use SAX

If the information stored in your XML documents is machine readable (and generated) data then SAX is the right API for giving your programs access to this information. Machine readable and generated data include things like:

* Java object properties stored in XML format

* queries that are formulated using some kind of text based query language (SQL, XQL, OQL)

* result sets that are generated based on queries (this might include data in relational database tables encoded into XML).

So machine generated data is information that you normally have to create data structures and classes for in Java. A simple example is the address book which contains information about persons.

This address book XML file is not like a word processor document, rather it is a document that contains pure data, which has been encoded into text using XML.

<?xml version="1.0" encoding="UTF-8"?>
<ebiz xmlns="http://www.ebizel.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ebizel.com xs-any.xsd
    http://www.ebizel.com xs-any1.xsd">
    <details>
        <name>
            <first-Name></first-Name>
            <last-Name></last-Name>
        </name>
        <address></address>
        <dob>12-10-1984</dob>
    </details>
</ebiz>

When your data is of this kind, you have to create your own data structures and classes (object models) anyway in order to manage, manipulate and persist this data.

SAX allows you to quickly create a handler class which can create instances of your object models based on the data stored in your XML documents. An example is a SAX document handler that reads an XML document that contains my address book and creates an AddressBook class that can be used to access this information.

The first SAX tutorial shows you how to do this. The address book XML document contains person elements, which contain name and email elements. My AddressBook object model contains the following classes:

* AddressBook class, which is a container for Person objects

* Person class, which is a container for name and email String objects.

So my "SAX address book document handler" is responsible for turning person elements into Person objects, and then storing them all in an AddressBook object. This document handler turns the name and email elements into String objects.
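The handler described above could be sketched as follows. This is not the code from the SAX tutorial mentioned earlier; it is a minimal illustration that assumes the element names person, name and email from the description, an arbitrary root element, and invented field and method names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Custom object model: a Person holds two strings, an AddressBook holds Persons.
class Person {
    String name;
    String email;
}

class AddressBook {
    final List<Person> persons = new ArrayList<>();
}

// Document handler: listens to SAX events and fills the object model.
class AddressBookHandler extends DefaultHandler {
    final AddressBook book = new AddressBook();
    private Person current;                                  // the person being built, if any
    private final StringBuilder text = new StringBuilder();  // text of the current element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                                   // start collecting fresh text
        if (qName.equals("person")) {
            current = new Person();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                      // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;                                          // outside any <person>
        }
        switch (qName) {
            case "name":   current.name = text.toString().trim(); break;
            case "email":  current.email = text.toString().trim(); break;
            case "person": book.persons.add(current); current = null; break;
            default: break;
        }
    }

    // Convenience method: parse XML text and return the populated AddressBook.
    static AddressBook read(String xml) {
        AddressBookHandler handler = new AddressBookHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse", e);
        }
        return handler.book;
    }
}
```

The handler only ever holds one Person under construction plus the finished AddressBook, so no generic tree of nodes is built along the way. The order of events matters: the text collected between a start tag and its end tag is what becomes the String value.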

Conclusion

So whether to use SAX or DOM depends on the requirement. After the above discussion, you should better understand which XML API is best suited to which kind of requirement.

If you have a small amount of data to process and analyze, then go with the DOM API; but if you have a really huge amount of data to parse, process and analyze and you want something fast, then SAX is the obvious choice.

The examples in this tutorial use both SAX and DOM, but extensive coverage of these topics is not possible in one or two chapters. A complete description of both is beyond the scope of this tutorial, which only covers some basic validation exercises.


Parsing XML with SAX

Parsing is the process of reading an XML document and reporting its content to a client application while checking the document for well-formedness. SAX represents parsers as instances of the XMLReader interface.

The specific class that implements this interface varies from parser to parser. For example, in Xerces it’s org.apache.xerces.parsers.SAXParser.

In Crimson it’s org.apache.crimson.parser.XMLReaderImpl. Most of the time you don’t construct instances of this interface directly. Instead you use the static XMLReaderFactory.createXMLReader() factory method to create a parser-specific instance of this class.

Then you pass InputSource objects containing XML documents to the parse() method of XMLReader. The parser reads the document, and throws an exception if it detects any well-formedness errors.

The example below demonstrates the complete process with a simple program whose main() method parses a document found at a URL entered on the command line.

If this document is well-formed, a simple message to that effect is printed on System.out. Otherwise, if the document is not well-formed, the parser throws a SAXException.

If an I/O error occurs, such as a file that could not be found, then the parse() method throws an IOException. In this case, you don’t know whether or not the document is well-formed.

Source code for XMLParsing.java

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.swing.JOptionPane;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

class XMLParsing {
    public static void main(String[] args) {
        if (args.length == 0) {
            // Report the usage error on the console and in a dialog, then quit.
            String usage = "Incomplete Syntax\n usage : java XMLParsing file.xml\n"
                    + " here file.xml is file that you want to parse";
            System.out.println(usage);
            JOptionPane.showMessageDialog(null, usage);
            System.exit(0);
        }
        try {
            // Ask SAX for whichever parser is configured, then parse the document.
            XMLReader parser = XMLReaderFactory.createXMLReader();
            parser.parse(args[0]);
            System.out.println("Valid XML Document");

            // The parse succeeded, so echo the file's content line by line.
            String newline = System.getProperty("line.separator");
            StringBuilder str = new StringBuilder("Content of the XML file");
            try (BufferedReader infile = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = infile.readLine()) != null) {
                    str.append(newline).append(line).append(newline);
                }
            }
            System.out.println(str);
        } catch (SAXException se) {
            // Thrown when the document is not well-formed.
            System.out.println("Document is not well-formed XML. Parser says: " + se);
        } catch (IOException e) {
            // Thrown when the document cannot be read at all.
            System.out.println("Parser could not load the document: " + e);
        }
    }
}

Source code of email.xml

<?xml version="1.0" encoding="UTF-8"?>
<email xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="xsmixedType.xsd">
  <sender>
  <name>fasdfasd dfasfasdfasd  </name>
  <email>ra.kumar@ebizel.com</email>
  </sender>
  <recipient>
  <name>sdfasf sdf </name>
  <email>someone@email.com</email>
  </recipient>
  <subject></subject>
  <message>
  Wish u a very Very Happy Friendship day
      <banner>dasfa</banner>adf
   <highlights>10 year 
   of our friendship</highlights>
  <msgContent>How 
  r u?</msgContent> I miss u a lot
 </message>
</email>

Compile the program

Use the following command to compile the program:

javac XMLParsing.java

If you have problems compiling the program, check that the SAX API is available and that the classpath variable is set properly (this example uses xerces.jar).

If you have not added xerces.jar to the classpath, use the following command to compile the program:

javac -classpath "locationof xerces/xerces.jar" XMLParsing.java

Running the example:

Use the following command to run the example:

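java XMLParsing email.xml
or
java -classpath "locationof xerces/xerces.jar;." XMLParsing email.xml

The classpath must also include the current directory (.) so that the XMLParsing class itself can be found. On UNIX, separate the entries with : instead of ;.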



Validating XML with Java

The XML parser world is a dynamic one. As standards change, the parsers change as well--XML parsers are becoming more sophisticated. For most programming projects, the parser, at minimum, must support DOM Level 2, SAX 2, XSLT, and Namespaces. All the parsers discussed here provide these capabilities; however, there are distinct differences in performance, reliability, and conformance to standards.

Parsing XML with DOM

Xerces has full support for the W3C Document Object Model (DOM) Level 1 and the Simple API for XML (SAX) 1.0 and 2.0; however it currently has only limited support for XML Schemas, DOM Level 2 (version 1). Add the xerces.jar file to your CLASSPATH to use the parser.

You can use Xalan, also available from Apache's Web site, for XSLT processing. You can configure both the DOM and SAX parsers. Xerces uses the SAX2 methods getFeature() and setFeature() to query and set various parser features. For example, to create a validating DOM parser instance, you would write:

DOMParser domp = new DOMParser();
try {
    // Turn on validation through the SAX2 feature mechanism.
    domp.setFeature("http://xml.org/sax/features/validation", true);
} catch (SAXException ex) {
    System.out.println(ex);
}

Other modifiable features include support for Schemas and namespaces.

The following example shows a minimal program that counts the number of <employee_details> tags in an XML file using the DOM. The second import line specifically refers to the Xerces parser.

The main method creates a new DOMParser instance and then invokes its parse() method. If the parse operation succeeds, you can retrieve a Document object through which you can access and manipulate the DOM tree using standard DOM API calls.

Source code of MyXMLDOM.java

import org.w3c.dom.*;
import org.apache.xerces.parsers.DOMParser;

/**
 * XML DOM Parser
 */
public class MyXMLDOM
{
    public static void main(String[] args)
    {
        if (args.length <= 0)
        {
            System.out.println("Incomplete Call to parser \n Usage : java MyXMLDOM file.xml ");
            return;
        }
        try
        {
            // Parse the file and build the DOM tree
            DOMParser parser = new DOMParser();
            parser.parse(args[0]);
            Document doc = parser.getDocument();
            // Collect every <employee_details> element in the document
            NodeList nodes = doc.getElementsByTagName("employee_details");
            System.out.println("There are " + nodes.getLength() + "  elements.");
        }
        catch (java.io.IOException e)
        {
            System.out.println("Could not Read the source XML file : " + args[0]);
        }
        catch (Exception ex)
        {
            System.out.println("The Document is not a Valid XML Document \n Parser Says : " + ex);
        }
    }
}

Compiling the example:

javac MyXMLDOM.java

Running the example:

java MyXMLDOM eBIZ_com_staff.xml

Here eBIZ_com_staff.xml is the XML file that we want to parse.
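The same parse-then-query flow can be tried without xerces.jar. The sketch below is an assumption-laden variant, not the example above: it uses the JAXP DocumentBuilder bundled with the JDK and parses an in-memory string instead of a file. The class name DomCountDemo and the sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomCountDemo {
    // Parses the XML text into a DOM tree and counts the elements with the given tag name
    static int countElements(String xml, String tagName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data shaped like eBIZ_com_staff.xml
        String xml = "<ebiz>"
                + "<employee_details><emp_id>A1</emp_id></employee_details>"
                + "<employee_details><emp_id>A2</emp_id></employee_details>"
                + "</ebiz>";
        System.out.println("There are " + countElements(xml, "employee_details") + " elements.");
    }
}
```

Because the whole tree is held in memory, the document can be queried again for another tag name without being parsed a second time.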


Parsing XML with SAX

You can use SAX to accomplish the same task. SAX is event-oriented. In the following example, the SAXParsing class inherits from DefaultHandler, which has default implementations for all the SAX event handlers, and overrides two methods: startElement() and endDocument().

The parser calls the startElement() method each time it encounters a new element in the XML file. In the overridden startElement() method, the code checks for the "employee_details" tag and increments the elementCount counter variable. When the parser reaches the end of the XML file, it calls the endDocument() method.

The code prints out the counter variable at that point. Set the ContentHandler and the ErrorHandler properties of the SAXParser, and then use the parse() method to start the actual parsing.
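The event flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses the JAXP SAXParserFactory bundled with the JDK rather than the Xerces SAXParser class, and it reads from an in-memory string. The class name SaxCountDemo and the sample data are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCountDemo extends DefaultHandler {
    private final String tagName;
    private int count;

    SaxCountDemo(String tagName) {
        this.tagName = tagName;
    }

    // Called by the parser once for every start tag, including empty elements
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals(tagName)) {
            count++;
        }
    }

    // Called by the parser once, after the last event of the document
    @Override
    public void endDocument() {
        System.out.println("there are " + count + " " + tagName + " elements in the document");
    }

    // Runs a SAX parse over the XML text and returns the number of matching elements
    static int countElements(String xml, String tagName) throws Exception {
        SaxCountDemo handler = new SaxCountDemo(tagName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data; the empty element also fires startElement()
        String xml = "<ebiz>"
                + "<employee_details><emp_id>A1</emp_id></employee_details>"
                + "<employee_details/>"
                + "</ebiz>";
        countElements(xml, "employee_details");
    }
}
```

Unlike the DOM version, nothing is kept in memory after each event, so the handler has to record whatever it needs while the parse is running.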

Source code of SAXParsing.java

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.*;
import java.io.*;

public class SAXParsing extends DefaultHandler
{
    // Number of <employee_details> elements seen so far
    private int elementCount;

    public SAXParsing(String fileName)
    {
        try
        {
            // Instance of the Apache SAX parser
            SAXParser parser1 = new SAXParser();
            parser1.setContentHandler(this);
            parser1.setErrorHandler(this);
            parser1.parse(fileName);
            System.out.println("Valid XML Document");

            // Echo the content of the file after a successful parse
            StringBuffer str = new StringBuffer();
            str.append("Content of the XML file");
            BufferedReader infile = new BufferedReader(new FileReader(new File(fileName)));
            String line = null;
            while ((line = infile.readLine()) != null)
            {
                str.append(System.getProperty("line.separator"));
                str.append(line);
                str.append(System.getProperty("line.separator"));
            }
            infile.close();
            System.out.println(str);
        }
        catch (SAXException se)
        {
            System.out.println("Document is not a valid XML. Parser says :" + se);
        }
        catch (IOException e)
        {
            System.out.println("Parser could not load the document" + e);
        }
    }

    // Called by the parser for every start tag
    public void startElement(String uri, String localName, String rawName, Attributes attributes)
    {
        if (rawName.equals("employee_details"))
        {
            elementCount++;
        }
    }

    // Called by the parser once the whole document has been read
    public void endDocument()
    {
        System.out.println("there are " + elementCount + " employee records in the document");
    }

    public static void main(String[] args)
    {
        if (args.length <= 0)
        {
            String usage = "Incomplete Syntax\n usage : java SAXParsing file.xml\n"
                    + " here file.xml is file that you want to parse";
            System.out.println(usage);
            javax.swing.JOptionPane.showMessageDialog(null, usage);
            System.exit(0);
        }

        new SAXParsing(args[0]);
    }
}

Compiling the example

javac SAXParsing.java

Running the example:

java SAXParsing eBIZ_com_staff.xml


Source code of the eBIZ_com_staff.xml

<?xml version="1.0" encoding="iso-8859-1"?>
<ebiz >
	<employee_details>
		<emp_id>eBIZTECH001</emp_id>
		<fname>Mr Aman Kumar</fname>
		<lname>Singh</lname>
		<department>TECHNICAL[Java]</department>
		<designation>Sr. Developer</designation>
		<phone> 32942</phone>
		<address>Sector 44, Noida</address>
		<age>27</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZCSS001</emp_id>
		<fname>Mrs Sunita</fname>
		<lname>Singhania</lname>
		<department>Customer Support[Web Assist]</department>
		<designation>CCE</designation>
		<phone> 000000</phone>
		<address>Sector 66, Gurgaon</address>
		<age>20</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZAC001</emp_id>
		<fname>Miss Amisha</fname>
		<lname>Mishra</lname>
		<department>Accounts</department>
		<designation>Jr. Accountant</designation>
		<phone> 32942</phone>
		<address>Sector 66, Rohni</address>
		<age>26</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZAC002</emp_id>
		<fname>ANITA</fname>
		<lname>Mishra</lname>
		<department>Accounts</department>
		<designation>Jr. Accountant</designation>
		<phone> 09999 </phone>
		<address>Sector 66, Noida</address>
		<age>23</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZAC003</emp_id>
		<fname>ANKITA</fname>
		<lname>DUBEY</lname>
		<department>Accounts</department>
		<designation>Jr. Accountant</designation>
		<phone>032942 </phone>
		<address>Sector 99, Noida</address>
		<age>21</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZAC004</emp_id>
		<fname>AKSHITA</fname>
		<lname>Jaiswal </lname>
		<department>TECHNICAL[Java]</department>
		<designation>Jr. DEVELOPER</designation>
		<phone> (+91)99990</phone>
		<address>Sector 66, Noida</address>
		<age>22</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZTECH002</emp_id>
		<fname>Shweta</fname>
		<lname>Agrwal</lname>
		<department>TECHNICAL[Java]</department>
		<designation>Jr. Programmer</designation>
		<phone> 32942</phone>
		<address>Sector 66, Noida</address>
		<age>24</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZTECH003</emp_id>
		<fname>Shalu</fname>
		<lname>Jain</lname>
		<department>TECHNICAL[Java]</department>
		<designation>Jr. Programmer</designation>
		<phone> 32942</phone>
		<address>Sector 66, Noida</address>
		<age>23</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZTECH004</emp_id>
		<fname>Ravish </fname>
		<lname>Tiwari</lname>
		<department>TECHNICAL[Java]</department>
		<designation>Developer</designation>
		<phone>09999</phone>
		<address>Sector 44, Gr Noida</address>
	</employee_details>
	<employee_details>
		<emp_id>eBIZAC005</emp_id>
		<fname>Rekha</fname>
		<lname>Mishra</lname>
		<department>Accounts</department>
		<phone>032942</phone>
		<address>sector 12, noida</address>
		<age>25</age>
	</employee_details>
	<employee_details>
		<emp_id>eBIZTECH005</emp_id>
		<fname>Devesh</fname>
		<lname>Chauhan</lname>
		<department>TECHNICAL[HTML]</department>
		<designation>HTML Developer</designation>
		<phone>032942</phone>
 <address>Sector 12, Noida</address>
</employee_details>
<employee_details/> 
	
</ebiz>


XML Schema and Java

XML to JTree

The following example demonstrates how you can create a JTree from an XML file. This example is divided into two parts, XMLTree and JTreeTester. XMLTree parses the XML file, analyzes its content using various methods, and creates a tree model based on that data.

JTreeTester creates a UI to display the JTree.

Source code of the XMLTree.java

// W3C DOM classes
import org.w3c.dom.*;

// JAXP's classes for DOM I/O
import javax.xml.parsers.*;

// Standard Java classes
import javax.swing.*;
import javax.swing.tree.*;
import javax.swing.event.*;
import java.awt.*;
import java.awt.event.*;
import java.io.*;

public class XMLTree extends JTree
{
/**
* This member stores the TreeNode object used to create the model for the JTree.
* The DefaultMutableTreeNode class is defined in the javax.swing.tree package
* and provides a default implementation of the MutableTreeNode interface.
*/
private DefaultMutableTreeNode treeNode;
/**
* These three members are a part of the JAXP API and are used to parse the XML
* text into a DOM object (of type Document).
*/
private DocumentBuilderFactory dbf;
private DocumentBuilder db;
private Document doc;
/**
* This single constructor builds an XMLTree object using the XML text
* passed in through the constructor.
*
* @param text A String of XML formatted text
*
* @exception ParserConfigurationException This exception is potentially thrown if
* the constructor configures the parser improperly. It won't.
*/
public XMLTree( String text ) throws ParserConfigurationException
{
// Initialize the superclass portion of the object
super();
// Set basic properties for the Tree rendering
getSelectionModel().setSelectionMode( TreeSelectionModel.SINGLE_TREE_SELECTION );
setShowsRootHandles( true );
setEditable( false ); // A more advanced version of this tool would allow the Tree to be editable
// Begin by initializing the object's DOM parsing objects
dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating( false );
db = dbf.newDocumentBuilder();
// Take the DOM root node and convert it to a Tree model for the JTree
treeNode = createTreeNode( parseXml( text ) );
setModel( new DefaultTreeModel( treeNode ) );
} //end XMLTree()
/**
* This takes a DOM Node and recurses through the children until each one is added
* to a DefaultMutableTreeNode. The JTree then uses this object as a tree model.
*
* @param root org.w3c.dom.Node
*
* @return Returns a DefaultMutableTreeNode object based on the root Node passed in
*/
private DefaultMutableTreeNode createTreeNode( Node root )
{
DefaultMutableTreeNode treeNode = null;
String type, name, value;
NamedNodeMap attribs;
Node attribNode;
// Get data from root node
type = getNodeType( root );
name = root.getNodeName();
value = root.getNodeValue();
// Special case for TEXT_NODE
treeNode = new DefaultMutableTreeNode( root.getNodeType() == Node.TEXT_NODE ? value : name );
// Display the attributes if there are any
attribs = root.getAttributes();
if( attribs != null )
{
for( int i = 0; i < attribs.getLength(); i++ )
{
attribNode = attribs.item(i);
name = attribNode.getNodeName().trim();
value = attribNode.getNodeValue().trim();
if ( value != null )
{
if ( value.length() > 0 )
{
treeNode.add( new DefaultMutableTreeNode( "[Attribute] --> " + name + "=\"" + value + "\"" ) );
} //end if ( value.length() > 0 )
} //end if ( value != null )
} //end for( int i = 0; i < attribs.getLength(); i++ )
} //end if( attribs != null )
// Recurse children nodes if any exist
if( root.hasChildNodes() )
{
NodeList children;
int numChildren;
Node node;
String data;
children = root.getChildNodes();
// Only recurse if Child Nodes are non-null
if( children != null )
{
numChildren = children.getLength();
for (int i=0; i < numChildren; i++)
{
node = children.item(i);
if( node != null )
{
// A special case could be made for each Node type.
if( node.getNodeType() == Node.ELEMENT_NODE )
{
treeNode.add( createTreeNode(node) );
} //end if( node.getNodeType() == Node.ELEMENT_NODE )
data = node.getNodeValue();
if( data != null )
{
data = data.trim();
if ( !data.equals("\n") && !data.equals("\r\n") && data.length() > 0 )
{
treeNode.add(createTreeNode(node));
} //end if ( !data.equals("\n") && !data.equals("\r\n") && data.length() > 0 )
} //end if( data != null )
} //end if( node != null )
} //end for (int i=0; i < numChildren; i++)
} //end if( children != null )
} //end if( root.hasChildNodes() )
return treeNode;
} //end createTreeNode( Node root )
/**
* This method returns a string representing the type of node passed in.
*
* @param node org.w3c.dom.Node
*
* @return Returns a String representing the node type
*/
private String getNodeType( Node node )
{
String type;
switch( node.getNodeType() )
{
case Node.ELEMENT_NODE:
{
type = "Element";
break;
}
case Node.ATTRIBUTE_NODE:
{
type = "Attribute";
break;
}
case Node.TEXT_NODE:
{
type = "Text";
break;
}
case Node.CDATA_SECTION_NODE:
{
type = "CData section";
break;
}
case Node.ENTITY_REFERENCE_NODE:
{
type = "Entity reference";
break;
}
case Node.ENTITY_NODE:
{
type = "Entity";
break;
}
case Node.PROCESSING_INSTRUCTION_NODE:
{
type = "Processing instruction";
break;
}
case Node.COMMENT_NODE:
{
type = "Comment";
break;
}
case Node.DOCUMENT_NODE:
{
type = "Document";
break;
}
case Node.DOCUMENT_TYPE_NODE:
{
type = "Document type";
break;
}
case Node.DOCUMENT_FRAGMENT_NODE:
{
type = "Document fragment";
break;
}
case Node.NOTATION_NODE:
{
type = "Notation";
break;
}
default:
{
type = "???";
break;
}
}// end switch( node.getNodeType() )
return type;
} //end getNodeType()
/**
* This method performs the actual parsing of the XML text
*
* @param text A String representing an XML document
* @return Returns an org.w3c.dom.Node object
*/
private Node parseXml( String text )
{
ByteArrayInputStream byteStream;
byteStream = new ByteArrayInputStream( text.getBytes() );
try
{
doc = db.parse( byteStream );
}
catch ( Exception e )
{
e.printStackTrace();
System.exit(0);
}
return ( Node )doc.getDocumentElement();
} //end parseXml()
} //end class XMLTree

Source code of the JTreeTester.java

// JAXP's classes for DOM I/O
import javax.xml.parsers.*;

// GUI classes
import javax.swing.*;
import java.awt.*;
import java.awt.event.*;

//Standard Java Classes
import java.io.*;

/**
  *
 * @author 
 * @version 1.0
 */

public class JTreeTester extends JFrame
{
 // This is the XMLTree object which displays the XML in a JTree
 private XMLTree               XMLTree;
 // This JScrollPane is the container for the JTree
  private JScrollPane         jScroll;
  // This Listener allows the frame's close button to work properly
  private WindowListener      winClosing;

 // These two constants set the width and height of the frame
  private static final int FRAME_WIDTH = 400;
  private static final int FRAME_HEIGHT = 300;

 /*
  * This constructor builds a frame containing a JScrollPane which in turn contains an XMLTree
  * object based on the XML string passed into the constructor
  */
public JTreeTester( String title, String xml )
 throws ParserConfigurationException
 {
 // This builds the JFrame portion of the object
  super( title );
Toolkit			toolkit;
 Dimension			dim, minimumSize;
int			screenHeight, screenWidth;
 // Initialize basic layout properties
setBackground( Color.lightGray );
getContentPane().setLayout( new BorderLayout() );

// Set the frame's display to be WIDTH x HEIGHT in the middle of the screen
  toolkit = Toolkit.getDefaultToolkit();
    dim = toolkit.getScreenSize();
    screenHeight = dim.height;
  screenWidth = dim.width;
 setBounds( (screenWidth-FRAME_WIDTH)/2, 
 (screenHeight-FRAME_HEIGHT)/2, FRAME_WIDTH, FRAME_HEIGHT );

  // Build the XMLTree object
  XMLTree = new XMLTree( xml );

  // Wrap the XMLTree in a JScroll so that we can scroll it in the JFrame.
  jScroll = new JScrollPane();
  jScroll.getViewport().add( XMLTree );

   // Add the scroll pane to the frame
  getContentPane().add( jScroll, BorderLayout.WEST );
   //Put the final touches to the JFrame object
   validate();
  setVisible(true);
 // Add a WindowListener so that we can close the window
  winClosing = new WindowAdapter()
  {
  public void windowClosing(WindowEvent e)
   {
    exit();
   }
 };
 addWindowListener(winClosing);
 } //end JTreeTester()

 // Program execution begins here.
 // An XML file (*.xml) must be passed into the method
 public static void main( String[] args )
 {
 String                    fileName = "";
  BufferedReader            reader;
  String                    line;
  StringBuffer              xmlText;
 JTreeTester               JTreeTester;

 // Build a Document object based on the specified XML file
	  try
 {
 if( args.length > 0 )
 {
 fileName = args[0];
 // endsWith() also copes with names that contain no dot or more than one dot
 if ( fileName.endsWith( ".xml" ) )
  {
 reader = new BufferedReader
 ( new FileReader( fileName ) );
 xmlText = new StringBuffer();

 while ( ( line = reader.readLine() ) != null )
   {
  xmlText.append( line );
 } //end while ( ( line = reader.readLine() ) != null )
// The file will have to be re-read when the Document object is parsed
 reader.close();

 // Construct the GUI components and pass a reference to the XML root node
 javax.swing.JFrame.setDefaultLookAndFeelDecorated(true);
			
 JTreeTester = new JTreeTester( "JTree+XML Tester", xmlText.toString() );
    }
 else
   {
  help();
} //end if ( file name ends with ".xml" )
 }
 else
{
 help();
 } //end if( args.length > 0 )
 }
 catch( FileNotFoundException fnfEx )
 {
 System.out.println( fileName + " was not found." );

 exit();
 }
catch( Exception ex )
 {
 ex.printStackTrace();
   exit();
  }// end try/catch
}// end main()

// A common source of operating instructions
 private static void help()
{
System.out.println( "\nUsage: java JTreeTester filename.xml" );
System.exit(0);
} //end help()

 // A common point of exit
 private static void exit()
 {
 System.out.println( "\nThank you for using the JTreeTester" );
  System.exit(0);
  } //end exit()
} //end JTreeTester

Compiling the examples:

javac XMLTree.java JTreeTester.java

Running the example:

java JTreeTester eBIZ_com_staff.xml
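The core of XMLTree is the recursive conversion of DOM nodes into DefaultMutableTreeNode objects. The sketch below isolates that step so it can be run without opening a window. It is a simplified variant under stated assumptions: attributes are left out, the XML comes from an in-memory string, and the class name DomToTreeDemo and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.Enumeration;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomToTreeDemo {
    // Recursively mirrors element nodes and non-blank text nodes as tree nodes
    static DefaultMutableTreeNode toTreeNode(Node node) {
        String label = node.getNodeType() == Node.TEXT_NODE
                ? node.getNodeValue().trim()
                : node.getNodeName();
        DefaultMutableTreeNode treeNode = new DefaultMutableTreeNode(label);
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            boolean isElement = child.getNodeType() == Node.ELEMENT_NODE;
            boolean isText = child.getNodeType() == Node.TEXT_NODE
                    && !child.getNodeValue().trim().isEmpty();
            if (isElement || isText) {
                treeNode.add(toTreeNode(child));
            }
        }
        return treeNode;
    }

    // Parses the XML text and converts its root element into a tree model root
    static DefaultMutableTreeNode parse(String xml) throws Exception {
        Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return toTreeNode(root);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data shaped like eBIZ_com_staff.xml
        DefaultMutableTreeNode root = parse(
                "<ebiz><employee_details><fname>Shalu</fname><lname>Jain</lname></employee_details></ebiz>");
        // Print the tree in preorder, indenting each node by its depth
        Enumeration<?> nodes = root.preorderEnumeration();
        while (nodes.hasMoreElements()) {
            DefaultMutableTreeNode n = (DefaultMutableTreeNode) nodes.nextElement();
            StringBuilder indent = new StringBuilder();
            for (int i = 0; i < n.getLevel(); i++) {
                indent.append("  ");
            }
            System.out.println(indent + String.valueOf(n.getUserObject()));
        }
    }
}
```

The returned root can be handed to new DefaultTreeModel(root) and then to a JTree, which is what the XMLTree constructor does with its own treeNode member.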


Schema Validation

Online Validation

Online validation is one of the easiest ways to validate an XML file against its Schema. These tools are free and very easy to use, and you don't need to write even a single command. Many online validators are available; they are of high quality and their output is very reliable.

Here are some of them:

1. http://tools.decisionsoft.com/schemaValidate/

Open the validation home page, select the Schema and XML files, and click the Validate button.

2. W3C online schema validation

http://www.w3.org/2001/03/webdata/xsv

Select the Schema that you want to validate and click the Upload and Get Results button to view the validation result.

Schema Validation

Validating Schema with Java

Given below is the source code of a Java program that validates an XML file against the specified Schema.

This program accepts the Schema and the XML document as command line arguments, in that order, and uses the Apache Xerces SAXParser to parse and validate the files and report the result.

import org.apache.xerces.parsers.SAXParser;
import java.io.IOException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class MySchemaValidator {

    public void validateSchema(String SchemaUrl, String XmlDocumentUrl) {
        SAXParser parser = new SAXParser();
        try {
            // Turn on validation, XML Schema support and full schema checking
            parser.setFeature("http://xml.org/sax/features/validation", true);
            parser.setFeature("http://apache.org/xml/features/validation/schema", true);
            parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
            // Schema to apply to documents that do not use a namespace
            parser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", SchemaUrl);
            Validator handler = new Validator();
            parser.setErrorHandler(handler);
            parser.parse(XmlDocumentUrl);
            if (handler.validationError == true)
                System.out.println("XML Document has Error:" + handler.validationError + ""
                        + handler.saxParseException.getMessage());
            else
                System.out.println("XML Document is valid");
        } catch (IOException ioe) {
            System.out.println("IOException" + ioe.getMessage());
        } catch (SAXException e) {
            System.out.println("SAXException" + e.getMessage());
        }
    }

    // Records the most recent validation error reported by the parser
    private class Validator extends DefaultHandler {
        public boolean validationError = false;
        public SAXParseException saxParseException = null;

        public void error(SAXParseException exception) throws SAXException {
            validationError = true;
            saxParseException = exception;
        }

        public void fatalError(SAXParseException exception) throws SAXException {
            validationError = true;
            saxParseException = exception;
        }

        public void warning(SAXParseException exception) throws SAXException {
        }
    }

    public static void main(String[] argv) {
        if (argv.length < 2) {
            System.out.println("Usage : java MySchemaValidator file.xsd file.xml");
            return;
        }
        String SchemaUrl = argv[0];
        String XmlDocumentUrl = argv[1];
        MySchemaValidator validator = new MySchemaValidator();
        validator.validateSchema(SchemaUrl, XmlDocumentUrl);
    }
}

Compile the example:

javac MySchemaValidator.java

Running the example:

java MySchemaValidator file.xsd file.xml

Note:

This example uses the Apache Xerces API. Download the latest version of Xerces from

http://www.apache.org/dist/xml/xerces-j/, extract the jar files from the archive, and add the corresponding jar files to the CLASSPATH.
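If adding xerces.jar is not an option, a comparable check can be sketched with the javax.xml.validation package that ships with the JDK. This is a different API from the Xerces feature and property calls used above, shown here only as an assumption-based alternative. The class name JaxpSchemaCheck, the method isValid, and the tiny schema are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class JaxpSchemaCheck {
    // Returns true when the XML text conforms to the XSD text, false otherwise.
    // A broken schema is not treated as "invalid XML": newSchema() throws instead.
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validation error is thrown
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical schema: a single <age> element that must hold an integer
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='age' type='xs:integer'/>"
                + "</xs:schema>";
        System.out.println("27  -> " + isValid(xsd, "<age>27</age>"));
        System.out.println("old -> " + isValid(xsd, "<age>old</age>"));
    }
}
```

To validate files instead of strings, pass new StreamSource(new java.io.File(path)) for both the schema and the document.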
