Doug Tidwell
Cyber Evangelist, developerWorks XML Team
September 1999
About this tutorial
Our first tutorial discussed the basics of XML and demonstrated its potential to
revolutionize the Web. This tutorial shows you how to use an XML parser and other tools to create,
process, and manipulate XML documents. Best of all, every tool discussed here is freely available
on the Web.
About the author
Doug Tidwell is a Senior Programmer at IBM. He has well over a seventh of a century of programming
experience and has been working with XML-like applications for several years. His job as a Cyber
Evangelist is basically to look busy, and to help customers evaluate and implement XML technology.
Using a specially designed pair of zircon-encrusted tweezers, he holds a Master’s Degree in Computer
Science from Vanderbilt University and a Bachelor’s Degree in English from the University of Georgia.
Tutorial – XML Programming in Java
Section 1 – Introduction
About this tutorial
Our first tutorial discussed the basics of XML
and demonstrated its potential to revolutionize the
Web. In this tutorial, we’ll discuss how to use an
XML parser to:
• Process an XML document
• Create an XML document
• Manipulate an XML document
We’ll also talk about some useful, lesser-known
features of XML parsers. Best of all, every tool
discussed here is freely available on the Web.
What’s not here
There are several important programming topics
not discussed here:
• Using visual tools to build XML applications
• Transforming an XML document from one
vocabulary to another
• Creating interfaces for end users or other
processes, and creating interfaces to back-end
data stores
All of these topics are important when you’re
building an XML application. We’re working on
new tutorials that will give these subjects their due,
so watch this space!
XML application architecture
[Figure: XML application architecture. Boxes labeled User Interface, XML Application (with its XML Parser), and Data Store. Original artwork drawn by Doug Tidwell. All rights reserved.]
An XML application is typically built around an XML
parser. It has an interface to its users, and an
interface to some sort of back-end data store.
This tutorial focuses on writing Java code that uses
an XML parser to manipulate XML documents. In
the beautiful picture above, this tutorial is
focused on the middle box.
Section 2 – Parser basics
The basics
An XML parser is a piece of code that reads a
document and analyzes its structure. In this
section, we’ll discuss how to use an XML parser to
read an XML document. We’ll also discuss the
different types of parsers and when you might want
to use them.
Later sections of the tutorial will discuss what you’ll
get back from the parser and how to use those
results.
How to use a parser
We’ll talk about this in more detail in the following
sections, but in general, here’s how you use a
parser:
1. Create a parser object
2. Pass your XML document to the parser
3. Process the results
Building an XML application is obviously more
involved than this, but this is the typical flow of an
XML application.
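The three steps above can be sketched in Java as follows. This is a minimal sketch, not the tutorial's own code: it assumes the JAXP classes bundled with the JDK (`DocumentBuilderFactory`, `DocumentBuilder`) stand in for whichever parser you use, and the class name `ParserFlow` and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name; assumes the JDK's built-in JAXP parser.
class ParserFlow {
    // Parses an XML string and returns the name of its root element.
    static String rootElementName(String xml) throws Exception {
        // 1. Create a parser object
        DocumentBuilder parser =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // 2. Pass your XML document to the parser
        Document doc = parser.parse(new InputSource(new StringReader(xml)));
        // 3. Process the results
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not taken from the tutorial
        String xml = "<sonnet><title>Sonnet 130</title></sonnet>";
        System.out.println(rootElementName(xml)); // prints "sonnet"
    }
}
```

A real application would usually read the document from a file or a URL rather than a string; the string keeps the sketch self-contained.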
Kinds of parsers
There are several different ways to categorize
parsers:
• Validating versus non-validating parsers
• Parsers that support the Document Object
Model (DOM)
• Parsers that support the Simple API for XML
(SAX)
• Parsers written in a particular language (Java,
C++, Perl, etc.)
Validating versus non-validating parsers
As we mentioned in our first tutorial, XML
documents that use a DTD and follow the rules
defined in that DTD are called valid documents.
XML documents that follow the basic tagging rules
are called well-formed documents.
The XML specification requires all parsers to report
errors when they find that a document is not
well-formed. Validation, however, is a different issue.
Validating parsers validate XML documents as they
parse them. Non-validating parsers ignore any
validation errors. In other words, if an XML
document is well-formed, a non-validating parser
doesn’t care if the document follows the rules
specified in its DTD (if any).
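To make the difference concrete, here is a hedged sketch that parses the same well-formed but invalid document twice, once with validation switched on and once with it off. It assumes the JDK's JAXP parser and its `setValidating` switch; the class name `ValidationDemo` and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; assumes the JDK's built-in JAXP parser.
class ValidationDemo {
    // Returns how many validation errors the parser reports for the document.
    static int countValidationErrors(String xml, boolean validating)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating); // true = check against the DTD
        DocumentBuilder parser = factory.newDocumentBuilder();
        final int[] errors = {0};
        parser.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++; // validity problems arrive as recoverable errors
            }
        });
        parser.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        // Well-formed, but <unexpected/> is not allowed by the DTD.
        String xml = "<!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>"
                   + "<greeting><unexpected/></greeting>";
        System.out.println("validating found errors: "
            + (countValidationErrors(xml, true) > 0));
        System.out.println("non-validating found errors: "
            + (countValidationErrors(xml, false) > 0));
    }
}
```

A document that is not well-formed would fail in both modes, because every parser must report well-formedness errors.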
Why use a non-validating parser?
Speed and efficiency. It takes a significant amount
of effort for an XML parser to process a DTD and
make sure that every element in an XML document
follows the rules of the DTD. If you’re sure that an
XML document is valid (maybe it was generated by
a trusted source), there’s no point in validating it
again.
Also, there may be times when all you care about is
finding the XML tags in a document. Once you
have the tags, you can extract the data from them
and process it in some way. If that’s all you need
to do, a non-validating parser is the right choice.
The Document Object Model (DOM)
The Document Object Model is an official
recommendation of the World Wide Web
Consortium (W3C). It defines an interface that
enables programs to access and update the style,
structure, and contents of XML documents. XML
parsers that support the DOM implement that
interface.
The first version of the specification, DOM Level 1,
is available from the W3C, if you enjoy reading that kind of thing.
What you get from a DOM parser
When you parse an XML document with a DOM
parser, you get back a tree structure that contains
all of the elements of your document. The DOM
provides a variety of functions you can use to
examine the contents and structure of the
document.
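As an illustration of examining that tree, the sketch below walks a parsed document and lists its elements, indented by depth. It is a minimal example under assumptions: it uses the JDK's JAXP parser and the standard `org.w3c.dom.Node` methods `getFirstChild` and `getNextSibling`; the class name `DomWalk` and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class name; assumes the JDK's built-in JAXP parser.
class DomWalk {
    // Appends one line per element, indented to show its depth in the tree.
    static void listElements(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
            depth++; // children of an element sit one level deeper
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            listElements(child, depth, out);
        }
    }

    // Parses the XML string and returns an indented outline of its elements.
    static String outline(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        listElements(doc, 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not taken from the tutorial
        System.out.print(outline("<a><b><c/></b><d/></a>"));
    }
}
```

The walk skips text and other non-element nodes, which the tree also contains; a fuller example would handle those node types as well.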
A word about standards
Now that we’re getting into developing XML
applications, we might as well mention the XML
specification. Officially, XML is a trademark of MIT
and a product of the World Wide Web Consortium
(W3C).
The XML specification, an official recommendation
of the W3C, is available from the W3C
for your reading pleasure. The W3C site
contains specifications for XML, DOM, and literally
dozens of other XML-related standards. The XML
zone at developerWorks has
The Simple API for XML (SAX)
The SAX API is an alternate way of working with
the contents of XML documents. A de facto
standard, it was developed by David Megginson
and other members of the XML-Dev mailing list.
The complete SAX standard is available on the Web.
To subscribe to the XML-Dev mailing list, send a message
containing the following to the list’s subscription address:
subscribe xml-dev
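For a first taste of SAX, the sketch below records the events a SAX parser fires as it reads a document. SAX is event-driven: instead of handing back a tree, the parser calls your handler as it meets each piece of the document. The code assumes the SAX 2 and JAXP classes bundled with the JDK (`SAXParserFactory`, `DefaultHandler`), which are newer than the SAX version current when this tutorial was written; the class name `SaxEvents` is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; assumes the JDK's built-in SAX 2 parser.
class SaxEvents {
    // Parses the XML string and returns the element events in order.
    static List<String> events(String xml) throws Exception {
        final List<String> log = new ArrayList<String>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) {
                    log.add("start " + qName); // called at each opening tag
                }

                @Override
                public void endElement(String uri, String localName,
                        String qName) {
                    log.add("end " + qName); // called at each closing tag
                }
            });
        return log;
    }

    public static void main(String[] args) throws Exception {
        // prints [start a, start b, end b, end a]
        System.out.println(events("<a><b/></a>"));
    }
}
```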