$Revision: 1.12 $

Overview

Abstract

This is a tutorial for using REXML, a pure-Ruby XML processor.

Introduction

REXML was inspired by the Electric XML library for Java, which features an easy-to-use API, small size, and speed. Hopefully, REXML, designed with the same philosophy, has these same features. I've tried to keep the API as intuitive as possible, and have followed the Ruby methodology for method naming and code flow, rather than mirroring the Java API.

REXML supports both tree and stream document parsing. Stream parsing is faster than tree parsing (about 1.5 times as fast). However, with stream parsing, you don't get access to features such as XPath.

Tree Parsing XML and accessing Elements

We'll start by parsing an XML document:

require "rexml/document"
file = File.new( "mydoc.xml" )
doc = REXML::Document.new file

The third line creates a new document and parses the supplied file. You can also parse a string directly:

require "rexml/document"
include REXML	# so that we don't have to prefix everything with REXML::...
string = <<EOF
	<mydoc>
		<someelement attribute="nanoo">Text, text, text</someelement>
	</mydoc>
EOF
doc = Document.new string

So parsing a string is just as easy as parsing a file. For future examples, I'm going to omit both the require and include lines.
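As a quick check that a parse did what you expected, here is a small sketch, written out in full so that it runs on its own. It parses a one-line version of the string above, reads back the root name, the attribute, and the text, and then shows that malformed input raises REXML::ParseException. The <unclosed> element is just an illustrative name of mine.

```ruby
require "rexml/document"

# Parse a one-line version of the document from the example above.
doc = REXML::Document.new('<mydoc><someelement attribute="nanoo">Text, text, text</someelement></mydoc>')
el  = doc.root.elements["someelement"]

root_name = doc.root.name
attribute = el.attributes["attribute"]
text      = el.text
puts root_name, attribute, text

# Malformed XML is rejected with a ParseException rather than silently accepted.
parse_error = nil
begin
  REXML::Document.new("<mydoc><unclosed></mydoc>")
rescue REXML::ParseException => e
  parse_error = e
end
puts "malformed input rejected" if parse_error
```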

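Once you have a document, you can access elements in that document in a number of ways, chiefly through the each and [] methods of Element.elements (both of which accept an XPath) and through the Array-like methods of Element itself.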

Here are a few examples using these methods. First is the source document used in the examples:

The source document
<inventory title="OmniCorp Store #45x10^3">
   <section name="health">
      <item upc="123456789" stock="12">
         <name>Invisibility Cream</name>
         <price>14.50</price>
         <description>Makes you invisible</description>
      </item>
      <item upc="445322344" stock="18">
         <name>Levitation Salve</name>
         <price>23.99</price>   
         <description>Levitate yourself for up to 3 hours per application</description>
      </item>
   </section>
   <section name="food">
      <item upc="485672034" stock="653">
         <name>Blork and Freen Instameal</name>
         <price>4.95</price>
         <description>A tasty meal in a tablet; just add water</description>
      </item>
      <item upc="132957764" stock="44">
         <name>Grob winglets</name>
         <price>3.56</price>
         <description>Tender winglets of Grob.  Just add water</description>
      </item>
   </section>
</inventory>

Accessing Elements
doc = Document.new File.new("mydoc.xml")
doc.elements.each("inventory/section") { |element| puts element.attributes["name"] }
# -> health
# -> food
doc.elements.each("*/item") { |element| puts element.attributes["upc"] }
# -> 123456789
# -> 445322344
# -> 485672034
# -> 132957764
root = doc.root
puts root.attributes["title"]
# -> OmniCorp Store #45x10^3
puts root.elements["section/item[@stock='44']"].attributes["upc"]
# -> 132957764
puts root.elements["section"].attributes["name"]
# -> health    (returns the first encountered matching element)
puts root.elements[1].attributes["name"]
# -> health    (returns the FIRST child element; indexes start at 1)
root.detect {|node|
   node.kind_of? Element and
   node.attributes["name"] == "food"
}

The last statement finds the first child element whose "name" attribute is "food". As you can see in this example, accessing attributes is also straightforward.
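If you would like to try these accessors without creating mydoc.xml, the following sketch runs on its own. It uses a trimmed-down copy of the inventory (two sections, one item each), so the results differ from the full listing above; the variable names are mine.

```ruby
require "rexml/document"

# A shortened inventory: two sections with one item each.
xml = <<~XML
  <inventory title="OmniCorp Store">
    <section name="health">
      <item upc="123456789" stock="12"><name>Invisibility Cream</name></item>
    </section>
    <section name="food">
      <item upc="132957764" stock="44"><name>Grob winglets</name></item>
    </section>
  </inventory>
XML
doc  = REXML::Document.new(xml)
root = doc.root

# elements.each with an XPath visits every match in document order.
section_names = []
doc.elements.each("inventory/section") { |el| section_names << el.attributes["name"] }

# elements[] takes a 1-based index or an XPath and returns the first match.
first_section = root.elements[1].attributes["name"]
item          = root.elements["section/item[@stock='44']"]

# The children of an element also include whitespace Text nodes,
# so filter on the class when using the Enumerable methods.
food = root.find { |node| node.is_a?(REXML::Element) && node.attributes["name"] == "food" }

puts section_names.inspect, first_section, item.attributes["upc"], food.attributes["name"]
```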

Creating XML documents

Again, there are a couple of mechanisms for creating XML documents in REXML. Adding elements by hand is faster than the convenience method, but which you use will probably be a matter of aesthetics.

Creating elements
el = someelement.add_element "myel"
# creates an element named "myel", adds it to "someelement", and returns it
el2 = el.add_element "another", {"id"=>"10"}
# does the same, but also sets attribute "id" of el2 to "10"
el3 = Element.new "blah"
el.elements << el3
el3.attributes["myid"] = "sean"
# creates el3 "blah", adds it to el, then sets attribute "myid" to "sean"
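Since "someelement" above is only a placeholder, here is a sketch that builds a small document from nothing, using both the add_element convenience method and the by-hand approach. The element and attribute names are borrowed from the inventory example; nothing about them is special.

```ruby
require "rexml/document"

doc = REXML::Document.new

# Convenience method: create, attach, and return the element in one call.
root    = doc.add_element("inventory", { "title" => "demo" })
section = root.add_element("section", { "name" => "health" })

# By hand: create the element first, then attach it and set an attribute.
item = REXML::Element.new("item")
section.elements << item
item.attributes["upc"] = "123456789"

puts doc.to_s
```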

If you want to add text to an element, you can either create Text objects and add them to the element, or use the convenience method text=:

Adding text
el1 = Element.new "myelement"
el1.text = "Hello world!"
# -> <myelement>Hello world!</myelement>
el1.add_text "Hello dolly"
# -> <myelement>Hello world!Hello dolly</myelement>
el1.add Text.new("Goodbye")
# -> <myelement>Hello world!Hello dollyGoodbye</myelement>
el1 << Text.new(" cruel world")
# -> <myelement>Hello world!Hello dollyGoodbye cruel world</myelement>

But note that each of these text objects is still stored as a separate object; el1.text will return "Hello world!"; el1[2] will return a Text object with the contents "Goodbye".
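To see the separate Text nodes for yourself, here is a minimal sketch. It sticks to text= and << with an explicit Text object, and uses Element#texts to collect every Text child; the element name "greeting" is just an example of mine.

```ruby
require "rexml/document"

el = REXML::Element.new("greeting")
el.text = "Hello"                  # sets the first Text child
el << REXML::Text.new(" world")    # appends a second, separate Text child

first_text = el.text                        # value of the first Text child only
all_text   = el.texts.map(&:value).join     # values of every Text child, joined

puts first_text
puts all_text
puts el.texts.size
```

If you need the complete text of an element, joining el.texts as above is safer than relying on el.text.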

If you want to insert an element between two elements, you can use either the standard Ruby array notation, or Parent.insert_before and Parent.insert_after.

Inserts
doc = Document.new "<a><one/><three/></a>"
doc.root[1,0] = Element.new "two"
# -> <a><one/><two/><three/></a>
three = doc.elements["a/three"]
doc.root.insert_after three, Element.new("four")
# -> <a><one/><two/><three/><four/></a>
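The example above uses array notation and insert_after; for completeness, this sketch reaches the same result with insert_before and insert_after only. It is self-contained, so the require line and the REXML:: prefixes are written out.

```ruby
require "rexml/document"

doc   = REXML::Document.new("<a><one/><three/></a>")
three = doc.root.elements["three"]

# Both methods take the existing child first and the new child second.
doc.root.insert_before(three, REXML::Element.new("two"))
doc.root.insert_after(three, REXML::Element.new("four"))

names = doc.root.elements.to_a.map(&:name)
puts names.inspect
```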

Writing a tree

Writing a REXML tree is simple: pass an object that supports <<( String ) to the write method of any node. In Ruby, both IO instances (such as File) and String instances support <<.

doc.write $stdout
output = ""
doc.write output

By default, REXML formats the output with indentation. If you want REXML to not format the output, pass write() an indent of -1:

Write with no indent
doc.write $stdout, -1
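Here is a short sketch of writing into a String rather than $stdout. Because the default indentation may differ between REXML versions, it passes the indent explicitly both times; the exact layout of the indented output is left for you to inspect rather than stated here.

```ruby
require "rexml/document"

doc = REXML::Document.new("<a><b>text</b></a>")

# An indent of -1 writes the tree without any added whitespace.
compact = +""
doc.write(compact, -1)

# A non-negative indent asks REXML to format the output.
indented = +""
doc.write(indented, 2)

puts compact
puts indented
```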

Iterating

There are four main methods of iterating over children: Element.each, which iterates over all the children; Element.elements.each, which iterates over just the child Elements; Element.next_element and Element.previous_element, which can be used to fetch the next and previous Element siblings; and Element.next_sibling and Element.previous_sibling, which fetch the next and previous siblings, regardless of type.
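The four approaches can be sketched as follows. The document mixes text and elements on purpose, so that the difference between "all children" and "child Elements" is visible; the variable names are mine.

```ruby
require "rexml/document"

doc  = REXML::Document.new("<a>lead<one/>mid<two/><three/></a>")
root = doc.root

# Element.each visits every child, Text nodes included.
child_classes = []
root.each { |node| child_classes << node.class }

# Element.elements.each visits only the child Elements.
element_names = []
root.elements.each { |el| element_names << el.name }

one = root.elements["one"]
two = root.elements["two"]

next_el   = one.next_element      # skips the "mid" Text node
prev_el   = two.previous_element
next_node = one.next_sibling      # the "mid" Text node itself

puts child_classes.inspect
puts element_names.inspect
puts next_el.name, prev_el.name, next_node.to_s
```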

Stream Parsing

REXML stream parsing requires you to supply a Listener class. When REXML encounters events in a document (tag start, text, etc.) it notifies your listener class of the event. You can supply any subset of the methods, but make sure you implement method_missing if you don't implement them all. A StreamListener module has been supplied as a template for you to use.

Stream parsing
list = MyListener.new
source = File.new "mydoc.xml"
REXML::Document.parse_stream(source, list)

Please look at the StreamListener API for more information.
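MyListener is not defined above, so here is one way it might look. This is a minimal sketch: the class name TagCollector and its two accessors are my own invention, and it only overrides tag_start and text. Including the StreamListener module supplies empty versions of the other callbacks, so method_missing is not needed.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# A hypothetical listener that records tag names and text as they stream past.
class TagCollector
  include REXML::StreamListener

  attr_reader :tags, :chunks

  def initialize
    @tags   = []
    @chunks = []
  end

  # Called for each start tag; attrs holds the tag's attributes.
  def tag_start(name, attrs)
    @tags << name
  end

  # Called for each run of character data.
  def text(value)
    @chunks << value
  end
end

listener = TagCollector.new
REXML::Document.parse_stream("<a><b>hi</b><b>there</b></a>", listener)

puts listener.tags.inspect
puts listener.chunks.join
```

A String is used as the source here to keep the sketch self-contained; a File works the same way.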