Monday, 9 December 2013

SAX / DOM / StAX




Here is a small comparison of SAX, DOM and StAX (the table also lists TrAX for reference):

╔══════════════════════════════════════╦═════════════════════════╦═════════════════════════╦═══════════════════════╦═══════════════════════════╗
║          JAXP API Property           ║          StAX           ║           SAX           ║          DOM          ║           TrAX            ║
╠══════════════════════════════════════╬═════════════════════════╬═════════════════════════╬═══════════════════════╬═══════════════════════════╣
║ API Style                            ║ Pull events; streaming  ║ Push events; streaming  ║ In memory tree based  ║ XSLT Rule based templates ║
║ Ease of Use                          ║ High                    ║ Medium                  ║ High                  ║ Medium                    ║
║ XPath Capability                     ║ No                      ║ No                      ║ Yes                   ║ Yes                       ║
║ CPU and Memory Utilization           ║ Good                    ║ Good                    ║ Depends               ║ Depends                   ║
║ Forward Only                         ║ Yes                     ║ Yes                     ║ No                    ║ No                        ║
║ Reading                              ║ Yes                     ║ Yes                     ║ Yes                   ║ Yes                       ║
║ Writing                              ║ Yes                     ║ No                      ║ Yes                   ║ Yes                       ║
║ Create, Read, Update, Delete (CRUD)  ║ No                      ║ No                      ║ Yes                   ║ No                        ║
╚══════════════════════════════════════╩═════════════════════════╩═════════════════════════╩═══════════════════════╩═══════════════════════════╝

There are different approaches to parsing an XML source, and you should select the one that fits your needs. You may choose one of these:
  • DOM - Document Object Model,
  • SAX - Simple API for XML,
  • StAX - Streaming API for XML.
Let’s discuss each one.



Parsing with DOM:

If you choose this technique, be aware that the whole XML document is loaded into memory. The advantage is that you can navigate to and read any node, and you can append, delete or update child nodes, because all the data is available in memory. However, if the XML is large, loading it into memory is expensive, and the whole document is loaded even when you are looking for only one particular item.

Consider this technique when you need to alter the XML structure and you are sure that memory consumption will not be a problem. Of the three approaches, it is also the only one that lets you navigate freely between parent and child elements, which makes it easier to use.
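
To make the navigation and update points concrete, here is a minimal sketch using the JDK's standard JAXP DOM classes. The class name DomNavigation and the small recipe document are made up for illustration; they do not come from any particular site.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomNavigation {

    // Hypothetical sample document, made up for this sketch.
    public static final String XML =
        "<recipe><ingredient name=\"flour\" amount=\"2\"/>"
      + "<ingredient name=\"sugar\" amount=\"1\"/></recipe>";

    // Parses the whole document into an in-memory tree.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Navigates the tree, then updates, appends and deletes nodes in place.
    public static String edit(Document doc) {
        Element root = doc.getDocumentElement();

        // Go down to the first child, then back up to its parent.
        Element first = (Element) root.getFirstChild();
        Node parent = first.getParentNode();
        String where = first.getAttribute("name") + " is inside " + parent.getNodeName();

        first.setAttribute("amount", "3");            // update
        Element salt = doc.createElement("ingredient");
        salt.setAttribute("name", "salt");
        root.appendChild(salt);                       // append
        root.removeChild(first.getNextSibling());     // delete the sugar element
        return where;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(edit(doc));                                          // flour is inside recipe
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 2
    }
}
```

Every step after parse works on the in-memory tree only, which is why moving up to a parent or removing a sibling is possible here and not with the streaming approaches.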

If you are creating an XML document (one that is not big!), you should use this technique. However, if you are going to export data from a database to XML (where you do not need to navigate the XML and/or the data is huge), you should consider the other approaches.
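
For the export case, one of the other approaches is the StAX writer (StAX itself is discussed further down). The following is only a sketch under assumptions: the class name StreamingExport and the method writeRows are hypothetical, and a plain list of strings stands in for the database rows.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamingExport {

    // Writes one <row> element per value directly to 'out'; no tree is built.
    public static void writeRows(List<String> names, Writer out) throws XMLStreamException {
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("rows");
        for (String name : names) {           // stand-in for iterating a database result
            writer.writeStartElement("row");
            writer.writeCharacters(name);     // special characters such as & are escaped
            writer.writeEndElement();
        }
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.flush();
        writer.close();                       // does not close the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        // A StringWriter keeps the demo self-contained; a real export would
        // more likely pass a file or network writer.
        StringWriter out = new StringWriter();
        writeRows(Arrays.asList("flour", "sugar & spice"), out);
        System.out.println(out);
    }
}
```

Each row is written and forgotten, so memory use stays flat no matter how many rows the export contains.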

The DOM API is standardized by the W3C.

Parsing with SAX:

SAX takes a totally different approach. It reads the XML document from beginning to end but does not store anything in memory. Instead, it fires events, and you supply event handlers for the ones you care about.

Your event handler is called, for example, when an element begins or ends, or when processing of the document begins or ends. For the complete list of events, see the org.xml.sax.ContentHandler interface.
So you register a handler (or more than one handler), and those handlers are called whenever an event occurs.
Here is sample code from a site; it calculates the total amount of flour in a recipes XML document.

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import org.apache.xerces.parsers.SAXParser; // Apache Xerces must be on the classpath

public class Flour extends DefaultHandler {
    float amount = 0;

    // Called by the parser for every start tag it encounters.
    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        if ("http://recipes.org".equals(namespaceURI)
                && "ingredient".equals(localName)) {
            String n = atts.getValue("", "name");
            if ("flour".equals(n)) { // null-safe: 'name' may be missing
                String a = atts.getValue("", "amount"); // assume 'amount' exists
                amount = amount + Float.parseFloat(a);
            }
        }
    }

    public static void main(String[] args) {
        Flour f = new Flour();
        SAXParser p = new SAXParser();
        p.setContentHandler(f);
        try {
            p.parse(args[0]); // path or URL of the XML document
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(f.amount);
    }
}


With SAX, first of all, you do not need to worry about memory consumption. If performance is the main criterion (and you are only reading the XML, not modifying it), SAX is a much better choice than DOM. However, you do not get a tree structure in which you can ask for parent or child elements, so you have to keep track of where you are in the document yourself.
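
Knowing where you are usually means doing your own bookkeeping inside the handler. One common way, sketched below, is to keep a stack of the elements that are currently open. This sketch uses the JDK's built-in JAXP parser instead of Xerces directly; the class name PathTracker and the sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathTracker extends DefaultHandler {

    // Names of the elements that are open right now, outermost first.
    private final Deque<String> open = new ArrayDeque<>();
    // Path of every element seen, in document order.
    public final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
        paths.add("/" + String.join("/", open));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast(); // we have left this element
    }

    // Parses the given XML text and returns the path of each element.
    public static List<String> pathsOf(String xml) throws Exception {
        PathTracker handler = new PathTracker();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.paths;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pathsOf("<a><b><c/></b><d/></a>")); // [/a, /a/b, /a/b/c, /a/d]
    }
}
```

The stack only ever holds the ancestors of the current element, so the memory cost grows with nesting depth, not with document size.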

Parsing with StAX:
StAX is a newer technology than the others we discussed, and it is the only one of the three that was itself defined through a JSR (JSR-173).

Parsing with StAX looks like parsing with SAX. Again, StAX does not store anything in memory, and the document is read once from beginning to end.

However, with SAX your event handler is called by the parser when an event occurs (push). With StAX, you ask the parser to move on to the next event (pull).

You can use StAX in two ways: the “cursor model” and the “iterator model”.

Here is a simple code fragment I found on Google. The “cursor model” looks like this:

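import java.io.InputStream;
import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

URL u = new URL("http://www.cafeconleche.org/");
InputStream in = u.openStream();
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
while (true) {
    int event = parser.next(); // we pull the next event from the parser
    if (event == XMLStreamConstants.END_DOCUMENT) {
        parser.close(); // releases the parser, not the underlying stream
        break;
    }
    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println(parser.getLocalName()); // print each element name
    }
}
in.close();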

As you can see above, we request the next event ourselves (parser.next();). In the “iterator model” the logic is the same, but while iterating you receive an object that contains information about the current event, like this:

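import java.io.FileInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

XMLEventReader eventReader = XMLInputFactory.newInstance().createXMLEventReader(
        new FileInputStream("abc.xml"));
while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent(); // next() would return a plain Object
    // Print the text that directly follows a start tag, if there is any.
    // (A start tag may also be followed by another element, not by text.)
    if (event.isStartElement() && eventReader.peek().isCharacters()) {
        System.out.println(eventReader.nextEvent().asCharacters().getData());
    }
}
eventReader.close();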


Those were the technologies; each of them also has implementations.

After choosing your technology, you can choose an implementation. There are different DOM, SAX and StAX implementations.
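
In Java, the JAXP factory classes hide which implementation is in use: newInstance() looks one up at runtime and falls back to the one bundled with the JDK. A small sketch to see what you are actually getting (the class name WhichImplementation is made up for illustration, and the printed names depend on your JDK and classpath):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;

public class WhichImplementation {

    // Returns the concrete factory class names for DOM, SAX and StAX, in that order.
    public static String[] implementations() {
        return new String[] {
            DocumentBuilderFactory.newInstance().getClass().getName(), // DOM
            SAXParserFactory.newInstance().getClass().getName(),       // SAX
            XMLInputFactory.newInstance().getClass().getName()         // StAX
        };
    }

    public static void main(String[] args) {
        for (String name : implementations()) {
            System.out.println(name);
        }
    }
}
```

Because the application code only talks to the factory, a different implementation can usually be swapped in without changing the parsing code itself.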
