Saturday, December 1, 2012

Effective way to parse very large XML without using excessive memory allocation

The common approach is to use JAXB in-memory marshalling. It is a quick and easy way to marshal and unmarshal small XML documents. However, the same JAXB code can consume a large amount of memory for large XML files. There are a few different approaches to marshalling and unmarshalling very large XML files without excessive memory allocation.

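JAXB with file streams – JAXB with file streams can provide a memory-efficient solution for marshalling and unmarshalling XML. Streaming to and from a file avoids holding the serialized XML in memory as a string or byte array, although the unmarshalled object graph itself still lives in memory. Here is some sample code that shows how to use file streams.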

        // Requires java.io.* and javax.xml.bind.* imports.
        // LargeData is a placeholder for your @XmlRootElement class wrapping the large list
        // (JAXB cannot marshal a bare java.util.List).
        LargeData data = new LargeData(); // data with large list
        JAXBContext jc = JAXBContext.newInstance(LargeData.class);
        File file = new File("largetest.xml");
        Marshaller m = jc.createMarshaller();
        try (OutputStream outputStream = new FileOutputStream(file)) {
            m.marshal(data, outputStream); // written straight to the file
        }
        Unmarshaller um = jc.createUnmarshaller();
        try (InputStream inputStream = new FileInputStream(file)) {
            data = (LargeData) um.unmarshal(inputStream);
        }

JAXB with StAX parser – The StAX parser provides the capability to pull XML elements one at a time instead of loading the whole XML file into memory. JAXB with a StAX parser is an alternative solution where memory requirements are tight. Here is a code sketch that shows how to use StAX.

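        // Requires java.io.*, javax.xml.stream.* and javax.xml.bind.* imports.
        final XMLInputFactory xmlif = XMLInputFactory.newInstance();
        xmlif.setProperty(XMLInputFactory.IS_COALESCING, true);
        // ObjectHeaderType is the JAXB class you want to unmarshal
        JAXBContext objectHeaderTypeCtx = JAXBContext.newInstance(ObjectHeaderType.class);
        Unmarshaller um = objectHeaderTypeCtx.createUnmarshaller();
        InputStream input = new FileInputStream("largetest.xml"); // any InputStream or Reader works
        XMLStreamReader xmlr = xmlif.createXMLStreamReader(input);
        while (xmlr.getEventType() != XMLStreamConstants.END_DOCUMENT) {
            if (xmlr.isStartElement() && "ObjectHeader".equals(xmlr.getLocalName())) {
                // Unmarshals one element; the reader is left just past its end tag
                JAXBElement<ObjectHeaderType> header = um.unmarshal(xmlr, ObjectHeaderType.class);
                // process header.getValue() here, then let it be garbage collected
            } else {
                xmlr.next(); // advance only when nothing was unmarshalled
            }
        }
        xmlr.close();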
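
To make the pull model concrete, here is a minimal, self-contained sketch that uses only StAX from the JDK and leaves JAXB out so that it runs without extra libraries. The class name StaxPullSketch, the method forEachHeader, and the sample XML are illustrative assumptions, not part of any library; in a real JAXB setup the getElementText() call would be replaced by um.unmarshal(xmlr, ObjectHeaderType.class).

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: pulls <ObjectHeader> elements one at a time.
class StaxPullSketch {

    // Hands the text of each <ObjectHeader> element to the handler as soon
    // as it is read, and returns how many elements were seen. Nothing is
    // accumulated here, so memory use does not grow with the document size.
    static int forEachHeader(Reader input, Consumer<String> handler) {
        XMLInputFactory xmlif = XMLInputFactory.newInstance();
        xmlif.setProperty(XMLInputFactory.IS_COALESCING, true);
        int count = 0;
        try {
            XMLStreamReader xmlr = xmlif.createXMLStreamReader(input);
            try {
                while (xmlr.hasNext()) {
                    int event = xmlr.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "ObjectHeader".equals(xmlr.getLocalName())) {
                        // Reads the text content and stops on the end tag
                        handler.accept(xmlr.getElementText());
                        count++;
                    }
                }
            } finally {
                xmlr.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped as unchecked only to keep the sketch short
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<Objects><ObjectHeader>first</ObjectHeader>"
                   + "<ObjectHeader>second</ObjectHeader></Objects>";
        // Prints "first" and "second" on separate lines
        forEachHeader(new StringReader(xml), System.out::println);
    }
}
```

Passing a handler instead of returning a list is a deliberate choice in this sketch: collecting every element before processing would bring back the memory growth that pull parsing is meant to avoid.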
