What is New in 2.11

Version 2.11, available simultaneously in C, Java, C++, and C#, is the latest release of VTD-XML. So what is new? The short answer: (1) It is more standards-compliant, conforming strictly to the XPath 1.0 spec’s notion of node(). (2) It introduces a major performance improvement for XPath expressions involving simple position indexes. (3) It introduces a major performance improvement for XPath expressions containing complex predicates that involve absolute location path expressions. (4) It also contains various bug fixes as reported by VTD-XML users.

Change to node() Interpretation

Before 2.11, node() in a location step of an XPath expression was interpreted as equivalent to *, i.e., an element node with any name. With 2.11, the same node() matches a node of any of these types: element, text, comment, or processing instruction, as defined by the XPath 1.0 spec.
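To make the difference concrete, here is a minimal sketch that counts what * and node() select under XPath 1.0 semantics. It deliberately uses the XPath engine bundled with the JDK rather than VTD-XML, so it only illustrates the spec behavior that 2.11 now follows; the class name NodeTestDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's XPath 1.0 engine, not VTD-XML.
final class NodeTestDemo {

    // Returns how many nodes the given XPath expression selects in the given XML text.
    static int count(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        // <root> has four children: an element, a text node, a comment, and a processing instruction.
        String xml = "<root><a/>some text<!-- a comment --><?pi data?></root>";
        System.out.println("/root/*      selects " + count(xml, "/root/*") + " node(s)");
        System.out.println("/root/node() selects " + count(xml, "/root/node()") + " node(s)");
    }
}
```

With this sample document, * selects only the element child, while node() selects all four children.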

Performance Improvement for Simple Position Index

A quick example is “a[2]/b[1].” A simple position index is a constant index value in a predicate. The 2.11 XPath engine now detects this use case and exits the evaluation loop early, resulting in faster execution. The amount of improvement depends on how frequently simple indexes are used across location steps. In some cases, a 50% to 70% execution speedup is possible.
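The early-exit idea can be sketched in a few lines of plain Java. This is not VTD-XML’s engine code; PositionIndexDemo and nthChild are hypothetical names, and the list of child names stands in for the children of a context node.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of early exit for a constant position index such as a[2].
final class PositionIndexDemo {

    // Number of children inspected by the most recent call (kept only for illustration).
    static int inspected;

    // Returns the index of the n-th child (1-based) with the given name, or -1 if there is none.
    // The scan stops as soon as the n-th match is found.
    static int nthChild(List<String> children, String name, int n) {
        inspected = 0;
        int seen = 0;
        for (int i = 0; i < children.size(); i++) {
            inspected++;
            if (children.get(i).equals(name) && ++seen == n) {
                return i; // early exit: later siblings cannot change the answer
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("a", "b", "a", "c", "a", "a", "b");
        int index = nthChild(children, "a", 2);
        System.out.println("a[2] is child #" + index
                + "; inspected " + inspected + " of " + children.size() + " children");
    }
}
```

An engine that did not recognize the constant index would walk all the children and filter afterwards; the more location steps carry a simple index, the more of that work is skipped.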

Performance Improvement for Predicates Containing Absolute Path Expressions

A quick example is //a[//abc/@val='1']. Notice that the predicate contains //abc, which is an absolute path expression. Before 2.11, this expression triggered repeated evaluation of //abc to determine whether the predicate was true or false, and the processing cost increased rapidly with the size of the document. This release caches the evaluation result so the corresponding XPath is evaluated only once. Please note that this feature is enabled by default; you can turn it off (we don’t recommend it) by calling AutoPilot’s enableCaching() method with false.

How much of an improvement can you expect to see? It depends on the size of the documents, the complexity of the predicates, and other factors. Sometimes you will achieve astonishing results. Consider the following expression.
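Why caching helps can be shown with a small plain-Java sketch. It is not how VTD-XML implements the cache; PredicateCacheDemo, expensiveAbsoluteTest and select are hypothetical names, and the list of strings stands in for the val attributes found by //abc. The point is only that an absolute path does not depend on the context node, so its result can be computed once and reused for every candidate.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of caching a context-independent predicate result.
final class PredicateCacheDemo {

    // Counts how many times the expensive sub-expression was evaluated.
    static int evaluations;

    // Stand-in for evaluating //abc/@val='1' over the whole document.
    static boolean expensiveAbsoluteTest(List<String> abcVals) {
        evaluations++;
        return abcVals.contains("1");
    }

    // Applies the predicate to each candidate node and returns how many are selected.
    // With cache == true the absolute sub-expression is evaluated once and reused.
    static int select(int candidates, List<String> abcVals, boolean cache) {
        evaluations = 0;
        Boolean cached = null;
        int selected = 0;
        for (int i = 0; i < candidates; i++) {
            boolean keep;
            if (cache) {
                if (cached == null) {
                    cached = expensiveAbsoluteTest(abcVals);
                }
                keep = cached;
            } else {
                keep = expensiveAbsoluteTest(abcVals);
            }
            if (keep) {
                selected++;
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> abcVals = Arrays.asList("0", "1");
        int without = select(1000, abcVals, false);
        System.out.println("no cache: selected " + without + ", evaluations " + evaluations);
        int with = select(1000, abcVals, true);
        System.out.println("cache:    selected " + with + ", evaluations " + evaluations);
    }
}
```

Both runs select the same nodes; only the number of evaluations of the absolute sub-expression differs.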

//CDResults[../../../TargetName/@Value=//SiteInformation[TargetName/@Value!=//SiteInformation[1]/TargetName/@Value and TargetName/@Value!=//SiteInformation[TargetName/@Value!=//SiteInformation[1]/TargetName/@Value][1]/TargetName/@Value][1]/TargetName/@Value]/BottomCD/@Value

Running this expression on a 22MB XML document in Java would take many hours in virtually all XPath implementations, including the 2.10 version of VTD-XML. With 2.11, it took less than 5 seconds on a commodity, 3-year-old PC.

Bug fixes

There are other bug fixes as well, covering XMLModifier’s deletion capabilities and how permissive it is about deleting sub-nodes.

VTD-XML 2.11 Released

VTD-XML 2.11 is now released. I will add a blog post about the major features and improvements in this release soon. The release can be downloaded from

http://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.11/

VTD-XML 2.10 Released

VTD-XML 2.10 is now released for Java, C#, C and C++. It can be downloaded at
https://sourceforge.net/projects/vtd-xml/files/vtd-xml/ximpleware_2.10/. This release includes a number of new features and enhancements.

  • The core API of VTD-XML has been expanded. Users can now perform cut/paste/insert on an empty element.
  • This release also adds support for a deeper location cache for parsing and indexing. This feature is useful for tuning application performance across various XML documents.
  • The Java version also adds support for processing zip and gzip files. Direct processing of XML from an HTTP URL is enhanced.
  • The extended Java version now supports the ISO-8859-10 through ISO-8859-16 encodings.
  • A full-featured C++ port is released.
  • The C version of VTD-XML now makes use of thread local storage to address the thread-safety issue for multi-threaded applications.

There are also a number of bugs fixed. Special thanks to Jozef Aerts, John Sillers, Chris Tornau and a number of other users for input and suggestions.

Thread Safety in C Version of VTD-XML

Before 2.10, the C version of VTD-XML made extensive use of global variables for XPath query compilation. A thread safety problem arises when multiple threads of an application perform XPath compilation at the same time. To resolve this issue, VTD-XML 2.10 replaces all global variables for XPath compilation with thread local variables: instead of simply declaring a variable, prepend _thread to the declaration. A thread local variable is just like a global variable, except that each thread has its own copy, visible only within that thread. The macro for “_thread” is defined in “customTypes.h.”

How does the use of thread local variables impact the overall design of your application? Fortunately, very little change is required. The most significant one is the global exception context declaration. The old one looks something like this:

struct exception_context the_exception_context[1];
int main(){
          exception e;
          Try {  // put the code throwing exceptions here
          } Catch (e) {  // handle exception in here
          }
}

From 2.10 onward, the app looks like this:

_thread struct exception_context the_exception_context[1];
int main(){
          exception e;
          Try {  // put the code throwing exceptions here
          } Catch (e) {  // handle exception in here
          }
}

Location Cache Depth Tuning

Before version 2.10, the location cache depth was fixed at 3. In this version, you can choose either 3 or 5 by calling VTDGen’s selectLcDepth() (see the example below). The benefit is that, at the cost of negligible parsing and memory overhead, the random access performance of VTDNav improves, especially for deep XML documents.


   VTDGen vg = new VTDGen();

   vg.selectLcDepth(5);

Insert Text into Empty Element

In this release, you can insert content into an empty element (e.g. <a/>).

  • Insert “some text” into <a/>, you get “<a>some text</a>”.
  • Insert <b/> into <a/>, you get <a><b/></a>. 

VTD-XML in  C++

Below is a simple app written in C++ using VTD-XML.

#include "everything.h"
//#include "bookMark.h"

using namespace com_ximpleware;

int main(){
 FILE *f = NULL;
 FILE *fo = NULL;
 int i = 0;

 Long l = 0;
 int len = 0;
 int offset = 0;

 char* filename = "c:/xml/soap2.xml";
 struct stat s;
 UByte *xml = NULL; // this is the buffer containing the XML content, UByte means unsigned byte
 //VTDGen *vg = NULL; // This is the VTDGen that parses XML
 VTDNav *vn = NULL; // This is the VTDNav that navigates the VTD records
 AutoPilot *ap = NULL;
 char *sm = "\n================\n";

 // allocate a buffer, then read in the document content
 // "c:/xml/soap2.xml" is the input file; open it in binary mode so the
 // number of bytes read matches the size reported by stat()
 f = fopen(filename,"rb");
 fo = fopen("c:/xml/out.txt","w");

 stat(filename,&s);

 i = (int) s.st_size; 
 printf("size of the file is %d \n",i);
 xml = new UByte[i];
 fread(xml,sizeof(UByte),i,f);
 VTDGen vg;
 try{
  
  vg.setDoc(xml,i);
  vg.parse(true);
  vn = vg.getNav();
  AutoPilot ap;
  ap.declareXPathNameSpace(L"ns1",L"http://www.w3.org/2003/05/soap-envelope");
  //if (ap.selectXPath(L"/ns1:Envelope/ns1:Header/*[@ns1:mustUnderstand]")){
  if (ap.selectXPath(L"/ns1:Envelope/ns1:Header/*[@ns1:mustUnderstand]")){
  //if (ap.selectXPath(L"/a/b/*")){
   ap.printExprString();
   ap.bind(vn);
   int i=-1;
   while((i=ap.evalXPath())!= -1){
    //printf("\n hi ==> %d \n",i);
    l = vn->getElementFragment();
    offset = (int) l;
    len = (int) (l>>32);
    fwrite((char *)(xml+offset),sizeof(UByte),len,fo);
    fwrite((char *) sm,sizeof(UByte),strlen((char*)sm),fo);
   }
  }
  fclose(f);
  fclose(fo);
  // remember C++ has no automatic garbage collector;
  // deallocate manually.
  delete(vn);
 }
 catch (ParseException &e){
  //vg.printLineNumber();
  printf(" error ===> %s \n",e.getMessage());
 }
 catch (...) {
  delete (vn);
 }
 return 0;
} 

How to Read All Attributes of an Element in VTD-XML?

There are two ways to read all the attribute values of an element node.

The first is to use the XPath expression @*, as in the example below:
  ap = new AutoPilot(vn);
  ap.selectXPath("@*");
  int i = -1;
  while ((i = ap.evalXPath()) != -1) {
      // i is the index of the attribute name; i+1 is the index of the attribute value
  }

  

The second is lighter weight: call AutoPilot’s selectAttr() and iterateAttr() directly.
ap = new AutoPilot(vn);
ap.selectAttr("*");
int i = -1;
while ((i = ap.iterateAttr()) != -1) {
    // i is the index of the attribute name; i+1 is the index of the attribute value
}

60x? That sounds just right.

I came across a recent blog post in which the author benchmarks XPath evaluation with VTD-XML on a 20 MB document and compares it to JAXP. The result is a convincing 60x. Surprised? Don’t be. The fact is that DOM and JAXP just have too many inherent issues (performance, memory usage, etc.). Below is the link to that blog post:

http://fahdshariff.blogspot.com/2010/08/faster-xpaths-with-vtd-xml.html

VTD-XML 2.9 Released

VTD-XML 2.9, the next generation XML Processing API for SOA and Cloud computing, has been released. Please visit  https://sourceforge.net/projects/vtd-xml/files/ to download the latest version.
  • Strict Conformance
    • VTD-XML now fully conforms to XML namespace 1.0 spec
  • Performance Improvement
    • Significantly improved parsing performance for small XML files
  • Expanded Core VTD-XML API
    • Adds getPrefixString() and toNormalizedString2()
  • Cutting/Splitting
    • Adds getSiblingElementFragment() 
  • A number of bug fixes and code enhancement including:
    • Fixes a bug for reading very large XML documents on some platforms
    • Fixes a bug in parsing processing instruction
    • Fixes a bug in outputAndReparse()

Shuffle XML Elements with XPath and VTD-XML

This is a simple app that shuffles elements in an XML file. It uses XPath to address individual elements, then re-arranges and re-combines the fragments. Those fragments are identified by their offsets and lengths, both of which are obtained by calling VTDNav’s getElementFragment().
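The code samples in this post unpack the long returned by getElementFragment() as an offset in the lower 32 bits and a length in the upper 32 bits. A minimal, self-contained sketch of that layout follows; the pack() helper and the class name FragmentPackingDemo are hypothetical and exist only to build a value with that layout without needing the library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a (length, offset) pair packed into one long.
final class FragmentPackingDemo {

    // Packs the length into the upper 32 bits and the offset into the lower 32 bits.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    // The lower 32 bits hold the byte offset of the fragment.
    static int offset(long fragment) {
        return (int) fragment;
    }

    // The upper 32 bits hold the byte length of the fragment.
    static int length(long fragment) {
        return (int) (fragment >> 32);
    }

    // Copies the bytes described by the packed value out of the document buffer.
    static String slice(byte[] doc, long fragment) {
        return new String(doc, offset(fragment), length(fragment), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<root><a> text </a><b> text </b></root>".getBytes(StandardCharsets.UTF_8);
        long first = pack(6, 13);   // bytes 6..18 hold the <a> element
        long second = pack(19, 13); // bytes 19..31 hold the <b> element
        // Write the fragments in the opposite order to show that shuffling is only byte copying.
        System.out.println("<root>" + slice(doc, second) + slice(doc, first) + "</root>");
    }
}
```

Because a fragment is just a byte range in the original buffer, re-ordering elements never requires building or serializing a tree.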

Why not use SAX or StAX?

Simply speaking, the lack of XPath support makes it very tedious, almost impossible, to re-arrange XML element fragments.

Why not use DOM?

Aside from performance and memory usage, the redundant, wasteful de-serialization/serialization contributes nothing but overhead to the task.

The Code

Input XML:

   <root>
     <a> text </a>
     <b> text </b>
     <c> text </c>
     <a> text </a>
     <b> text </b>
     <c> text </c>
     <a> text </a>
     <b> text </b>
     <c> text </c>
   </root>

Output XML:

   <root>
     <a> text </a>
     <a> text </a>
     <a> text </a>
     <b> text </b>
     <b> text </b>
     <b> text </b>
     <c> text </c>
     <c> text </c>
     <c> text </c>
   </root>

Java Code:

import com.ximpleware.*;
import java.io.*;
public class shuffle {
    public static void main(String[] args) throws Exception {
        VTDGen vg = new VTDGen();
        AutoPilot ap0 = new AutoPilot();
        AutoPilot ap1 = new AutoPilot();
        AutoPilot ap2 = new AutoPilot();
        ap0.selectXPath("/root/a");
        ap1.selectXPath("/root/b");
        ap2.selectXPath("/root/c");

        if (vg.parseFile("old.xml",false)){
            VTDNav vn = vg.getNav();
            ap0.bind(vn);
            ap1.bind(vn);
            ap2.bind(vn);
            FileOutputStream fos = new FileOutputStream("new.xml");
            fos.write("<root>".getBytes());
            byte[] ba = vn.getXML().getBytes();
            while(ap0.evalXPath()!=-1){
                long l= vn.getElementFragment();
                int offset = (int)l;
                int len = (int)(l>>32);
                fos.write('\n');
                fos.write(ba,offset, len);
            }
            ap0.resetXPath();
            while(ap1.evalXPath()!=-1){
                long l= vn.getElementFragment();
                int offset = (int)l;
                int len = (int)(l>>32);
                fos.write('\n');
                fos.write(ba,offset, len);
            }
            ap1.resetXPath();
            while(ap2.evalXPath()!=-1){
                long l= vn.getElementFragment();
                int offset = (int)l;
                int len = (int)(l>>32);
                fos.write('\n');
                fos.write(ba,offset, len);
            }
            ap2.resetXPath();
            fos.write('\n');
            fos.write("</root>".getBytes());
            fos.close(); // flush and release the output file
        }
    }
}

C# code:

using System;
using System.IO;    // FileStream, FileMode
using System.Text;  // Encoding
using com.ximpleware;

namespace shuffle
{
    public class shuffle
    {
        public static void Main(String[] args)
        {
            VTDGen vg = new VTDGen();
            AutoPilot ap0 = new AutoPilot();
            AutoPilot ap1 = new AutoPilot();
            AutoPilot ap2 = new AutoPilot();
            ap0.selectXPath("/root/a");
            ap1.selectXPath("/root/b");
            ap2.selectXPath("/root/c");
            Encoding eg = System.Text.Encoding.GetEncoding("utf-8");
            if (vg.parseFile("old.xml", false))
            {
                VTDNav vn = vg.getNav();
                ap0.bind(vn);
                ap1.bind(vn);
                ap2.bind(vn);
                // FileMode.Create truncates any existing new.xml so no stale bytes remain
                FileStream fos = new FileStream("new.xml", System.IO.FileMode.Create);
                byte[] ba1, ba2, ba3;
                ba1 = eg.GetBytes("<root>");
                ba2 = eg.GetBytes("</root>");
                ba3 = eg.GetBytes("\n");
                fos.Write(ba1, 0, ba1.Length);
                byte[] ba = vn.getXML().getBytes();
                while (ap0.evalXPath() != -1)
                {
                    long l = vn.getElementFragment();
                    int offset = (int)l;
                    int len = (int)(l >> 32);
                    fos.Write(ba3,0,ba3.Length);
                    fos.Write(ba, offset, len);
                }
                ap0.resetXPath();
                while (ap1.evalXPath() != -1)
                {
                    long l = vn.getElementFragment();
                    int offset = (int)l;
                    int len = (int)(l >> 32);
                    fos.Write(ba3,0,ba3.Length);
                    fos.Write(ba, offset, len);
                }
                ap1.resetXPath();
                while (ap2.evalXPath() != -1)
                {
                    long l = vn.getElementFragment();
                    int offset = (int)l;
                    int len = (int)(l >> 32);
                    fos.Write(ba3,0,ba3.Length);
                    fos.Write(ba, offset, len);
                }
                ap2.resetXPath();
                fos.Write(ba3,0,ba3.Length);
                fos.Write(ba2,0,ba2.Length);
                fos.Close(); // flush and release the output file
            }
        }
    }
}

Process Huge XML Documents with Extended VTD-XML

If you have XML files that are larger than 2GB and don’t want to lose the benefit of XPath (the full set), you will be surprised by how handy and fast extended VTD-XML can be.

New since version 2.3 and part of the full VTD-XML distribution, extended VTD-XML expands the maximum document size to 256 GB; it requires a 64-bit JVM to reach that limit.

Extended VTD-XML works with XML in either the in-memory mode (like standard VTD-XML) or the memory mapped mode, which allows partial document loading.
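For readers unfamiliar with the distinction, the two loading strategies can be sketched with nothing but the JDK. This is not extended VTD-XML code and says nothing about how XMLBuffer or XMLMemMappedBuffer are implemented; LoadModesDemo and its methods are hypothetical names. It only contrasts copying a whole file onto the heap with mapping a window of the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of in-memory loading versus memory-mapped access.
final class LoadModesDemo {

    // Writes the content to a temporary file that is removed when the JVM exits.
    static Path writeTemp(String content) {
        try {
            Path file = Files.createTempFile("load-modes-demo", ".xml");
            file.toFile().deleteOnExit();
            Files.write(file, content.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // In-memory mode: the entire file is copied into a byte array on the heap.
    static byte[] readInMemory(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Memory-mapped mode: only the requested window of the file is mapped and read.
    static byte[] readMappedWindow(Path file, long position, int size) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, position, size);
            byte[] window = new byte[size];
            map.get(window);
            return window;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTemp("<root><a>text</a></root>");
        System.out.println("in memory: " + readInMemory(file).length + " bytes loaded");
        byte[] window = readMappedWindow(file, 6, 11);
        System.out.println("mapped window: " + new String(window, StandardCharsets.UTF_8));
    }
}
```

With a mapped file the operating system pages data in on demand, which is why the memory mapped mode does not need the whole document on the Java heap at once.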

The code examples below show how to use extended VTD-XML to process an XML document in memory mapped mode and in in-memory mode:

Memory Mapped Mode

import com.ximpleware.extended.*;
public class mem_mapped_read {
	// first read is the longer version of loading the XML file 
	public static void first_read() throws Exception{
	XMLMemMappedBuffer xb = new XMLMemMappedBuffer();
        VTDGenHuge vg = new VTDGenHuge();
        xb.readFile("test.xml");
        vg.setDoc(xb);
        vg.parse(true);
        VTDNavHuge vn = vg.getNav();
        System.out.println("text data ===>" + vn.toString(vn.getText()));
	}	

	// second read is the shorter version of loading the XML file 
	public static void second_read() throws Exception{
	    VTDGenHuge vg = new VTDGenHuge();
	    if (vg.parseFile("test.xml",true,VTDGenHuge.MEM_MAPPED)){
	        VTDNavHuge vn = vg.getNav();
	        System.out.println("text data ===>" + vn.toString(vn.getText()));
	    }
	}

	public static void main(String[] s) throws Exception{
		first_read();
	 	second_read();
	}
}

In Memory Mode

/**
 * This is a demonstration of how to use the extended VTD parser
 * to process large XML file. 
 *
 */
import com.ximpleware.extended.*;
public class in_mem_read {
	// first read is the longer version of loading the XML file 
	public static void first_read() throws Exception{
		XMLBuffer xb = new XMLBuffer();
        VTDGenHuge vg = new VTDGenHuge();
        xb.readFile("test.xml");
        vg.setDoc(xb);
        vg.parse(true);
        VTDNavHuge vn = vg.getNav();
        System.out.println("text data ===>" + vn.toString(vn.getText()));
	}	

	// second read is the shorter version of loading the XML file 
	public static void second_read() throws Exception{
	    VTDGenHuge vg = new VTDGenHuge();
	    if (vg.parseFile("test.xml",true,VTDGenHuge.IN_MEMORY)){
	        VTDNavHuge vn = vg.getNav();
	        System.out.println("text data ===>" + vn.toString(vn.getText()));
	    }
	}

	public static void main(String[] s) throws Exception{
		first_read();
	 	second_read();
	}
}

Namespace Compensation with VTD-XML

When you extract an element fragment by calling VTDNav’s getElementFragment(), the namespace declarations in scope are not carried along. If you want to extract an element fragment along with its namespaces, you need to call getElementFragmentNs(). Below are examples of calling it in Java, C#, and C.

The Java Version:

// Insert a ns-compensated fragment into an XML doc
import com.ximpleware.*;
import java.io.*;

public class FragmentTest {    
    public static void main(String[] s) throws Exception{
        // instantiate VTDGen and XMLModifier
        VTDGen vg = new VTDGen();
        XMLModifier xm = new XMLModifier();
        AutoPilot ap = new AutoPilot();
        AutoPilot ap2 = new AutoPilot();
        ap.selectXPath("(/*/*/*)[position()>1 and position()<4]");
        ap2.selectXPath("/*/*/*");
        if (vg.parseFile("soap2.xml",true)){
            VTDNav vn = vg.getNav();
            xm.bind(vn);
            ap2.bind(vn);
            ap.bind(vn);
            ap2.evalXPath();
            ElementFragmentNs ef = vn.getElementFragmentNs();
            int i = -1;
            while((i=ap.evalXPath())!=-1){
                xm.insertAfterElement(ef);
            }           
            xm.output(new FileOutputStream("new_soap.xml"));
        }        
    }
}

The C# version

using System;
using com.ximpleware;

namespace FragmentTest
{
    public class FragmentTest
    {
        public static void Main(String[] args)
        {
            // instantiate VTDGen and XMLModifier
            VTDGen vg = new VTDGen();
            XMLModifier xm = new XMLModifier();
            AutoPilot ap = new AutoPilot();
            AutoPilot ap2 = new AutoPilot();
            ap.selectXPath("(/*/*/*)[position()>1 and position()<4]");
            ap2.selectXPath("/*/*/*");
            if (vg.parseFile("soap2.xml", true))
            {
                VTDNav vn = vg.getNav();
                xm.bind(vn);
                ap2.bind(vn);
                ap.bind(vn);
                ap2.evalXPath();
                ElementFragmentNs ef = vn.getElementFragmentNs();
                int i = -1;
                while ((i = ap.evalXPath()) != -1)
                {
                    xm.insertAfterElement(ef);
                }
                xm.output("new_soap.xml");
            }
        }
    }
}

The C version

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <io.h>
#include "xpath1.h"
#include "helper.h"
#include "vtdGen.h"
#include "vtdNav.h"
#include "autoPilot.h"
#include "XMLModifier.h"
#include "nodeRecorder.h"
#include "bookMark.h"

struct exception_context the_exception_context[1];

void main(){
	exception e;
	Try{
		VTDGen *vg = NULL; /* This is the VTDGen that parses XML */
		VTDNav *vn = NULL; /* This is the VTDNav that navigates the VTD records */
		AutoPilot *ap = NULL, *ap2=NULL;
		XMLModifier *xm = NULL;
		ElementFragmentNs *ef = NULL;
		int i= -1;
		Long l= -1;

		vg = createVTDGen();
		ap = createAutoPilot2();
		ap2 = createAutoPilot2();
		xm = createXMLModifier();
		selectXPath(ap,L"(/*/*/*)[position()>1 and position()<4]");
		selectXPath(ap2,L"/*/*/*");
		if (parseFile(vg,TRUE,"soap2.xml")){
			//FILE *f1 = fopen("d:/new3.xml","wb");
			vn = getNav(vg);
			bind(ap,vn);
			bind(ap2,vn);
			bind4XMLModifier(xm,vn);
			evalXPath(ap2);
			ef = getElementFragmentNs(vn);
			
			while( (i=evalXPath(ap))!=-1){
				insertAfterElement4(xm,ef);
				printf(" index %d \n",i);
			}
			//fwrite(vn->XMLDoc+vn->docOffset,sizeof(UByte),vn->docLen,f1);
			output2(xm,"new3.xml");
			//fclose(f1);
			free(vn->XMLDoc);
			freeVTDNav(vn);
		}
		freeElementFragmentNs(ef);
		freeXMLModifier(xm);
		freeAutoPilot(ap);
		freeAutoPilot(ap2);
		freeVTDGen(vg);
		
	}Catch(e){
		printf("exception !!!!!!!!!!! \n");
	}
}

Using VTD-XML to Replace Element Names

This code example shows how to replace element names in an XML document using XPath and XMLModifier in VTD-XML. The key is to combine XPath with XMLModifier’s updateElementName(), which renames the element at the cursor node.

The Java version is below:

/*
 * Change all elements to lalalala
 */
import com.ximpleware.*;

public class changeElementName {

    public static void main(String[] args) throws Exception{
        
        String xml = "<aaaa> <bbbbb> <ccccc> </ccccc> <ccccc/> <ccccc></ccccc> </bbbbb> </aaaa>";
        VTDGen vg = new VTDGen();
        vg.setDoc(xml.getBytes());
        vg.parse(false);
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("//*");
        XMLModifier xm = new XMLModifier(vn);
        int i;
        while(ap.evalXPath()!=-1){
            xm.updateElementName("lalalala");
        }
        xm.output("lala.xml");
        
    }
}

The C# version is here:

using System;
using System.IO;
using System.Text;  // Encoding
using com.ximpleware;
namespace FragmentTest
{
    public class FragmentTest
    {
        public static void Main(String[] args)
        {

            String xml = "<aaaa> <bbbbb> <ccccc> </ccccc> <ccccc/> <ccccc></ccccc> </bbbbb> </aaaa>";
            Encoding eg = Encoding.GetEncoding("utf-8");
            VTDGen vg = new VTDGen();
            vg.setDoc(eg.GetBytes(xml));
            vg.parse(false);
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            ap.selectXPath("//*");
            XMLModifier xm = new XMLModifier(vn);
            while (ap.evalXPath() != -1)
            {
                xm.updateElementName("lalalala");
            }
            xm.output("lala.xml");
        }
    }
}