What is New in 2.11

Version 2.11, simultaneously available in C, Java, C++, and C#, is the latest release of VTD-XML. So what is new? The short answer: (1) It is more standards-compliant, conforming strictly to the XPath 1.0 spec’s notion of node(). (2) It introduces a major performance improvement for XPath expressions involving simple position indexes. (3) It introduces a major performance improvement for XPath expressions whose predicates contain absolute location path expressions. (4) It also contains various bug fixes for issues reported by VTD-XML users.

Change to node() Interpretation

Before 2.11, node() in a location step of an XPath expression was interpreted as equivalent to *, i.e., an element node with any name. With 2.11, the same node() matches a node of any of the following kinds: element, text(), comment(), or processing-instruction(), as defined by the XPath 1.0 spec.
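The difference between * and node() can be seen in a small sketch. It uses the JDK's built-in javax.xml.xpath engine rather than VTD-XML, purely to illustrate the XPath 1.0 semantics that 2.11 now follows; the class name NodeTestDemo and the sample document are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustration only: uses the JDK XPath engine, not VTD-XML.
class NodeTestDemo {
    // Parses the XML string and returns how many nodes the XPath expression selects.
    static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + xpath + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <a> has four children: a text node, an element, a comment, and a processing instruction.
        String xml = "<a>text<b/><!-- note --><?target data?></a>";
        System.out.println("/a/*      selects " + count(xml, "/a/*") + " node(s)");
        System.out.println("/a/node() selects " + count(xml, "/a/node()") + " node(s)");
    }
}
```

Under XPath 1.0 rules, /a/* selects only the <b/> element, while /a/node() also selects the text, comment, and processing-instruction children.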

Performance Improvement for Simple Position Index

A quick example is “a[2]/b[1].” A simple position index is a constant index value in a predicate. The 2.11 XPath engine now detects this use case and exits the execution loop early, resulting in faster execution. The amount of improvement depends on how frequently simple indexes are used across location steps. In some cases, a 50% to 70% execution speedup is possible.
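The idea of escaping early can be sketched in a few lines of Java. This is not VTD-XML's actual implementation; the class PositionIndexSketch, its methods, and the list-of-names model of child nodes are all hypothetical, chosen only to show why a constant index lets the loop stop as soon as the requested match is found.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: child nodes are represented by their element names.
class PositionIndexSketch {
    // Number of child nodes examined by the most recent call.
    static int visited;

    // Without the optimization: every child is tested even after the answer is known.
    static int selectByFullScan(List<String> children, String name, int position) {
        visited = 0;
        int matches = 0;
        int result = -1;
        for (int i = 0; i < children.size(); i++) {
            visited++;
            if (children.get(i).equals(name) && ++matches == position) {
                result = i; // remember the hit but keep scanning
            }
        }
        return result;
    }

    // With the optimization: return as soon as the position-th match (1-based) is seen.
    static int selectWithEarlyExit(List<String> children, String name, int position) {
        visited = 0;
        int matches = 0;
        for (int i = 0; i < children.size(); i++) {
            visited++;
            if (children.get(i).equals(name) && ++matches == position) {
                return i;
            }
        }
        return -1; // fewer than `position` matching children
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("a", "b", "a", "c", "a", "a");
        int full = selectByFullScan(children, "a", 2);
        System.out.println("full scan:  index " + full + ", visited " + visited);
        int early = selectWithEarlyExit(children, "a", 2);
        System.out.println("early exit: index " + early + ", visited " + visited);
    }
}
```

Both methods return the same index; the early-exit version simply examines fewer children, and the saving grows with the number of siblings that follow the match.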

Performance Improvement for Predicates Containing Absolute Path Expressions

A quick example is //a[//abc/@val='1']. Notice that the predicate contains //abc, which is an absolute path expression. Before 2.11, this expression triggered repeated evaluation of //abc, once per candidate node, to determine whether the predicate was true or false, so the processing cost increased rapidly with the size of the document. This release caches the evaluation result so that the absolute path expression is evaluated only once. Please note that this feature is enabled by default; you can turn it off (we don’t recommend it) by invoking AutoPilot’s enableCaching method with the argument false.
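Why caching helps can be shown with a minimal memoization sketch. It is an assumption-laden illustration, not the engine's real code: the class AbsolutePathCacheSketch, the nested Cached wrapper, and countMatches are hypothetical names, and the "expensive" sub-expression is stood in for by a supplier that merely counts how often it is invoked. The underlying observation is that an absolute path gives the same result regardless of the context node, so its value can be reused as long as the document does not change.

```java
import java.util.function.Supplier;

// Hypothetical sketch of caching a context-independent sub-expression.
class AbsolutePathCacheSketch {
    // Evaluates the wrapped expression on first use and reuses the value afterwards.
    static final class Cached<T> implements Supplier<T> {
        private final Supplier<T> expression;
        private boolean evaluated;
        private T value;

        Cached(Supplier<T> expression) {
            this.expression = expression;
        }

        @Override
        public T get() {
            if (!evaluated) {
                value = expression.get();
                evaluated = true;
            }
            return value;
        }
    }

    // Applies the predicate once per context node and counts how many nodes pass.
    static int countMatches(int contextNodes, Supplier<Boolean> predicate) {
        int hits = 0;
        for (int i = 0; i < contextNodes; i++) {
            if (predicate.get()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] evaluations = {0};
        // Stand-in for an absolute path such as //abc/@val='1'.
        Supplier<Boolean> absolutePath = () -> {
            evaluations[0]++;
            return true;
        };

        countMatches(1000, absolutePath);
        System.out.println("uncached evaluations: " + evaluations[0]);

        evaluations[0] = 0;
        countMatches(1000, new Cached<>(absolutePath));
        System.out.println("cached evaluations:   " + evaluations[0]);
    }
}
```

The cached variant is only safe while the document stays unmodified between evaluations, which is one plausible reason a switch for the feature exists.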

How much of an improvement can you expect to see? It depends on the size of the documents, the complexity of the predicates, and other factors. Sometimes you will achieve astonishing results. Consider the following expression.

//CDResults[../../../TargetName/@Value=//SiteInformation[TargetName/@Value!=//SiteInformation[1]/TargetName/@Value and TargetName/@Value!=//SiteInformation[TargetName/@Value!=//SiteInformation[1]/TargetName/@Value][1]/TargetName/@Value][1]/TargetName/@Value]/BottomCD/@Value

Running this expression against a 22 MB XML document in Java would take many hours in virtually all XPath implementations, including version 2.10 of VTD-XML. With 2.11, it took less than 5 seconds on a three-year-old commodity PC.

Bug Fixes

There are other bug fixes as well, covering XMLModifier’s deletion capabilities and the permissiveness of sub-node deletion.

9 comments so far

  1. sincang on

    Hi,

    During my testing, I have learned that AutoPilot does not support XPath expressions that start with a predicate, that is, with “[”. Is this true?

    Thanks,

    • jimmyzhang on

      That does not seem to be a valid XPath expression.

  2. Andrei on

    Hi,

    I need to parse and modify an XML that has attribute values with lengths longer than 2^30 -1 (token max length). Now the VTDGen throws “Token Length Error: Attr val too long”.
    Is there any way that I can parse this XML? I saw some comments on the VTD official site that one can “add another 32 bit if 64 bits are not enough”. How is this possible?
    Or is it possible for the attribute value to be stored in multiple tokens (like it is done for text)?

    Thanks.

    • jimmyzhang on

      I am sorry to hear that, but it is a limitation that you have to deal with (I am referring to the token length for an attr value being limited to 2^20-1…)

  3. Andrei on

    Hi jimmyzhang,

    Thanks for your answer. So there is no workaround for this limitation? Like using multiple tokens for an attribute value? Or using a 96 bit token?

    Thanks.

    • jimmyzhang on

      It won’t be an easy fix… you don’t seem to use attr the right way: an attr val should never be very long.

      • Andrei on

        The xmls are not created by me, I just need to parse them and modify them. I have to deal with them as they are. Please let me know if any solution comes up.
        Thanks for the help.

  4. Mattias on

    It seems that VTDNav.toNormalizedString cannot handle empty, self-closed tags. See this example:
    http://www2.freefarm.se/EmptyTagTest.java
    Will you do a bug fix for that, or is there a workaround?

