The many ways VTD-XML can help you optimize the performance of your applications

I was asked on Stack Overflow what options VTD-XML offers for improving XML processing performance. Below is my answer, which I think is worth sharing with readers of this blog.

These are the usual ways to optimize performance with VTD-XML:

  1. White space option: You can ask VTDGen to ignore or retain trivial white space characters. By default, VTDGen throws away trivial white space. The difference is mainly in memory usage.
  2. Buffer reuse: You can ask VTDGen to reuse its VTD buffers for the next parsing task. By default, VTDGen allocates new buffers for each parsing run. This technique is most useful when you process XML files of similar size, so that the VTD buffer page size remains unchanged across consecutive parsing runs.
  3. Adjust LC level: By default it is 3, but you can set it to 5. When your XML is deeply nested, setting the LC level to 5 results in better XPath performance, at the cost of very slightly higher memory usage and parsing time.
  4. Reuse XPath: Compiling/selecting an XPath expression is a relatively slow operation, and the cost is most noticeable when you run the same expression over many small files. The key is to take any AutoPilot.selectXPath() call out of loops and reuse the compiled expression by calling ap.resetXPath().
  5. Use VTD+XML indexing: Instead of parsing XML files at the time of the processing request, you can pre-index your XML into the VTD+XML format and dump it on disk. When the processing request commences, simply load the VTD+XML index into memory and voila, parsing is no longer needed! Read this article for a detailed description of this feature.
  6. The overwrite feature, a.k.a. data templating: Because VTD-XML retains XML in memory as is, you can create a template XML file (pre-indexed in VTD+XML) whose value fields are left blank and let your app fill in the blanks, thus creating XML data that never needs to be parsed.
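Options 1 through 4 can be combined in a single processing loop. The following is a minimal sketch, not a definitive implementation: it assumes the vtd-xml jar is on the classpath, the class name `VtdTuningSketch` and the `/order/item/name` document layout are made up for illustration, and the method names are the ones I understand the `com.ximpleware` API to expose.

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import java.util.ArrayList;
import java.util.List;

public class VtdTuningSketch {

    // Collects the text of every /order/item/name element (a hypothetical
    // layout) from each document, reusing one parser, one set of VTD
    // buffers, and one compiled XPath expression.
    public static List<String> itemNames(List<byte[]> docs) throws Exception {
        VTDGen vg = new VTDGen();
        vg.enableIgnoredWhiteSpace(false); // option 1: false keeps the default (trivial white space dropped)
        vg.selectLcDepth(5);               // option 3: 5-level location cache instead of 3

        AutoPilot ap = new AutoPilot();
        ap.selectXPath("/order/item/name/text()"); // option 4: compile once, outside the loop

        List<String> names = new ArrayList<>();
        for (byte[] doc : docs) {
            vg.setDoc_BR(doc);   // option 2: setDoc_BR reuses buffers, setDoc allocates new ones
            vg.parse(false);     // false = namespace awareness off
            VTDNav vn = vg.getNav();
            ap.bind(vn);         // attach the compiled XPath to the new document
            int i;
            while ((i = ap.evalXPath()) != -1) {
                names.add(vn.toString(i)); // copy the value out before the next parse
            }
            ap.resetXPath();     // make the expression ready for the next document
        }
        return names;
    }
}
```

Because the buffers are reused, a VTDNav from an earlier iteration should not be used after the next parse, which is why the sketch copies the values into a list right away.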

Options 1, 2, 3 and 4 usually improve performance incrementally. Options 5 and 6 enable a paradigm shift: they fundamentally change the way XML data is generated and consumed, and can give you potentially vast performance improvements over existing processing frameworks and methodologies. For one thing, you can easily figure out that the result of an XPath evaluation can also be persisted along with the VTD index, which bypasses the XPath evaluation altogether. There are just so many ways to improve your apps that I will leave the rest to your imagination.
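To make option 5 concrete, here is a minimal sketch of the write-once, load-many pattern. It assumes the vtd-xml jar is on the classpath; the class name `VtdIndexSketch` and the helper method names are made up for illustration, and `writeIndex` and `loadIndex` are the VTDGen methods as I understand them.

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

public class VtdIndexSketch {

    // Done once, ahead of time: parse the XML and persist the document
    // together with its VTD index as a single VTD+XML file.
    public static void writeIndex(byte[] xml, String indexPath) throws Exception {
        VTDGen vg = new VTDGen();
        vg.setDoc(xml);
        vg.parse(false);          // false = namespace awareness off
        vg.writeIndex(indexPath);
    }

    // Done per request: load the index instead of parsing, then query it.
    public static String query(String indexPath, String xpath) throws Exception {
        VTDGen vg = new VTDGen();
        VTDNav vn = vg.loadIndex(indexPath); // no parse() call needed
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath(xpath);
        return ap.evalXPathToString();
    }
}
```

A caller would run `writeIndex` when the XML is produced or received, and only `query` on the request path.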

3 comments so far

  1. Oliver M. on

    Hi,

    thanks for highlighting this! Optimization is an important topic because people choose VTD for performance reasons, therefore they must be interested in it. Like me. 🙂

    Concerning (2) Buffer reuse – how is it really done? I’m afraid, I cannot find a method in VTDGen that would allow me to tell that I want to re-use the buffer.

    Thanks for helping me!

    Kind regards

    Oliver

    • jimmyzhang on

      If you have downloaded vtd-xml’s releases, there is a folder called code examples. In it, you should locate a sub-folder called buffer reuse… there are examples that can get your feet wet…

  2. Oliver M. on

    Hi Jimmy,
    thanks for the hint! I found the code examples in the 2.12 release, and the solution was so simple that I didn’t see the forest for all the trees: vtdGen.setDoc_BR(byte[]) instead of vtdGen.setDoc(byte[])

