Archive for October, 2015|Monthly archive page
I was asked on stackoverflow the possible options available for VTD-XML to improve XML performance. Below is my answer that I think is useful in sharing with readers of this blog.
There are usually the following ways to optimize performance with VTD-XML:
- White space option- You can ask VTDGen to ignore or retain trivial white space characters. By default, VTDGen throws away those trivial white spaces. The difference is mainly in memory usage.
- Buffer reuse- You can ask VTDGen to reuse VTD buffers for the next parsing task. Otherwise, by default, VTDGen will allocate new buffer for each parsing run. This optimization technique is most useful if you are processing similar sized XML file, so that the VTD buffer page size remains unchanged across consecutive parsing runs.
- Adjust LC level- By default, it is 3. But you can set it to 5. When your XML are deeply nested, setting LC level to 5 results in better XPath performance. But it increases memory usage and parsing time very slightly.
- Reuse XPath: Compiling/selecting XPath is a relatively slow operation, especially when you run XPath expression over many small files. The key is to take any AutoPilot.selectXPath() out of loops and reuse them by calling ap.resetXPath().
- Use VTD+XML indexing- Instead of parsing XML files at the time of processing request, you can pre-index your XML into VTD+XML format and dump them on disk. When the processing request commences, simply load VTD+xml in memory and voila, parsing is no longer needed!! Read this article for a detailed description of this feature.
- The overwrite feature aka. data templating- Because VTD-XML retains XML in memory as is, you can actually create a template XML file (pre-indexed in vtd+xml) whose value fields are left blank and let your app fill in the blank, thus creating XML data that never need to be parsed.
option 1 2 3 and 4 usually improve performance incrementally. option 5 and 6 enable paradigm shift by fundamentally changing the way XML data are generated and consumed and giving you potentially vast performance improvements over existing processing framework and methodology. For one thing, you can easily figure out that the result of xpath evaluation can also be persisted along with VTD index to actually bypass the XPath evaluation. There are just so many ways to improve your apps that i will leave this to your imaginations.