VTD-XML 2.8 Released!

Below are the improvements and enhancements in v 2.8 (which can be downloaded here)

  • Expansion of Core VTD-XML API
    • VTDGen adds support for capturing white spaces
    • VTDNav adds support for getContentFragment(), recoverNode() and cloneNav()
    • XMLModifier adds support for update and reparse feature
    • AutoPilot adds support for retrieving all attributes
    • BookMark is also enhanced.
  • Expansion of Extended VTD-XML API
    • Add content extraction ability to extended VTD-XML
    • VTDNavHuge now can call getElementFragment() and getElementFragmentNs()
    • VTDGenHuge adds support for capturing white spaces
  • XPath
    • Adds support for comment and processing-instruction nodes, plus performance enhancements
    • Adds namespace axis support …
    • Adds round-half-to-even()
  • A number of bug fixes and code enhancements

8 comments so far

  1. SoIamAsking on

    Is XML Schema support already visible on the horizon?

    • jimmyzhang on

      working on it and XSLT…

    • jimmyzhang on

      if you know someone willing to sponsor, the pace of development would accelerate no doubt about it 🙂

  2. Zhongda Zhao on

    Is there a known maximum size for an XML file that VTDGen can parse on a 32-bit machine?

    The stated memory usage of 1.3x ~ 1.5x the size of the XML file doesn’t seem to be exact.

    My attempt to parse a 465 MB XML file failed. I don’t know the limits of FileInputStream and FileChannel.map and haven’t found a sound answer either, given that allocating a 1 GB byte array is no problem.

    Would be very grateful for any explanation.

    • jimmyzhang on

      The max file size is 2 GB. In practice, if there is enough memory, a 1.4 GB file is achievable…
      Did you adjust the JVM max heap size?
      java -Xmx1000m ….

      • Zhongda Zhao on

        The heap size is already adjusted as follows:

        -Xms1440m -Xmx1440m -XX:MaxPermSize=32m

        Allocating a 1 GB byte array is no problem, but VTDGen cannot parse that XML file; the exception is thrown by FileInputStream.read(byte[]). I also tried rewriting parseFile to use FileChannel.map, which didn’t work either. The limitation may be there, or I may be wrong. Any other clues? Thanks.
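The arithmetic behind the 1.3x ~ 1.5x figure quoted in this thread can be checked with a few lines of Java. This is only a rough sketch: the class and method names (HeapEstimate, estimateBytes, fitsInHeap) are made up for illustration and are not part of VTD-XML, and the overhead factor is simply the range mentioned above, not a measured value.

```java
class HeapEstimate {
    // Rough in-memory footprint: file size times an assumed overhead factor.
    static long estimateBytes(long fileBytes, double factor) {
        return (long) Math.ceil(fileBytes * factor);
    }

    // True if the estimated footprint does not exceed the given heap limit.
    static boolean fitsInHeap(long fileBytes, double factor, long maxHeapBytes) {
        return estimateBytes(fileBytes, factor) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 465L * mb;                       // file size from the comment above
        long maxHeap = Runtime.getRuntime().maxMemory();  // reflects the -Xmx setting

        System.out.println("max heap (MB):       " + maxHeap / mb);
        System.out.println("estimate at 1.3x MB: " + estimateBytes(fileBytes, 1.3) / mb);
        System.out.println("estimate at 1.5x MB: " + estimateBytes(fileBytes, 1.5) / mb);
        System.out.println("fits at 1.5x:        " + fitsInHeap(fileBytes, 1.5, maxHeap));
    }
}
```

With the numbers from this thread, a 465 MB file at 1.5x stays well under a 1440 MB heap, which suggests that heap size alone does not explain the failure.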

  3. Zhongda Zhao on

    It seems that the limit for a single FileInputStream.read() or FileChannel.read() call is about 108 MB. I changed the parseFile method to read the file in parts, and now it works. The modification follows, for reference only; it still needs to be refined and tested:

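    // needs: java.io.RandomAccessFile, java.nio.ByteBuffer,
    //        java.nio.channels.FileChannel, com.ximpleware.ParseException
    @Override
    public boolean parseFile(String fileName, boolean ns) {
        // read at most 108 MB per call; larger single reads failed here
        ByteBuffer bb = ByteBuffer.allocate(1024 * 1024 * 108);
        RandomAccessFile raf = null;
        try {
            raf = new RandomAccessFile(fileName, "r");
            FileChannel fc = raf.getChannel();
            long len = fc.size();
            if (len > Integer.MAX_VALUE) {
                throw new IllegalStateException("cannot handle file larger than "
                        + Integer.MAX_VALUE + " bytes");
            }

            int size = (int) len;
            byte[] ba = new byte[size];
            int read = 0;
            while (read < size) {
                bb.clear();
                // never read past the end of the target array
                bb.limit(Math.min(bb.capacity(), size - read));
                int n = fc.read(bb);
                if (n < 0) {
                    // file ended early; give up instead of looping forever
                    return false;
                }
                System.arraycopy(bb.array(), 0, ba, read, n);
                read += n;
            }

            this.setDoc(ba);
            this.parse(ns); // ns switches namespace awareness on or off
            return true;
        } catch (java.io.IOException e) {
            // swallowed: the caller only sees the false return value
        } catch (ParseException e) {
            // swallowed: the caller only sees the false return value
        } finally {
            if (raf != null) {
                try {
                    raf.close();
                } catch (Exception e) {
                    // nothing useful to do if close fails
                }
            }
        }
        return false;
    }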
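The partial-read idea above can also be tried on its own, outside VTDGen. The following is a minimal sketch under stated assumptions: the class name ChunkedRead, the method readFully and the 16 MB chunk size in main are all hypothetical choices, not part of VTD-XML. It wraps the target array in a ByteBuffer so each read lands directly in place, which avoids the separate 108 MB staging buffer and the extra copy.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class ChunkedRead {
    // Reads the whole file into one byte[], asking for at most chunkSize bytes per read() call.
    static byte[] readFully(Path file, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long len = fc.size();
            if (len > Integer.MAX_VALUE) {
                throw new IOException("file larger than " + Integer.MAX_VALUE + " bytes");
            }
            byte[] data = new byte[(int) len];
            ByteBuffer target = ByteBuffer.wrap(data); // reads write straight into data
            while (target.hasRemaining()) {
                // Cap this read at chunkSize bytes (long math avoids int overflow).
                long end = Math.min((long) data.length, (long) target.position() + chunkSize);
                target.limit((int) end);
                if (fc.read(target) < 0) {
                    throw new IOException("unexpected end of file at byte " + target.position());
                }
                target.limit(data.length); // restore the full window for the next pass
            }
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChunkedRead <file>");
            return;
        }
        byte[] doc = readFully(Paths.get(args[0]), 16 * 1024 * 1024);
        System.out.println("read " + doc.length + " bytes");
    }
}
```

The resulting array could then be handed to the parser with setDoc(...) and parse(...), as in the method above. Whether a smaller chunk size actually sidesteps the limit seen on a given 32-bit JVM would need to be tested there.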

  4. Zhongda Zhao on

    And of course, the environment:
    XP SP3
    Intel(R) Core(TM)2 Duo CPU E8400@3.00GHz
    1.97GHz, 3.25GB RAM

