VTD-XML 2.8 Released!
Below are the improvements and enhancements in v2.8 (which can be downloaded here):
- Expansion of Core VTD-XML API
  - VTDGen adds support for capturing white space
  - VTDNav adds support for getContentFragment(), recoverNode() and cloneNav()
  - XMLModifier adds support for the update-and-reparse feature
  - AutoPilot adds support for retrieving all attributes
  - BookMark is also enhanced
- Expansion of Extended VTD-XML API
  - Adds content extraction ability to extended VTD-XML
  - VTDNavHuge can now call getElementFragment() and getElementFragmentNs()
  - VTDGenHuge adds support for capturing white space
- XPath
  - Adds comment and processing instruction support for nodes, and performance enhancements
  - Adds namespace axis support …
  - Adds round-half-to-even()
- A number of bug fixes and code enhancements
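For readers unfamiliar with the rounding rule behind round-half-to-even(), here is a minimal Java sketch of the semantics: ties go to the nearest even neighbour. It uses only java.math.BigDecimal and is an illustration of the rule, not VTD-XML's implementation; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class; it does not use or mirror VTD-XML internals.
final class HalfEvenDemo {

    // Rounds a decimal string to `precision` fractional digits,
    // sending exact ties to the even neighbour.
    static BigDecimal roundHalfToEven(String value, int precision) {
        return new BigDecimal(value).setScale(precision, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        String[] samples = {"0.5", "1.5", "2.5", "-2.5"};
        for (String s : samples) {
            System.out.println(s + " -> " + roundHalfToEven(s, 0));
        }
        // A non-tie value rounded to two fractional digits.
        System.out.println("3.567812 -> " + roundHalfToEven("3.567812", 2));
    }
}
```

Note that 1.5 and 2.5 both round to 2 under this rule, which is what distinguishes it from the usual round-half-up behaviour.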
Is XML Schema support visible on the horizon yet?
Working on it, and on XSLT…
If you know someone willing to sponsor the work, the pace of development would no doubt accelerate.
Is there a known upper limit on the size of an XML file that VTDGen can parse on a 32-bit machine?
The stated memory usage of 1.3x ~ 1.5x the size of the XML file doesn’t seem to be exact.
My attempt to parse a 465 MB XML file failed. I don’t know the limits of FileInputStream and FileChannel.map, and I haven’t found a sound answer either, given that allocating a 1 GB byte array is no problem.
I would be very grateful for any explanation.
The maximum file size is 2 GB; in practice, if there is enough memory, a file size of 1.4 GB is achievable…
Did you adjust the JVM max heap size?
java -Xmx1000m ….
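One way to confirm that an -Xmx setting actually took effect is to ask the running JVM for its maximum heap and compare it with a rough estimate of what the parse will need. The sketch below is a minimal illustration: the 1.5x multiplier is simply the upper end of the 1.3x ~ 1.5x figure quoted in the question, and the class and method names are hypothetical.

```java
// Hypothetical helper; not part of VTD-XML.
final class HeapCheck {

    // Upper-end estimate: 1.5x the file size, taken from the
    // 1.3x ~ 1.5x memory usage figure quoted in the question.
    static long estimatedBytesNeeded(long fileBytes) {
        return fileBytes + fileBytes / 2;
    }

    // True if the estimate fits inside the given maximum heap size.
    static boolean likelyFits(long fileBytes, long maxHeapBytes) {
        return estimatedBytesNeeded(fileBytes) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        // Reports the heap ceiling the JVM is actually running with.
        long maxHeap = Runtime.getRuntime().maxMemory();
        long fileBytes = 465L * 1024 * 1024; // the 465 MB file from the question
        System.out.println("max heap bytes:  " + maxHeap);
        System.out.println("estimated need:  " + estimatedBytesNeeded(fileBytes));
        System.out.println("likely to fit:   " + likelyFits(fileBytes, maxHeap));
    }
}
```

A passing check only says the heap is large enough on paper; it does not rule out limits elsewhere, such as in the file-reading call itself.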
The heap size is already adjusted as follows:
-Xms1440m -Xmx1440m -XX:MaxPermSize=32m
Allocating a 1 GB byte array is no problem, but VTDGen cannot parse that XML file; the exception is thrown by FileInputStream.read(byte[]). I also tried rewriting parseFile using FileChannel.map, which didn’t work either. The limitation may be there, or I may be wrong. Any other clues? Thanks.
It seems that the limit for FileInputStream.read() and FileChannel.read is about 108 MB. I adjusted the parseFile method to use partial reads, and now it works. The modification follows, for reference only; it still needs to be refined and tested:
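// Imports needed by the enclosing VTDGen subclass.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import com.ximpleware.ParseException;

@Override
public boolean parseFile(String fileName, boolean ns) {
    // Read in chunks of at most 108 MB instead of one large read.
    ByteBuffer bb = ByteBuffer.allocate(1024 * 1024 * 108);
    RandomAccessFile raf = null;
    try {
        raf = new RandomAccessFile(fileName, "r");
        FileChannel fc = raf.getChannel();
        long len = fc.size();
        if (len > Integer.MAX_VALUE) {
            throw new IllegalStateException("cannot handle file larger than "
                    + Integer.MAX_VALUE + " bytes");
        }
        int size = (int) len;
        byte[] ba = new byte[size];
        int read = 0;
        while (read < size) {
            bb.clear();
            // Never read past the end of the destination array.
            bb.limit(Math.min(bb.capacity(), size - read));
            int n = fc.read(bb);
            if (n < 0) {
                // End of file before 'size' bytes arrived (the file shrank).
                throw new IOException("unexpected end of file: " + fileName);
            }
            System.arraycopy(bb.array(), 0, ba, read, n);
            read += n;
        }
        this.setDoc(ba);
        this.parse(ns); // ns controls namespace awareness
        return true;
    } catch (IOException e) {
        System.err.println("I/O error while reading " + fileName + ": " + e);
    } catch (ParseException e) {
        System.err.println("Parse error in " + fileName + ": " + e);
    } finally {
        if (raf != null) {
            try {
                raf.close();
            } catch (IOException e) {
                // Nothing useful can be done if close fails.
            }
        }
    }
    return false;
}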
And of course, the environment:
XP SP3
Intel(R) Core(TM)2 Duo CPU E8400@3.00GHz
1.97GHz, 3.25GB RAM
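The partial-read idea in the parseFile modification above can also be tried on its own, outside VTD-XML, to check whether chunked reads get past the failure on a given machine. The following is a minimal sketch under that assumption: it reads straight into the destination array with RandomAccessFile.read(byte[], int, int), so no intermediate buffer is needed. The class name ChunkedFileReader and the chunkSize parameter are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical standalone helper; not part of VTD-XML.
final class ChunkedFileReader {

    // Reads the whole file into one byte array, asking for at most
    // chunkSize bytes per underlying read call.
    static byte[] readFully(String fileName, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            long len = raf.length();
            if (len > Integer.MAX_VALUE) {
                throw new IOException("cannot handle file larger than "
                        + Integer.MAX_VALUE + " bytes");
            }
            byte[] data = new byte[(int) len];
            int read = 0;
            while (read < data.length) {
                int want = Math.min(chunkSize, data.length - read);
                int n = raf.read(data, read, want);
                if (n < 0) {
                    // The file ended earlier than its reported length.
                    throw new EOFException("unexpected end of file: " + fileName);
                }
                read += n;
            }
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        // 64 MB chunks are an arbitrary choice below the ~108 MB figure above.
        byte[] doc = readFully(args[0], 64 * 1024 * 1024);
        System.out.println("read " + doc.length + " bytes");
    }
}
```

The resulting array could then be handed to setDoc(byte[]) followed by parse(boolean), as the modification above does.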