Using VTD-XML to Replace Element Names
This example shows how to rename elements in an XML document with VTD-XML. The key is to select the target elements with an XPath expression and call XMLModifier’s updateElementName() each time the cursor lands on a matching node.
The Java version is below:
/*
 * Rename every element in the document to "lalalala"
 */
import com.ximpleware.*;
public class changeElementName {
    public static void main(String[] args) throws Exception {
        String xml = "<aaaa> <bbbbb> <ccccc> </ccccc> <ccccc/> <ccccc></ccccc> </bbbbb> </aaaa>";
        VTDGen vg = new VTDGen();
        vg.setDoc(xml.getBytes());
        vg.parse(false); // false: parse without namespace awareness
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("//*"); // match every element
        XMLModifier xm = new XMLModifier(vn);
        // evalXPath() moves the cursor to the next match and returns -1 when there are no more
        while (ap.evalXPath() != -1) {
            xm.updateElementName("lalalala"); // rename the element at the cursor
        }
        xm.output("lala.xml"); // write the modified document to a file
    }
}
The C# version is here:
using System;
using System.IO;
using System.Text; // needed for Encoding
using com.ximpleware;
namespace FragmentTest
{
    public class FragmentTest
    {
        public static void Main(String[] args)
        {
            String xml = "<aaaa> <bbbbb> <ccccc> </ccccc> <ccccc/> <ccccc></ccccc> </bbbbb> </aaaa>";
            Encoding eg = Encoding.GetEncoding("utf-8");
            VTDGen vg = new VTDGen();
            vg.setDoc(eg.GetBytes(xml));
            vg.parse(false); // false: parse without namespace awareness
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            ap.selectXPath("//*"); // match every element
            XMLModifier xm = new XMLModifier(vn);
            while (ap.evalXPath() != -1)
            {
                // rename the element at the cursor (same name as in the Java version)
                xm.updateElementName("lalalala");
            }
            xm.output("lala.xml"); // write the modified document to a file
        }
    }
}
Is there a way to get XMLModifier to work with VTDNavHuge?
Not yet. Do you have to use VTDNavHuge? How big is your document?
We strip binary data out of XML messages, send the remaining XML on for further processing, and store the extracted binary data as a BLOB in a database. Our XML documents are about 0.5 GB each, but we run many parallel threads, so RAM is a concern. Using VTDGenHuge.MEM_MAPPED solves the memory problem, but then we can’t modify the document, since VTDNavHuge.getXML().getBytes() is always null and XMLModifierHuge does not exist yet.
The workaround we are planning is to collect all the offsets and lengths with
while (ap.evalXPath() != -1) {
    long[] l = vn.getElementFragment();
}
and then re-read the input stream and direct the content to two different output streams based on the offset and length data from getElementFragment(). But using VTDNavHuge.getXML().getBytes() or XMLModifierHuge would have been a cleaner solution.
When is XMLModifierHuge coming?
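The workaround described above can be sketched with plain java.io streams, without any VTD-XML classes. This is a minimal sketch, not code from the thread: the class name FragmentSplitter and its methods are hypothetical, and it assumes the offset and length are byte positions taken from the two entries of the array returned by getElementFragment() (the order of those entries is an assumption here).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: splits one pass over a stream into "remainder" and "extracted" parts.
public class FragmentSplitter {

    // Copies exactly `count` bytes from in to out; fails if the stream ends early.
    public static void copy(InputStream in, OutputStream out, long count) throws IOException {
        byte[] buf = new byte[8192];
        while (count > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, count));
            if (n < 0) {
                throw new IOException("Unexpected end of stream");
            }
            out.write(buf, 0, n);
            count -= n;
        }
    }

    // Bytes in [offset, offset + len) go to `extracted`; everything else goes to `remainder`.
    // offset and len are assumed to be byte positions reported by the parser.
    public static void split(InputStream in, long offset, long len,
                             OutputStream remainder, OutputStream extracted) throws IOException {
        copy(in, remainder, offset);   // part before the fragment
        copy(in, extracted, len);      // the fragment itself
        byte[] buf = new byte[8192];   // part after the fragment
        int n;
        while ((n = in.read(buf)) != -1) {
            remainder.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "<msg><data>QUJD</data><id/></msg>".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream rest = new ByteArrayOutputStream();
        ByteArrayOutputStream blob = new ByteArrayOutputStream();
        // The base64 payload "QUJD" starts at byte 11 and is 4 bytes long.
        split(new ByteArrayInputStream(doc), 11, 4, rest, blob);
        System.out.println(new String(rest.toByteArray(), StandardCharsets.US_ASCII));
        System.out.println(new String(blob.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

With real messages the two output streams would be a file and a database BLOB stream rather than in-memory buffers, and the input would be re-opened from the original source after parsing.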
I don’t think you need VTDNavHuge for your use case. Standard VTD-XML supports documents up to 2 GB, so you can use XMLModifier in your application (your document is only 0.5 GB, well within the limit).
Here is a new problem. I get a NullPointerException from com.ximpleware.XMLModifier.output(XMLModifier.java:1708)
when I use a document that is 512 MB or more. Test class provided:
// Use java -Xmx4g -Xms1g MTest
// or you get an OutOfMemoryError.
// Use a computer with >4G RAM for this test.
import java.io.*;
import com.ximpleware.*;
public class MTest {
    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        // The tags were stripped by the blog; "<a>" and "</a>" match the
        // length of 7 printed below and the XPath "/a".
        String startTag = "<a>";
        String endTag = "</a>";
        int tags = startTag.length() + endTag.length();
        System.out.println("Length of tags: " + tags); // 7
        int length = 512*1024*1024 - tags; // this does _not_ work
        // int length = 512*1024*1024 - tags - 1; // this works. Note -1
        sb.append(startTag);
        for (int i = 0; i < length; i++) {
            sb.append("a");
        }
        sb.append(endTag);
        byte[] b = sb.toString().getBytes();
        VTDGen vg = new VTDGen();
        AutoPilot ap = new AutoPilot();
        ap.selectXPath("/a");
        XMLModifier xm = new XMLModifier();
        vg.setDoc(b);
        vg.parse(true);
        VTDNav vn = vg.getNav();
        ap.bind(vn);
        xm.bind(vn);
        while (ap.evalXPath() != -1) {
            long l = vn.getContentFragment();
            xm.insertAfterElement("b");
            xm.remove();
        }
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        xm.output(bout); // Exception here!
        // NullPointerException at
        // com.ximpleware.XMLModifier.output(XMLModifier.java:1708)
        // when document is 512 MB or bigger.
        System.out.println(bout.toString());
        // when document is <512 MB output is b
    }
}
Working on it; will get back to you.
Formatting did not work in the previous post, and the XML tags were removed. This is a complete code listing:
http://www2.freefarm.se/MTest.java
Hi, what you discovered is an undocumented limitation of remove(): you can’t delete a block larger than 512M chars in length. This may get a fix in the next release. Is it a problem for you?
Yes, it is a big problem.
I am now looking at the source code, but I can’t figure out why there is a limit of 512 MB. Do you understand it? Is there a hot-fix while waiting for the next release?
Can you explain why it is a big problem? I may be able to suggest a workaround…
I know why it behaves that way… I may be able to fix it quickly… but this limitation applies to segmented insertion as well…
We are removing large base64 encoded binary data from XML-messages. We do not need to insert large blocks. The removed binary data is saved to a separate database and in the XML file an ID-number is inserted instead as a reference to the binary data in the database. It is a simple task, but it must be able to handle removal of ~800 MB data.
We could handle this in different ways. For example we could use VTD to find the start/end-tag of the binary data in the XML-message, open up two streams and write the start and end part of the XML to one stream and the binary data to the other. But we like the high level interface that XMLModifier gives us. The code will be easier to understand, less error prone etc.
An updated version of XMLModifier that can handle removal of >512 MB would be highly appreciated!!!
Best regards / Mattias
mattias@freefarm.se
It is a relatively simple fix, and I should get back to you on that soon, but it will not go into a release anytime soon; only you will have it… hope that is OK. The key is to chop large (> 512 MB) blocks into multiple blocks no bigger than 512 MB.
Mattias, can you replace the removeContent() method in the XMLModifier source code with the code below? It seems to work on my side with your MTest.java. Let me know whether it works for you.
public void removeContent(int offset, int len) throws ModifyException{
if (offset md.docLen
|| offset + len > md.docOffset + md.docLen){
throw new ModifyException("Invalid offset or length for removeContent");
}
if (deleteHash.isUnique(offset)==false)
throw new ModifyException("There can be only one deletion per offset value");
while(len > (1<<29)-1){
flb.append(((long)((1<<29)-1))<<32 | offset | MASK_DELETE);
fob.append((Object)null);
len -= (1<<29)-1;
offset += (1<<29)-1;
}
flb.append(((long)len)<<32 | offset | MASK_DELETE);
fob.append((Object)null);
}
There is a syntax error in the first if-statement. I think it is because you published the source code on a blog and some characters were automatically removed by the web publishing system. I commented out the whole first if-statement, and then it works perfectly. Thank you!!! My colleagues and I are really impressed!
What is the correct syntax of the first if-statement?
Will this be part of the next or a future VTD-XML release? That would be great for maintenance reasons.
I checked the change into CVS and also tried the code block. It will appear in the next release; one way or the other, it will be maintained.
public void removeContent(int offset, int len) throws ModifyException{
if (offset < md.docLen
|| offset + len > md.docOffset + md.docLen){
throw new ModifyException("Invalid offset or length for removeContent");
}
if (deleteHash.isUnique(offset)==false)
throw new ModifyException("There can be only one deletion per offset value");
while(len > (1<<29)-1){
flb.append(((long)((1<<29)-1))<<32 | offset | MASK_DELETE);
fob.append((Object)null);
len -= (1<<29)-1;
offset += (1<<29)-1;
}
flb.append(((long)len)<<32 | offset | MASK_DELETE);
fob.append((Object)null);
}
Is there an error in the first part of the first if-statement (perhaps due to the web publishing system messing up the code)?
Now it is:
offset [less than] md.docLen
But I think it should be:
offset [greater than] md.docLen
(I use [greater than] instead of > because I don’t think that [less than] sign will show in this blog.)
Just testing if (less than sign) < will show.
OK, it did. So I think this.
if (offset md.docOffset + md.docLen){
throw new ModifyException(“Invalid offset or length for removeContent”);
}
should be:
if (offset > md.docLen || offset + len > md.docOffset + md.docLen){
throw new ModifyException(“Invalid offset or length for removeContent”);
}
Right?
Ok, now in the above post the less than sign md.docLen || offset + len > md.docOffset + md.docLen){
throw new ModifyException(“Invalid offset or length for removeContent”);
}
What is wrong with this comment field? My posts are not showing correctly. But I hope I got my ideas through by now regarding the error in the first if-statement.
Sorry, I think I understand now. There is both a less-than and a greater-than sign in the if-statement. The content in between is interpreted as an HTML tag and removed by the web publishing system. So the first if-statement should not be changed from the original:
if (offset [less than] md.docOffset || len [greater than] md.docLen || offset + len [greater than] md.docOffset + md.docLen)
Do you still have problems or questions?
No problems, no questions.
Now it works perfectly.
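For readers following the thread, the range check the commenters settled on and the chunking loop from the posted fix can be tried in isolation. This is a minimal standalone sketch, not the XMLModifier source: the class DeletionChunker and its methods are hypothetical, the document bounds are passed as plain parameters instead of the md fields, and the chunks are returned as {offset, length} pairs rather than being packed with MASK_DELETE. The sum is also widened to long here, which the posted code does not do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the chunking logic discussed in the thread.
public class DeletionChunker {

    // Largest length a single deletion record carries in the posted fix.
    public static final int MAX_CHUNK = (1 << 29) - 1;

    // Range check with the comparison operators the blog software stripped:
    // offset < docOffset || len > docLen || offset + len > docOffset + docLen
    public static void checkRange(int offset, int len, int docOffset, int docLen) {
        if (offset < docOffset
                || len > docLen
                || (long) offset + len > (long) docOffset + docLen) {
            throw new IllegalArgumentException("Invalid offset or length for removeContent");
        }
    }

    // Splits one deletion into consecutive pieces no longer than MAX_CHUNK.
    public static List<long[]> split(int offset, int len) {
        List<long[]> chunks = new ArrayList<>();
        while (len > MAX_CHUNK) {
            chunks.add(new long[] {offset, MAX_CHUNK});
            len -= MAX_CHUNK;
            offset += MAX_CHUNK;
        }
        chunks.add(new long[] {offset, len}); // final (or only) piece
        return chunks;
    }

    public static void main(String[] args) {
        int docLen = 512 * 1024 * 1024; // the size that failed in MTest
        checkRange(0, docLen, 0, docLen);
        for (long[] c : split(0, docLen)) {
            System.out.println("offset=" + c[0] + " len=" + c[1]);
        }
        // prints:
        // offset=0 len=536870911
        // offset=536870911 len=1
    }
}
```

A 512 MB deletion is exactly one byte over the per-record limit, which matches the observation in MTest that subtracting 1 from the length made the test pass.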