Using VTD-XML to Replace Element Names

This code example shows how to rename the elements of an XML document using XPath and XMLModifier in VTD-XML. The key is to evaluate an XPath expression with AutoPilot and call XMLModifier’s updateElementName() at each node the cursor lands on.

The Java version is below:

/*
 * Rename every element in the document to "lalalala".
 */
import com.ximpleware.*;

public class ChangeElementName {

    public static void main(String[] args) throws Exception {
        String xml = "<aaaa> <bbbbb> <ccccc> </ccccc> <ccccc/> <ccccc></ccccc> </bbbbb> </aaaa>";
        VTDGen vg = new VTDGen();
        vg.setDoc(xml.getBytes());
        vg.parse(false); // false: parse without namespace awareness
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        ap.selectXPath("//*"); // match every element
        XMLModifier xm = new XMLModifier(vn);
        // evalXPath() moves the cursor to the next match and returns -1 when none are left
        while (ap.evalXPath() != -1) {
            xm.updateElementName("lalalala"); // rename the element at the cursor
        }
        xm.output("lala.xml"); // write the modified document to a file
    }
}

The C# version is here:

using System;
using System.IO;
using System.Text; // needed for Encoding
using com.ximpleware;

namespace FragmentTest
{
    public class FragmentTest
    {
        public static void Main(String[] args)
        {
            String xml = "<aaaa> <bbbbb> <ccccc> </ccccc> <ccccc/> <ccccc></ccccc> </bbbbb> </aaaa>";
            Encoding eg = Encoding.GetEncoding("utf-8");
            VTDGen vg = new VTDGen();
            vg.setDoc(eg.GetBytes(xml));
            vg.parse(false); // false: parse without namespace awareness
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            ap.selectXPath("//*"); // match every element
            XMLModifier xm = new XMLModifier(vn);
            // evalXPath() moves the cursor to the next match and returns -1 when none are left
            while (ap.evalXPath() != -1)
            {
                xm.updateElementName("lalalala"); // rename the element at the cursor
            }
            xm.output("lala.xml"); // write the modified document to a file
        }
    }
}

29 comments so far

  1. Mattias on

    Is there a way to get XMLModifier to work with VTDNavHuge?

    • jimmyzhang on

      Not yet. Do you have to use VTDNavHuge? How big is your document?

  2. Mattias on

    We strip binary data from XML messages, send the remaining XML on for further processing, and store the extracted binary data as a BLOB in a database. Our XML documents are about 0.5 GB, but we run many parallel threads, so RAM is a concern. Using VTDGenHuge.MEM_MAPPED solves the memory problem, but then we can’t modify the document, since VTDNavHuge.getXML().getBytes() is always null and XMLModifierHuge does not exist yet.

    The workaround we are planning is to collect all the offsets and lengths with
    while(ap.evalXPath()!=-1)
        long[] l = vn.getElementFragment();
    and then reread the input stream and direct the content to two different output streams based on the offset and length data from getElementFragment. But using VTDNavHuge.getXML().getBytes() or XMLModifierHuge would have been a cleaner solution.

    When is XMLModifierHuge coming?
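
The two-stream workaround described in this comment can be sketched with plain java.io, without VTD-XML. This is only an illustration under stated assumptions: the names FragmentSplitter, split and copy are hypothetical, the offset and length are taken to be the byte offset and byte length of a single element fragment (in the real case they would come from getElementFragment), and only one fragment is handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of VTD-XML.
class FragmentSplitter {

    /** Copies exactly n bytes from in to out; fails if the stream ends early. */
    static void copy(InputStream in, OutputStream out, long n) throws IOException {
        byte[] buf = new byte[8192];
        while (n > 0) {
            int read = in.read(buf, 0, (int) Math.min(buf.length, n));
            if (read < 0) {
                throw new IOException("unexpected end of stream");
            }
            out.write(buf, 0, read);
            n -= read;
        }
    }

    /**
     * Sends the bytes in [offset, offset + length) to 'fragment'
     * and every other byte of the input to 'rest'.
     */
    static void split(InputStream in, long offset, long length,
                      OutputStream rest, OutputStream fragment) throws IOException {
        copy(in, rest, offset);      // everything before the fragment
        copy(in, fragment, length);  // the fragment itself
        byte[] buf = new byte[8192]; // everything after the fragment
        int read;
        while ((read = in.read(buf)) >= 0) {
            rest.write(buf, 0, read);
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = "<msg><id>7</id><bin>QUJD</bin></msg>";
        // For this ASCII sample, character positions equal byte positions.
        int offset = xml.indexOf("<bin>");
        int length = xml.indexOf("</bin>") + "</bin>".length() - offset;

        ByteArrayOutputStream rest = new ByteArrayOutputStream();
        ByteArrayOutputStream fragment = new ByteArrayOutputStream();
        split(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
              offset, length, rest, fragment);

        System.out.println(rest.toString("UTF-8"));
        System.out.println(fragment.toString("UTF-8"));
    }
}
```

For several fragments the same idea would be repeated in ascending offset order, alternating between the two output streams.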

    • jimmyzhang on

      I don’t think you need to use VTDNavHuge for your use case. Standard VTD-XML supports documents up to 2 GB, and you can easily use XMLModifier for your app (your document is only 0.5 GB, well within the limit).

  3. Mattias on

    Here is a new problem. I get a NullPointerException from com.ximpleware.XMLModifier.output(XMLModifier.java:1708)
    when I use a document that is 512 MB or more. Test class provided:

    // Use java -Xmx4g -Xms1g MTest
    // or you get an OutOfMemoryError.
    // Use a computer with >4G RAM for this test.

    import java.io.*;
    import com.ximpleware.*;

    public class MTest {
        public static void main(String[] args) throws Exception {
            StringBuilder sb = new StringBuilder();
            String startTag = "<a>";
            String endTag = "</a>";
            int tags = startTag.length() + endTag.length();
            System.out.println("Length of tags: " + tags); // 7
            int length = 512*1024*1024 - tags; // this does _not_ work
            // int length = 512*1024*1024 - tags - 1; // this works. Note -1
            sb.append(startTag);
            for (int i = 0; i < length; i++) {
                sb.append("a");
            }
            sb.append(endTag);
            byte[] b = sb.toString().getBytes();
            VTDGen vg = new VTDGen();
            AutoPilot ap = new AutoPilot();
            ap.selectXPath("/a");
            XMLModifier xm = new XMLModifier();
            vg.setDoc(b);
            vg.parse(true);
            VTDNav vn = vg.getNav();
            ap.bind(vn);
            xm.bind(vn);
            while (ap.evalXPath() != -1) {
                long l = vn.getContentFragment();
                xm.insertAfterElement("b");
                xm.remove();
            }
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            xm.output(bout); // Exception here!
            // NullPointerException at
            // com.ximpleware.XMLModifier.output(XMLModifier.java:1708)
            // when document is 512 MB or bigger.

            System.out.println(bout.toString());
            // when document is <512 MB output is b
        }
    }

    • jimmyzhang on

      working on it, will get back

  4. Mattias on

    Formatting did not work in the previous post, and the XML tags were removed. This is a complete code listing:

    http://www2.freefarm.se/MTest.java

    • jimmyzhang on

      Hi, what you discovered is an undocumented limitation of remove: you can’t delete a block larger than 512M chars in length. This may get a fix in the next release. Is it a problem for you?

  5. Mattias on

    Yes, it is a big problem.

    I am now looking at the source code, but I can’t figure out why there is a limit of 512 MB. Do you understand it? Is there a hot-fix while waiting for the next release?

    • jimmyzhang on

      Can you explain why it is a big problem? I may be able to suggest a workaround…

  6. jimmyzhang on

    I know why it behaves that way… I may be able to fix it quickly… but this limitation applies to segmented insertion as well…

  7. Mattias on

    We are removing large base64 encoded binary data from XML-messages. We do not need to insert large blocks. The removed binary data is saved to a separate database and in the XML file an ID-number is inserted instead as a reference to the binary data in the database. It is a simple task, but it must be able to handle removal of ~800 MB data.

    We could handle this in different ways. For example we could use VTD to find the start/end-tag of the binary data in the XML-message, open up two streams and write the start and end part of the XML to one stream and the binary data to the other. But we like the high level interface that XMLModifier gives us. The code will be easier to understand, less error prone etc.

    An updated version of XMLModifier that can handle removal of >512 MB would be highly appreciated!!!

    Best regards / Mattias
    mattias@freefarm.se

    • jimmyzhang on

      It is a relatively simple fix, and I should get back to you on that soon, but it will not go into a release anytime soon; only you will have it. Hope that is OK. The key is to chop blocks larger than 512 MB into multiple blocks no bigger than 512 MB.
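
The chopping idea in this reply can be illustrated on its own, outside XMLModifier. The sketch below is not the library's code: the class DeletionChunker and its chunk method are hypothetical names, and the per-record limit of (1<<29)-1 bytes is an assumption taken from the constant that appears in the author's patch in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of VTD-XML.
class DeletionChunker {

    // Assumed maximum length of a single deletion record.
    static final int MAX_CHUNK = (1 << 29) - 1;

    /**
     * Splits the range [offset, offset + len) into consecutive
     * {offset, length} pairs, none longer than MAX_CHUNK.
     */
    static List<int[]> chunk(int offset, int len) {
        List<int[]> chunks = new ArrayList<>();
        while (len > MAX_CHUNK) {
            chunks.add(new int[] {offset, MAX_CHUNK});
            offset += MAX_CHUNK;
            len -= MAX_CHUNK;
        }
        chunks.add(new int[] {offset, len}); // the remainder, at most MAX_CHUNK
        return chunks;
    }

    public static void main(String[] args) {
        // A block of exactly 512 MB, one byte over the assumed limit.
        for (int[] c : chunk(0, 512 * 1024 * 1024)) {
            System.out.println("offset=" + c[0] + " length=" + c[1]);
        }
    }
}
```

A block of exactly 512 MB is one byte over this limit, so it comes out as two records; that matches the observation in the test class that subtracting one more byte makes the failure go away.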

    • jimmyzhang on

      Mattias, can you replace the removeContent() method in the XMLModifier source code with the code below? It seems to work on my side with your MTest.java. Let me know if it works for you or not.
      public void removeContent(int offset, int len) throws ModifyException{

          if (offset md.docLen
                  || offset + len > md.docOffset + md.docLen){
              throw new ModifyException("Invalid offset or length for removeContent");
          }
          if (deleteHash.isUnique(offset)==false)
              throw new ModifyException("There can be only one deletion per offset value");
          while(len > (1<<29)-1){
              flb.append(((long)((1<<29)-1))<<32 | offset | MASK_DELETE);
              fob.append((Object)null);
              len -= (1<<29)-1;
              offset += (1<<29)-1;
          }
          flb.append(((long)len)<<32 | offset | MASK_DELETE);
          fob.append((Object)null);
      }

  8. Mattias on

    There is a syntax error in the first if-statement. I think it is because you published the source code on a blog and some characters were automatically removed by the web publishing system. But I commented out the whole first if-statement, and then it works perfectly. Thank you!!! My colleagues and I are really impressed!

    What is the correct syntax of the first if-statement?

    Will this be part of the next or a future VTD-XML release? That would be great for maintenance reasons.

    • jimmyzhang on

      I checked the change into CVS and also tried the code block. It will appear in the next release; one way or the other, it will be maintained.

      public void removeContent(int offset, int len) throws ModifyException{

          if (offset < md.docLen
                  || offset + len > md.docOffset + md.docLen){
              throw new ModifyException("Invalid offset or length for removeContent");
          }
          if (deleteHash.isUnique(offset)==false)
              throw new ModifyException("There can be only one deletion per offset value");
          while(len > (1<<29)-1){
              flb.append(((long)((1<<29)-1))<<32 | offset | MASK_DELETE);
              fob.append((Object)null);
              len -= (1<<29)-1;
              offset += (1<<29)-1;
          }
          flb.append(((long)len)<<32 | offset | MASK_DELETE);
          fob.append((Object)null);
      }

  9. Mattias on

    Is there an error in the first part of the first if-statement (perhaps due to the web publishing system messing up the code)?
    Now it is:
    offset [less than] md.docLen
    But I think it should be:
    offset [greater than] md.docLen

    (I use [greater than] instead of > because I don’t think that [less than] sign will show in this blog.)

  10. Mattias on

    Just testing if (less than sign) < will show.

  11. Mattias on

    OK, it did. So I think this.
    if (offset md.docOffset + md.docLen){
    throw new ModifyException(“Invalid offset or length for removeContent”);
    }

    should be:
    if (offset > md.docLen || offset + len > md.docOffset + md.docLen){
    throw new ModifyException(“Invalid offset or length for removeContent”);
    }

    Right?

  12. Mattias on

    Ok, now in the above post the less than sign md.docLen || offset + len > md.docOffset + md.docLen){
    throw new ModifyException(“Invalid offset or length for removeContent”);
    }

  13. Mattias on

    What is wrong with this comment field? My posts are not showing correctly. But I hope I got my ideas through by now regarding the error in the first if-statement.

  14. Mattias on

    Sorry, I think I understand now. There is both a less-than and a greater-than sign in the if-statement. The content in between is interpreted as an HTML tag and removed by the web publishing system. So the first if-statement should not be changed from the original:

    if (offset [less than] md.docOffset || len [greater than] md.docLen || offset + len [greater than] md.docOffset + md.docLen)
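
Written with real comparison operators, the condition spelled out in this comment can be tried in isolation. The sketch below is a standalone illustration rather than the XMLModifier source: the class RemoveBounds and the method isValid are hypothetical, docOffset and docLen stand in for md.docOffset and md.docLen, and the sums are widened to long so they cannot overflow, which is a change from the int arithmetic in the posted code.

```java
// Hypothetical illustration, not part of VTD-XML.
class RemoveBounds {

    /**
     * Returns true when [offset, offset + len) lies inside the document
     * region that starts at docOffset and spans docLen bytes.
     */
    static boolean isValid(int offset, int len, int docOffset, int docLen) {
        long end = (long) offset + len;          // widened to avoid int overflow
        long docEnd = (long) docOffset + docLen;
        boolean invalid = offset < docOffset
                || len > docLen
                || end > docEnd;
        return !invalid;
    }

    public static void main(String[] args) {
        System.out.println(isValid(0, 10, 0, 10)); // whole document
        System.out.println(isValid(5, 10, 0, 10)); // runs past the end
    }
}
```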

    • jimmyzhang on

      Do you still have problems/questions?

  15. Mattias on

    No problems, no questions.
    Now it works perfectly.

  16. Narenbala on

    Hi,

    I need to transform an XML document which might be more than 2 GB. Is there a solution for this in VTD-XML? Currently I use V2.11, and there doesn’t seem to be support for VTDNavHuge in XMLModifier. Let me know if there is an alternate solution.

    • jimmyzhang on

      Hi, extended VTD-XML doesn’t support XMLModifier yet. The reason is that it is somewhat rare for someone to work on documents larger than the 1.4 GB that VTD-XML supports, but we can probably come up with something similar for the upcoming release, I guess.

      • Narenbala on

        Thanks for the reply. My case is that I need to transform all the element and attribute names in an XML file to either lower or upper case so that I can run the XSD against the file and validate it. Is there any alternate solution in VTD-XML for the above scenario? (Note: the XML files will be more than 2 GB.)

  17. Narenbala on

    Thanks for the reply. My case is that I need to transform all the element and attribute names in an XML file to either lower or upper case so that I can run the XSD against the file and validate it. Is there any alternate solution in VTD-XML for the above scenario?

    • jimmyzhang on

      Are those individual XML files bigger than 2 GB? How urgently do you need something to work?
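
For the case-normalization scenario in this exchange, a streaming rewrite is one possible direction. The sketch below does not use VTD-XML at all; it uses the JDK's StAX API (javax.xml.stream) as a swapped-in technique, because a streaming reader never holds the whole document in memory. The class and method names are hypothetical, and the sketch assumes a document without namespaces, comments, processing instructions or a DTD, all of which it would drop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration using StAX, not VTD-XML.
class LowerCaseNames {

    /** Copies the document from in to out, lower-casing element and attribute names. */
    static void lowerCaseNames(InputStream in, OutputStream out) throws Exception {
        XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(in);
        XMLStreamWriter w = XMLOutputFactory.newFactory().createXMLStreamWriter(out, "UTF-8");
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    w.writeStartElement(r.getLocalName().toLowerCase(Locale.ROOT));
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        w.writeAttribute(
                                r.getAttributeLocalName(i).toLowerCase(Locale.ROOT),
                                r.getAttributeValue(i));
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    w.writeEndElement();
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                case XMLStreamConstants.CDATA:
                    w.writeCharacters(r.getText()); // text is copied, escaped as needed
                    break;
                default:
                    break; // comments, PIs and DTDs are dropped in this sketch
            }
        }
        w.flush();
        w.close();
        r.close();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Root Attr=\"1\"><Child>x</Child></Root>";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        lowerCaseNames(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), out);
        System.out.println(out.toString("UTF-8"));
    }
}
```

With file streams in place of the byte arrays, memory use stays roughly constant regardless of document size, at the cost of the random access that VTD-XML provides.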

