Whitespace Trimming in 2.12

This post shows how to use the latest methods in vtd-xml 2.12 to remove the leading and trailing white space of the text nodes in an XML document.

 

This is the input document. As you can see, the ID text nodes have long runs of trailing white space, and the Name text node has both leading and trailing white space.

<?xml version="1.0"?>
<ns:myOrder xmlns:ns="http://w3schools.com/BusinessDocument" xmlns:ct="http://something.com/CommonTypes">
    <MessageHeader>
        <ct:ID>i7         </ct:ID>
        <ct:ID>i7         </ct:ID>
        <ct:ID>i7         </ct:ID>
        <ct:ID>i7         </ct:ID>
        <ct:Name> Company Name    </ct:Name>
    </MessageHeader>
</ns:myOrder>

This is the output XML file, with all leading and trailing white space removed from the text nodes.

<?xml version="1.0"?>
<ns:myOrder xmlns:ns="http://w3schools.com/BusinessDocument" xmlns:ct="http://something.com/CommonTypes">
    <MessageHeader>
        <ct:ID>i7</ct:ID>
        <ct:ID>i7</ct:ID>
        <ct:ID>i7</ct:ID>
        <ct:ID>i7</ct:ID>
        <ct:Name>Company Name</ct:Name>
    </MessageHeader>
</ns:myOrder>

Below is the Java code that removes the white space.


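import com.ximpleware.*;

public class removeWS {

    public static void main(String[] s) throws Exception {
        VTDGen vg = new VTDGen();
        AutoPilot ap = new AutoPilot();
        XMLModifier xm = new XMLModifier();
        // Parse the input file with namespace awareness turned on
        if (vg.parseFile("d:\\xml2\\ws.xml", true)) {
            VTDNav vn = vg.getNav();
            ap.bind(vn);
            xm.bind(vn);
            // Select every text node in the document
            ap.selectXPath("//text()");
            int i = -1;
            while ((i = ap.evalXPath()) != -1) {
                int offset = vn.getTokenOffset(i);
                int len = vn.getTokenLength(i);
                // Pack the length (upper 32 bits) and the offset (lower 32 bits) into one long
                long l = vn.trimWhiteSpaces((((long) len) << 32) | offset);
                int nlen = (int) (l >> 32); // length after trimming
                int nos = (int) l;          // offset after trimming
                System.out.println(" ===> " + vn.toString(i));
                System.out.println("len ==>" + len + " new len==>" + nlen);
                // Replace the token with the trimmed range of its own content
                xm.updateToken(i, vn, nos, nlen);
            }
            // Write the modified document once all text nodes have been processed
            xm.output("d:\\xml2\\new.xml");
        }
    }
}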
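
The code above passes a single long to trimWhiteSpaces, with the length in the upper 32 bits and the offset in the lower 32 bits, and unpacks the result the same way. To make that convention concrete, here is a minimal sketch in plain Java that does not use vtd-xml at all. The class PackedRangeDemo and its helpers (pack, offset, length, trim) are hypothetical names made up for this illustration, and trim only mimics the idea of shrinking a byte range from both ends; it is not the library's implementation and it assumes an ASCII-compatible encoding such as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: mimics the (length << 32) | offset convention used in the
// post's code. All names here are hypothetical; this is not vtd-xml's code.
class PackedRangeDemo {

    // Pack a length (upper 32 bits) and an offset (lower 32 bits) into one long.
    // The mask keeps the offset from sign-extending into the upper half.
    static long pack(int offset, int length) {
        return (((long) length) << 32) | (offset & 0xFFFFFFFFL);
    }

    static int offset(long packed) {
        return (int) packed;
    }

    static int length(long packed) {
        return (int) (packed >> 32);
    }

    // The four characters XML treats as white space.
    static boolean isXmlWhiteSpace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // Shrink the range from both ends until it starts and ends on a
    // non-white-space byte (assumes an ASCII-compatible encoding).
    static long trim(byte[] doc, long packed) {
        int start = offset(packed);
        int end = start + length(packed); // exclusive
        while (start < end && isXmlWhiteSpace(doc[start])) start++;
        while (end > start && isXmlWhiteSpace(doc[end - 1])) end--;
        return pack(start, end - start);
    }

    public static void main(String[] args) {
        byte[] doc = "<ct:Name> Company Name    </ct:Name>"
                .getBytes(StandardCharsets.UTF_8);
        // The text node starts at byte 9 and is 17 bytes long.
        long trimmed = trim(doc, pack(9, 17));
        System.out.println(offset(trimmed) + " " + length(trimmed)); // prints "10 12"
        System.out.println(new String(doc, offset(trimmed), length(trimmed),
                StandardCharsets.UTF_8)); // prints "Company Name"
    }
}
```

Returning both numbers in one long avoids allocating a small result object for every text node, which is presumably why the post's code follows the same pattern.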


2 comments so far

  1. Saurabh on

    I am using XMLModifier's remove to delete certain content from my XML files.
    The remove works fine, but some white space is left behind at the place where the content was removed.
    Can anyone suggest a way for VTD to remove content from a file without leaving that white space?

    • jimmyzhang on

      Suresh, can you start a question thread on stackoverflow and tag it with vtd-xml.. I will respond there. This will also benefit others having similar questions.. what do you think?

