
How to remove comment nodes from an XML document?

In this post I am going to show you how to remove comments from an XML document efficiently by combining VTD-XML's XMLModifier with XPath. The input XML document looks like this:

<?xml version="1.0"?><!-- some other code here -->
<clients>
<!-- some other code here -->

<function>
</function>

<function>
</function>

<function>
<name>data_values</name>
<variables>
<variable><!-- some other code here -->
<name>temp</name>
<!-- some other code here --> <type>double</type>
</variable>
</variables><!-- some other code here -->
<block><!-- some other code here -->
<opster>temp = 1</opster>
</block>
</function>
</clients>

The code that performs the task is listed below. The key is the XPath expression “//comment()”, which selects all the comment nodes in the document. After binding the VTDNav object to the XMLModifier object, you simply call the remove() method for each match. It removes not only the content of the comment but also the surrounding delimiters (i.e., <!-- and -->).


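import com.ximpleware.*;
import java.io.*;

public class RemoveNodesDemo {

    public static void main(String[] args) throws VTDException, IOException {
        VTDGen vg = new VTDGen();
        // parse the input file (namespace-unaware); stop if parsing fails
        if (!vg.parseFile("d:\\xml\\input2.xml", false))
            return;
        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        // bind the navigator to an XMLModifier, which records the edits
        XMLModifier xm = new XMLModifier(vn);
        // select every comment node in the document
        ap.selectXPath("//comment()");

        // for each matched comment, queue removal of the whole <!-- ... -->
        while (ap.evalXPath() != -1) {
            xm.remove();
        }
        // write the modified document to a new file
        xm.output("d:\\xml\\output2.xml");
    }
}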

The output XML is

<clients>


<function>
</function>

<function>
</function>

<function>
<name>data_values</name>
<variables>
<variable>
<name>temp</name>
<type>double</type>
</variable>
</variables>
<block>
<opster>temp = 1</opster>
</block>
</function>
</clients>
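If you would rather not go through files (in a unit test, for example), the same steps can be applied to an XML string held in memory. The sketch below assumes VTD-XML (package com.ximpleware) is on the classpath; the class and method names (InMemoryCommentRemover, removeComments) are hypothetical. It swaps parseFile() for VTDGen.setDoc() plus parse(), and the file-name overload of XMLModifier.output() for the OutputStream overload.

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XMLModifier;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: strips every comment from an in-memory XML string
// using the same //comment() + XMLModifier.remove() approach as above.
class InMemoryCommentRemover {

    static String removeComments(String xml) throws Exception {
        VTDGen vg = new VTDGen();
        // parse from a byte array instead of a file; false = namespace-unaware
        vg.setDoc(xml.getBytes(StandardCharsets.UTF_8));
        vg.parse(false);

        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        XMLModifier xm = new XMLModifier(vn);
        ap.selectXPath("//comment()");

        // queue a removal for every comment node the XPath matches
        while (ap.evalXPath() != -1) {
            xm.remove();
        }

        // write the modified document to memory instead of a file
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        xm.output(out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(removeComments("<a><!-- note --><b>text</b></a>"));
    }
}
```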

You might ask: what if I want to remove an attribute node, a text node, a CDATA node, an element node, or a processing instruction node?

XMLModifier’s remove() method has the following effect on each node type:

  • On a non-CDATA text node, it removes the text.
  • On a CDATA text node, it removes the text content and the surrounding delimiters (<![CDATA[ and ]]>).
  • On an element node, it removes the entire element fragment.
  • On an attribute node, it removes the attribute’s name-value pair in its entirety.
  • On a processing instruction node, it removes both the content and the surrounding delimiters (<? and ?>).

So, to remove all processing instruction nodes, just replace the XPath expression above with “//processing-instruction()”.
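Because remove() adapts to whatever node the cursor sits on, one helper can cover element, attribute and processing instruction nodes alike; only the XPath expression changes. The sketch below is hypothetical (XPathNodeRemover and removeMatching are illustrative names, and the sample document is made up), assumes VTD-XML is on the classpath, and reparses the input on every call so each removal stays independent.

```java
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;
import com.ximpleware.XMLModifier;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: removes every node selected by an XPath expression.
class XPathNodeRemover {

    // Assumption: the matches do not overlap (e.g. an element and one of its
    // own attributes); keep such cases in separate passes.
    static String removeMatching(String xml, String xpath) throws Exception {
        VTDGen vg = new VTDGen();
        vg.setDoc(xml.getBytes(StandardCharsets.UTF_8));
        vg.parse(false); // namespace-unaware parse

        VTDNav vn = vg.getNav();
        AutoPilot ap = new AutoPilot(vn);
        XMLModifier xm = new XMLModifier(vn);
        ap.selectXPath(xpath);

        // remove() inspects the node under the cursor and removes it accordingly
        while (ap.evalXPath() != -1) {
            xm.remove();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        xm.output(out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\"?>"
                + "<clients><?render fast?>"
                + "<function id=\"f1\"></function>"
                + "<function id=\"f2\"><name>data_values</name></function>"
                + "</clients>";
        // element nodes: drop every <function> that has no child elements
        System.out.println(removeMatching(doc, "//function[not(*)]"));
        // attribute nodes: drop every id="..." pair
        System.out.println(removeMatching(doc, "//@id"));
        // processing instruction nodes: drop <?render fast?>
        System.out.println(removeMatching(doc, "//processing-instruction()"));
    }
}
```

One design caveat: as far as I know, XMLModifier does not accept overlapping edits, so an expression whose matches nest (say, an element together with one of its own attributes) is best split into separate passes, as in main() above.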
