White Space Handling in 2.13

In this post I will highlight VTD-XML v2.13’s whitespace handling capability along with some examples in Java.

Quick Review

Native to non-extractive parsing, VTD-XML’s handling of XML tokens and elements revolves around the concept of byte segments. Once an XML document is parsed into VTD tokens, the byte segment enveloping the entire content of any token or element can be thought of as a pair of descriptors (i.e. offset and length) pointing into the original document.
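
To make the descriptor idea concrete, here is a minimal sketch of packing an offset and a length into a single 64-bit integer. The class and method names are hypothetical and not part of VTD-XML, and the bit layout (length in the upper 32 bits, offset in the lower 32 bits) is an assumption based on the common idiom of reading a fragment as (int) l and (int) (l >> 32).

```java
// Hypothetical helper, not part of VTD-XML. The bit layout is an assumption:
// lower 32 bits hold the byte offset, upper 32 bits hold the byte length.
final class SegmentDescriptor {
    private SegmentDescriptor() {}

    /** Packs a byte offset and a byte length into one 64-bit descriptor. */
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    /** Reads the byte offset back from the lower 32 bits. */
    static int offset(long descriptor) {
        return (int) descriptor;
    }

    /** Reads the byte length back from the upper 32 bits. */
    static int length(long descriptor) {
        return (int) (descriptor >>> 32);
    }

    public static void main(String[] args) {
        byte[] doc = "<root><name>suresh</name></root>"
                .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        // The <name> element starts at byte 6 and spans 19 bytes.
        long fragment = pack(6, 19);
        // The descriptor points into the original bytes; nothing is copied
        // until the text is actually needed, as in the print below.
        System.out.println(new String(doc, offset(fragment), length(fragment),
                java.nio.charset.StandardCharsets.US_ASCII));
    }
}
```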

For a large class of XML content extraction and modification operations, non-extractive parsing allows applications to circumvent the tedious, cycle-wasting tasks of de-serializing and re-serializing the byte content of elements, and thereby helps achieve the maximum possible performance.

Whitespace Handling in 2.12

Version 2.12 of VTD-XML introduced two methods that either trim or expand the white space surrounding byte segments denoted by 64-bit integers.

  • trimWhiteSpaces(long l), of VTDNav class, accepts a byte segment descriptor, removes both the leading and trailing white spaces, and returns a new descriptor.
  • expandWhiteSpaces(long l), of the same VTDNav class, takes a segment descriptor and returns a new descriptor that includes all the leading and trailing white spaces around the input segment.

It is worth noting that both methods are greedy: they remove or include as many white spaces as they possibly can. Furthermore, the effect of one call often negates the other.
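
The greedy behavior and the way the two calls cancel out can be illustrated with a small standalone sketch. This is not the library's implementation: the class is hypothetical, it works on a raw byte array instead of a VTDNav, it treats only space, tab, CR and LF as white space, and it assumes the same offset/length packing described in the lead-in comment.

```java
// Standalone illustration only; VTD-XML's own methods live on VTDNav.
final class GreedyWhitespace {
    private GreedyWhitespace() {}

    static boolean isWs(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // Assumed layout: lower 32 bits = offset, upper 32 bits = length.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    /** Shrinks the segment until neither end is white space. */
    static long trim(byte[] doc, long d) {
        int start = (int) d;
        int end = start + (int) (d >>> 32); // exclusive
        while (start < end && isWs(doc[start])) start++;
        while (end > start && isWs(doc[end - 1])) end--;
        return pack(start, end - start);
    }

    /** Grows the segment over every adjacent white space byte. */
    static long expand(byte[] doc, long d) {
        int start = (int) d;
        int end = start + (int) (d >>> 32); // exclusive
        while (start > 0 && isWs(doc[start - 1])) start--;
        while (end < doc.length && isWs(doc[end])) end++;
        return pack(start, end - start);
    }

    public static void main(String[] args) {
        byte[] doc = "<a>  \n<b/>\t </a>"
                .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        long b = pack(6, 4);          // the <b/> element
        long wide = expand(doc, b);   // swallows all surrounding white space
        long back = trim(doc, wide);  // undoes the expansion
        System.out.println("expanded: offset=" + (int) wide
                + " length=" + (int) (wide >>> 32));
        System.out.println("trim(expand(b)) == b: " + (back == b));
    }
}
```

Trimming an expanded segment returns the original descriptor here because the original segment had no white space at either end.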

Additions in 2.13

Three static constants and three more methods are added to VTDNav class in 2.13.

Those constants are:

  1. VTDNav.WS_LEADING
  2. VTDNav.WS_TRAILING
  3. VTDNav.WS_BOTH

The three new methods are:

  • trimWhiteSpaces(long l, short actionType) still trims the white spaces around the segment and returns a new segment descriptor. But the trimming operation can now be applied to the leading white spaces, the trailing ones, or both, depending on the value of actionType.
  • trimWhiteSpaces(int index, short actionType) brings you the power and convenience of trimming a VTD record without the hassle of manually composing a 64-bit segment descriptor.
  • expandWhiteSpaces(long l, short actionType) still expands the white spaces. But the expansion can now be restricted to the leading white spaces, the trailing ones, or both.
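
The effect of the action type can be sketched without the library as follows. The Side enum is a hypothetical stand-in for the VTDNav.WS_LEADING, VTDNav.WS_TRAILING and VTDNav.WS_BOTH constants (I make no assumption about their numeric values), and the trimming logic is my own illustration on a raw byte array, assuming the offset sits in the lower 32 bits of the descriptor and the length in the upper 32 bits.

```java
// Standalone illustration only; not VTD-XML's implementation.
final class DirectionalTrim {
    private DirectionalTrim() {}

    /** Hypothetical stand-in for the VTDNav.WS_* constants. */
    enum Side { LEADING, TRAILING, BOTH }

    static boolean isWs(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    // Assumed layout: lower 32 bits = offset, upper 32 bits = length.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    /** Trims white space only on the side(s) selected by the caller. */
    static long trim(byte[] doc, long d, Side side) {
        int start = (int) d;
        int end = start + (int) (d >>> 32); // exclusive
        if (side != Side.TRAILING) {
            while (start < end && isWs(doc[start])) start++;
        }
        if (side != Side.LEADING) {
            while (end > start && isWs(doc[end - 1])) end--;
        }
        return pack(start, end - start);
    }

    public static void main(String[] args) {
        byte[] doc = "<a> \n text \t</a>"
                .getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        long content = pack(3, 9); // everything between <a> and </a>
        for (Side side : Side.values()) {
            long t = trim(doc, content, side);
            System.out.println(side + ": offset=" + (int) t
                    + " length=" + (int) (t >>> 32));
        }
    }
}
```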

 

Common Use Cases

Suppose you want to remove some element fragments from the master XML document, but you want the remaining XML text to retain its original formatting, or you want to make slight, fine-grained changes to it (e.g. paragraph separation, indentation). You can also extract a segment of XML bytes without losing its surrounding line breaks or tabs.

An Example


Consider the following example, taken from a Q/A thread on the StackOverflow web site
(http://stackoverflow.com/questions/36972163/vtd-xmlremoving-the-spaces-after-removing-the-element):

<root>
<name>suresh</name>
<address>Address</address>
</root>

With 2.12, if you remove the “<name>suresh</name>” fragment using the following code, you end up with the XML document shown right after the listing, in which the line break that followed <root> is lost.


import com.ximpleware.*;
import java.io.*;

public class testExpandSpace {
    public static void main(String[] args) throws VTDException, IOException {
        VTDGen vg = new VTDGen();
        AutoPilot ap = new AutoPilot();
        XMLModifier xm = new XMLModifier();
        // Parse without namespace awareness; give up if parsing fails.
        if (!vg.parseFile("d://xml//testSuresh.xml", false))
            return;
        VTDNav vn = vg.getNav();
        ap.bind(vn);
        xm.bind(vn);
        ap.selectXPath("//name");
        int index = -1;
        while ((index = ap.evalXPath()) != -1) {
            System.out.println(" ===> " + vn.toString(index) + "===>");
            long elementFragment = vn.getElementFragment();
            // Greedily widen the fragment over both leading and trailing white spaces, then remove it.
            xm.remove(vn.expandWhiteSpaces(elementFragment));
        }
        xm.output("d://xml//test1111.xml");
    }
}

<root><address>Address</address>
</root>

With 2.13, you can expand the “name” fragment to include only its trailing white space before removing it, thereby preserving the desired line breaks and indentation in the output.

import com.ximpleware.*;
import java.io.*;

public class testExpandSpace {
    public static void main(String[] args) throws VTDException, IOException {
        VTDGen vg = new VTDGen();
        AutoPilot ap = new AutoPilot();
        XMLModifier xm = new XMLModifier();
        // Parse without namespace awareness; give up if parsing fails.
        if (!vg.parseFile("d://xml//testSuresh.xml", false))
            return;
        VTDNav vn = vg.getNav();
        ap.bind(vn);
        xm.bind(vn);
        ap.selectXPath("//name");
        int index = -1;
        while ((index = ap.evalXPath()) != -1) {
            System.out.println(" ===> " + vn.toString(index) + "===>");
            long elementFragment = vn.getElementFragment();
            // Widen the fragment over its trailing white spaces only, leaving the line break before it intact.
            xm.remove(vn.expandWhiteSpaces(elementFragment, VTDNav.WS_TRAILING));
        }
        xm.output("d://xml//test1111.xml");
    }
}

Below is the output, with the desired formatting.


<root>
<address>Address</address>
</root>
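
The difference between the two results can be reproduced without the library. The sketch below is a plain byte-array simulation, not VTD-XML code: the class and its remove method are hypothetical, the sample is assumed to use bare line feeds and no indentation, and the fragment offset is found with indexOf, which matches the byte offset only because the sample is pure ASCII.

```java
// Standalone simulation of "expand, then remove"; not VTD-XML code.
final class RemoveWithWhitespace {
    private RemoveWithWhitespace() {}

    static boolean isWs(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    /**
     * Removes the bytes [offset, offset + length) from the document after
     * optionally widening the range over leading and/or trailing white space.
     */
    static String remove(String xml, int offset, int length,
                         boolean leading, boolean trailing) {
        byte[] doc = xml.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        int start = offset;
        int end = offset + length; // exclusive
        if (leading) {
            while (start > 0 && isWs(doc[start - 1])) start--;
        }
        if (trailing) {
            while (end < doc.length && isWs(doc[end])) end++;
        }
        byte[] out = new byte[doc.length - (end - start)];
        System.arraycopy(doc, 0, out, 0, start);
        System.arraycopy(doc, end, out, start, doc.length - end);
        return new String(out, java.nio.charset.StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<root>\n<name>suresh</name>\n<address>Address</address>\n</root>";
        int offset = xml.indexOf("<name>");          // ASCII only: char index == byte index
        int length = "<name>suresh</name>".length();
        System.out.println("--- leading and trailing ---");
        System.out.println(remove(xml, offset, length, true, true));
        System.out.println("--- trailing only ---");
        System.out.println(remove(xml, offset, length, false, true));
    }
}
```

Widening on both sides swallows the line break after <root>, while widening only on the trailing side leaves it in place, so <address> stays on its own line.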