Title :
An XML subtree segmentation method based on syntactic segmentation rate
Author :
Liang, Wenxin ; Ouyang, Xiangyong ; Yokota, Haruo
Author_Institution :
Japan Science and Technology Agency, Tokyo Institute of Technology, Japan
Abstract :
In this paper, we propose an effective method for segmenting large XML documents into independent, meaningful subtrees based on two syntactic segmentation rates: the vertical segmentation rate and the horizontal segmentation rate. The proposed method uses the DO-VLEI code to calculate the parameters required for subtree segmentation. We evaluate the effectiveness of the method in experiments on real bibliographic XML documents stored in RDBs. We apply our previously proposed subtree matching algorithm, SLAX, to match the segmented subtrees and evaluate how the matching threshold affects the precision and recall of subtree matching. We also integrate the matched subtrees determined by SLAX using our previously proposed subtree integration algorithm. The experimental results indicate that the proposed method is effective for segmenting XML documents into independent, meaningful subtrees, and that SLAX achieves reasonable matching precision and recall on the segmented subtrees.
Keywords :
Bibliographies; Data preprocessing; Internet; Labeling; Large-scale systems; XML
Conference_Title :
Digital Information Management, 2007. ICDIM '07. 2nd International Conference on
Conference_Location :
Lyon, France
Print_ISBN :
978-1-4244-1475-8
Electronic_ISBN :
978-1-4244-1476-5
DOI :
10.1109/ICDIM.2007.4444281