DocumentCode :
1628278
Title :
Segmentation of Publication Records of Authors from the Web
Author :
Zhang, Wei ; Yu, Clement ; Smalheiser, Neil ; Torvik, Vetle
Author_Institution :
University of Illinois at Chicago
fYear :
2006
Firstpage :
120
Lastpage :
120
Abstract :
Publication records are often found on authors’ personal home pages. If such a record is partitioned into semantic fields such as authors, title, and date, the unstructured text can be converted into structured data that other applications can use. In this paper, we present PEPURS, a publication record segmentation system. It adopts a novel "Split and Merge" strategy: a publication record is split into segments; multiple statistical classifiers compute each segment's likelihood of belonging to different fields; finally, adjacent segments are merged if they belong to the same field. PEPURS introduces punctuation marks and their neighboring text as a new feature for distinguishing the different roles the marks play. PEPURS yields high accuracy scores in experiments.
Keywords :
Computer architecture; Computer science; Data engineering; Databases; Distributed computing; Psychiatry
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on
Print_ISBN :
0-7695-2570-9
Type :
conf
DOI :
10.1109/ICDE.2006.137
Filename :
1617488