DocumentCode :
2184096
Title :
Annotating text segments in documents for search
Author :
Cheng, Pu-Jen ; Chiao, Hsin-Chen ; Pan, Yi-Cheng ; Chien, Lee-Feng
Author_Institution :
Inst. of Inf. Sci., Acad. Sinica, Taiwan
fYear :
2005
fDate :
19-22 Sept. 2005
Firstpage :
317
Lastpage :
320
Abstract :
It has been shown that annotating prominent text patterns contained in documents with appropriate types may benefit many applications. Most conventional tools for automatic text annotation extract named entities from texts and annotate them with information about persons, locations, dates and so on. However, this kind of entity type information is often short in length and is mostly limited to a small set of broad categories. In this paper, we try to remedy this problem by presenting an approach that extracts global evidence from documents for improved named entity recognition. We also propose an unsupervised, generalized classification approach that collects training data from the Web automatically and classifies text patterns into more refined categories. Experimental results on the data of the NTCIR-2 information retrieval task show the feasibility of the proposed approaches for search.
Keywords :
Internet; text analysis; NTCIR-2 information retrieval task; World Wide Web; automatic text annotation; named entity recognition; text pattern classification; text segment annotation; training data; unsupervised classification; Books; Data mining; Information management; Information retrieval; Information science; Infrared detectors; Noise robustness; Text categorization; Text recognition; Training data
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2415-X
Type :
conf
DOI :
10.1109/WI.2005.32
Filename :
1517864