DocumentCode :
3258989
Title :
Automatic Keyword Extraction Using Linguistic Features
Author :
Hu, Xinghua ; Wu, Bin
Author_Institution :
Baskin Sch. of Eng., California Univ., Santa Cruz, CA
fYear :
2006
fDate :
Dec. 2006
Firstpage :
19
Lastpage :
23
Abstract :
This paper describes a novel keyword extraction algorithm, position weight (PW), that uses linguistic features to represent the importance of a word's position in a document. Topical terms and their previous-term and next-term co-occurrence collections are extracted. Three methods are used to measure the degree of correlation between a topical term and its co-occurrence terms: term frequency inverse term frequency (TFITF), position weight inverse position weight (PWIPW), and chi-square (χ²). The co-occurrence terms that have the highest degree of correlation and exceed a co-occurrence frequency threshold are combined with the original topical term to form a final keyword. Because the algorithm has linear computational complexity, the vector space of documents in a large corpus or on the boundless Web can be quickly represented by sets of keywords, which makes fast and effective large-scale information retrieval possible.
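The co-occurrence step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the PW, TFITF, or PWIPW formulas, so only the chi-square measure is shown, computed as the standard Pearson statistic on a 2x2 table of adjacent-word counts. The function names, the whitespace tokenization, and the `min_cooccurrence` parameter are all assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_collections(tokens, topical_term):
    """Count the terms found immediately before and after topical_term."""
    prev_terms, next_terms = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != topical_term:
            continue
        if i > 0:
            prev_terms[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            next_terms[tokens[i + 1]] += 1
    return prev_terms, next_terms


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def extend_keyword(tokens, topical_term, min_cooccurrence=2):
    """Append the best-correlated next term if it is frequent enough.

    Assumption: only next-term candidates are considered here; the
    previous-term collection could be handled symmetrically.
    """
    _, next_terms = cooccurrence_collections(tokens, topical_term)
    first_counts = Counter(tokens[:-1])   # words that start a bigram
    second_counts = Counter(tokens[1:])   # words that end a bigram
    n_bigrams = len(tokens) - 1
    best, best_score = None, 0.0
    for cand, n11 in next_terms.items():
        if n11 < min_cooccurrence:        # co-occurrence frequency threshold
            continue
        n10 = first_counts[topical_term] - n11  # topical term, other successor
        n01 = second_counts[cand] - n11         # candidate, other predecessor
        n00 = n_bigrams - n11 - n10 - n01       # all remaining bigrams
        score = chi_square(n11, n10, n01, n00)
        if score > best_score:
            best, best_score = cand, score
    return f"{topical_term} {best}" if best else topical_term
```

Each function makes a single pass over the tokens or over the candidate collection, which is in keeping with the linear complexity the abstract claims, although the paper's actual procedure may differ.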
Keywords :
computational complexity; data mining; document handling; information retrieval; CHI-square; automatic keyword extraction; boundless Web; cooccurrence collections; cooccurrence frequency threshold; cooccurrence terms; large corpus; linear computational complexity; linguistic features; position weight inverse position weight; term frequency inverse term frequency; topical terms; vector space; word position; Computational complexity; Content based retrieval; Data mining; Feature extraction; Frequency measurement; Information retrieval; Large-scale systems; Position measurement; Vectors; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2702-7
Type :
conf
DOI :
10.1109/ICDMW.2006.36
Filename :
4063591