DocumentCode :
245132
Title :
Document-Specific Keyphrase Extraction Using Sequential Patterns with Wildcards
Author :
Fei Xie ; Xindong Wu ; Xingquan Zhu
Author_Institution :
Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei, China
fYear :
2014
fDate :
14-17 Dec. 2014
Firstpage :
1055
Lastpage :
1060
Abstract :
Finding good keyphrases for a document is beneficial for many applications, such as text summarization, browsing, and indexing. In this paper, we propose a sequential pattern mining based document-specific keyphrase extraction method. Our key innovation is to use wildcards (or gap constraints) to help extract sequential patterns, where the flexible wildcard constraints within a pattern can capture semantic relationships between words. To achieve this goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with wildcard and one-off conditions that allows important keyphrases to be captured during the mining process. For each extracted keyphrase candidate, we use some statistical pattern features to characterize it. A supervised learning classifier is trained to identify keyphrases from a test document. Comparisons on keyphrase benchmark datasets confirm that our document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases.
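The abstract names the two ingredients of the mining step, wildcard (gap) constraints and the one-off condition, but this record does not give the algorithm itself. The sketch below is not the authors' method. It only illustrates what the two constraints mean on a single document treated as a word sequence. It assumes that a pattern is a list of words, that between consecutive pattern words there may be between `min_gap` and `max_gap` arbitrary words, and that under the one-off condition each position in the document can be used by at most one occurrence of the pattern. The function names and parameters are hypothetical, and the greedy left-to-right search may find fewer occurrences than the true maximum.

```python
from typing import List, Optional, Sequence


def find_occurrence(tokens: Sequence[str], pattern: Sequence[str],
                    min_gap: int, max_gap: int,
                    used: List[bool], start: int) -> Optional[List[int]]:
    """Search for one occurrence of `pattern` whose first word sits at `start`.

    Returns the matched positions, or None if no occurrence satisfies the
    gap bounds while avoiding positions already marked in `used`.
    """
    def extend(positions: List[int]) -> Optional[List[int]]:
        if len(positions) == len(pattern):
            return positions
        prev = positions[-1]
        wanted = pattern[len(positions)]
        # The next word must follow after min_gap..max_gap wildcard positions.
        lo = prev + 1 + min_gap
        hi = min(prev + 1 + max_gap, len(tokens) - 1)
        for pos in range(lo, hi + 1):
            if not used[pos] and tokens[pos] == wanted:
                found = extend(positions + [pos])
                if found is not None:
                    return found
        return None

    if used[start] or tokens[start] != pattern[0]:
        return None
    return extend([start])


def one_off_occurrences(tokens: Sequence[str], pattern: Sequence[str],
                        min_gap: int = 0, max_gap: int = 2) -> List[List[int]]:
    """Greedily collect occurrences so that no position is used twice."""
    if not pattern:
        return []
    used = [False] * len(tokens)
    occurrences: List[List[int]] = []
    for start in range(len(tokens)):
        occ = find_occurrence(tokens, pattern, min_gap, max_gap, used, start)
        if occ is not None:
            for pos in occ:
                used[pos] = True  # one-off: reserve every matched position
            occurrences.append(occ)
    return occurrences


doc = ("keyphrase extraction uses sequential pattern mining "
       "and keyphrase candidate extraction").split()
print(one_off_occurrences(doc, ["keyphrase", "extraction"], 0, 1))
```

With a gap of up to one word, the pattern matches both "keyphrase extraction" and "keyphrase candidate extraction"; with `max_gap=0` only the contiguous phrase is counted. The number of occurrences returned is one possible support count for a candidate, which a downstream classifier could use as a feature.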
Keywords :
data mining; learning (artificial intelligence); pattern classification; statistical analysis; text analysis; document-specific keyphrase extraction; gap constraints; keyphrase benchmark datasets; keyphrases identification; mining process; semantic relationships; sequential dataset; sequential pattern mining; sequential patterns extraction; statistical pattern features; supervised learning classifier; wildcard constraints; Data mining; Databases; Educational institutions; Feature extraction; Microprogramming; Semantics; Time complexity; classification; keyphrase extraction; sequential pattern mining; wildcards;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
ISSN :
1550-4786
Print_ISBN :
978-1-4799-4303-6
Type :
conf
DOI :
10.1109/ICDM.2014.105
Filename :
7023446