Title :
Data Preprocessing in SVM-Based Keywords Extraction from Scientific Documents
Author :
Wu, Chunguo ; Marchese, Maurizio ; Wang, Yufei ; Krapivin, Mikalai ; Wang, Chaoyong ; Li, Xitong ; Liang, Yanchun
Abstract :
Scientific documents are unstructured natural-language data that are hard for scientists to read and manage. Keywords help scientists search for related documents and quickly grasp their contents. In this paper we investigate a data preprocessing technique used in SVM-based keyword extraction from scientific documents. Four definitions of regular scientific documents are proposed, and the experimental results are analyzed based on these definitions. The results confirm the intuition that the abstract is important for keyword extraction.
Keywords :
data analysis; document handling; information retrieval; support vector machines; SVM based keywords extraction; data preprocessing; scientific document; Automatic control; Chaos; Computer science; Data engineering; Data mining; Data preprocessing; Educational institutions; Equations; Frequency; Natural languages
Conference_Title :
Innovative Computing, Information and Control (ICICIC), 2009 Fourth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-1-4244-5543-0
DOI :
10.1109/ICICIC.2009.155