DocumentCode :
2061327
Title :
A New Approach for Better Document Retrieval and Classification Performance Using Supervised WSD and Concept Graph
Author :
Soltanpoor, Reza ; Mohsenzadeh, Mehran ; Mohaqeqi, Morteza
Author_Institution :
North Branch, Comput. Dept., Islamic Azad Univ., Tehran, Iran
fYear :
2010
fDate :
5-7 Aug. 2010
Firstpage :
32
Lastpage :
38
Abstract :
Word Sense Disambiguation (WSD) is a major task in natural language processing (NLP). Supervised WSD methods have been shown to be more effective than other WSD methods, although they are limited by the size of the manually annotated training set. A concept graph, on the other hand, is a weighted graph in which each edge represents the relationship (relevancy) between a pair of concepts. In this paper, we propose a method that uses a concept graph to improve the retrieval and classification performance of documents from different sources. In our method, features are first selected from a training set by applying a well-known feature selection algorithm. Then, by injecting relevant words suggested by the concept graph for each class, an enriched feature set is produced and applied to the test set. Our experimental results show classification improvements of 14.6% and 18.4% (for the fewer-term and more-term injection evaluations, respectively), as well as some improvement in retrieval performance.
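The two-stage pipeline in the abstract (select features per class, then enrich each class with related terms from a weighted concept graph) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not name the feature selection algorithm, so chi-square scoring is swapped in as an assumption; the helper names (`add_edge`, `chi_square`, `select_features`, `inject_terms`), the edge weights, the `min_weight` threshold, and the `max_new` cap (standing in for "fewer" vs "more" term injection) are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical concept graph: undirected, edge weight = relevancy of two concepts.
Graph = Dict[str, Dict[str, float]]


def add_edge(graph: Graph, a: str, b: str, weight: float) -> None:
    """Insert an undirected edge whose weight is the relevancy of concepts a and b."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def chi_square(docs: List[Set[str]], labels: List[str], term: str, cls: str) -> float:
    """Chi-square score of term vs. class over binary (present/absent) documents.

    Returns 0.0 for terms that are not positively associated with cls, so that
    each class keeps only terms indicative of it (an assumption of this sketch).
    """
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
    dd = n - a - b - c
    if a * dd <= c * b:
        return 0.0
    denom = (a + c) * (b + dd) * (a + b) * (c + dd)
    return 0.0 if denom == 0 else n * (a * dd - c * b) ** 2 / denom


def select_features(docs: List[Set[str]], labels: List[str], k: int) -> Dict[str, Set[str]]:
    """Keep the top-k positively associated terms per class (ties broken alphabetically)."""
    vocab = set().union(*docs)
    selected: Dict[str, Set[str]] = {}
    for cls in sorted(set(labels)):
        scored = [(chi_square(docs, labels, t, cls), t) for t in vocab]
        scored = [p for p in scored if p[0] > 0.0]
        scored.sort(key=lambda p: (-p[0], p[1]))
        selected[cls] = {t for _, t in scored[:k]}
    return selected


def inject_terms(features: Dict[str, Set[str]], graph: Graph,
                 max_new: int, min_weight: float) -> Dict[str, Set[str]]:
    """Add up to max_new graph neighbours per class whose edge weight >= min_weight."""
    enriched: Dict[str, Set[str]] = {}
    for cls, terms in features.items():
        candidates: Dict[str, float] = {}
        for t in terms:
            for nb, w in graph.get(t, {}).items():
                if nb not in terms and w >= min_weight:
                    # A neighbour reachable from several features keeps its strongest link.
                    candidates[nb] = max(w, candidates.get(nb, 0.0))
        ranked = sorted(candidates, key=lambda x: (-candidates[x], x))
        enriched[cls] = set(terms) | set(ranked[:max_new])
    return enriched


if __name__ == "__main__":
    docs = [{"bank", "loan", "interest"}, {"bank", "deposit"},
            {"match", "goal", "team"}, {"team", "coach"}]
    labels = ["finance", "finance", "sport", "sport"]
    g: Graph = {}
    add_edge(g, "bank", "credit", 0.8)
    add_edge(g, "bank", "river", 0.2)   # weak link, filtered by min_weight
    add_edge(g, "team", "player", 0.7)
    add_edge(g, "coach", "player", 0.9)
    feats = select_features(docs, labels, k=2)
    print(inject_terms(feats, g, max_new=1, min_weight=0.5))
```

In this toy run, "finance" keeps `bank` and `deposit` and gains `credit`, while the weakly related `river` is rejected by the threshold; raising `max_new` plays the role of injecting more terms per class.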
Keywords :
document handling; graph theory; information retrieval; natural language processing; classification performance; concept graph; document retrieval; feature selection; supervised WSD; weighted graph; word sense disambiguation; Classification algorithms; Context; Feature extraction; Indexing; Support vector machine classification; Text categorization; Training; Concept Graph; Feature Selection; Information Retrieval; Naïve;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Integrated Intelligent Computing (ICIIC), 2010 First International Conference on
Conference_Location :
Bangalore
Print_ISBN :
978-1-4244-7963-4
Electronic_ISBN :
978-0-7695-4152-5
Type :
conf
DOI :
10.1109/ICIIC.2010.63
Filename :
5571516