Title of article
Document features selection using background knowledge and word clustering technique
Author/Authors
Farahmand, Hajar (Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Bushehr, Iran); Harounabadi, Ali; Mirabedini, S. Javad (Department of Computer Engineering, Islamic Azad University, Central Tehran Branch, Iran)
Issue Information
Monthly, serial issue number 26, 2014
Pages
10
From page
241
To page
250
Abstract
With the continuing development of storage, communication, and electronic media, a significant amount of information is being collected and stored in different forms, such as electronic documents and document databases, which makes it difficult to process properly. Extracting knowledge from this large volume of document data requires methods for organizing and indexing documents. Among these are clustering and classification methods, whose objective is to organize documents and to speed up access to the required information. Most document clustering methods operate on word frequency and treat a document as a bag of words. In this paper, background knowledge and word clustering methods are used to reduce the number of features and to select the essential features of a document. Specifically, using the WordNet ontology, background knowledge, and a clustering method, similar words in the documents are clustered; clusters containing more words than a threshold are chosen, and the frequencies of their words are taken as the effective features of the document. Simulation results for the proposed method show that document dimensionality is reduced effectively and that document clustering performance improves as a result.
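The feature selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `TOY_CONCEPTS` dictionary is a hypothetical stand-in for a WordNet lookup, the function names are invented for illustration, and the sketch assumes that "number of words" means the number of distinct words in a cluster and that a cluster's feature value is the total frequency of its words.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to a shared concept label.
# A real system would derive this grouping from the WordNet ontology.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "dog": "animal", "cat": "animal",
    "bank": "finance",
}


def cluster_words(tokens, concept_of):
    """Group tokens by concept, counting how often each word occurs."""
    clusters = {}
    for tok in tokens:
        concept = concept_of.get(tok)
        if concept is None:
            continue  # words unknown to the background knowledge are skipped
        clusters.setdefault(concept, Counter())[tok] += 1
    return clusters


def select_features(tokens, concept_of, threshold):
    """Keep clusters with more distinct words than `threshold`.

    Assumption: the feature value of a kept cluster is the summed
    frequency of its member words.
    """
    features = {}
    for concept, counts in cluster_words(tokens, concept_of).items():
        if len(counts) > threshold:
            features[concept] = sum(counts.values())
    return features


doc = "car automobile truck car dog cat bank".split()
features = select_features(doc, TOY_CONCEPTS, threshold=2)
print(features)
```

With a threshold of 2, only the cluster holding three distinct words survives, so seven tokens collapse into a single feature. Lowering the threshold keeps more clusters and therefore more dimensions.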
Journal title
Management Science Letters
Serial Year
2014
Record number
981923