DocumentCode :
2844104
Title :
Agglomeration and Elimination of Terms for Dimensionality Reduction
Author :
Ciarelli, Patrick Marques ; Oliveira, Elias
Author_Institution :
Dept. of Electr. Eng., Univ. Fed. do Espirito Santo, Vitoria, Brazil
fYear :
2009
fDate :
Nov. 30 2009-Dec. 2 2009
Firstpage :
547
Lastpage :
552
Abstract :
The vector space model is the usual representation of text databases for computational treatment. However, in this representation synonyms and related terms are treated as independent. Furthermore, some terms add no information to the set of text documents and may even harm the performance of information retrieval techniques. Several techniques have been proposed in the literature in an attempt to reduce this problem. In this work we present a method to tackle it. To validate our approach, we carried out a series of experiments on four databases and compared the results with those of other well-known techniques. In all cases, our method performed as well as or better than the other techniques from the literature.
Keywords :
database management systems; information retrieval; text analysis; computational treatment; dimensionality reduction; information retrieval techniques; representation synonyms; text documents; texts database; vector space model; Costs; Data mining; Deductive databases; Feature extraction; Frequency; Information retrieval; Information science; Intelligent systems; Spatial databases; Text categorization; agglomeration of terms; dimensionality reduction; feature selection; text classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-1-4244-4735-0
Electronic_ISBN :
978-0-7695-3872-3
Type :
conf
DOI :
10.1109/ISDA.2009.9
Filename :
5364970