Title :
Feature reduction using mixture model of directional distributions
Author :
Thang, Nguyen Duc ; Chen, Lihui ; Chan, Chee Keong
Author_Institution :
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore
Abstract :
Text data normally has thousands, or even tens of thousands, of features. This causes the well-known "curse of dimensionality" in text clustering. Feature reduction techniques have been proposed to address this problem by transforming the text data into a much lower-dimensional space, thereby improving clustering performance. In addition, because text is high-dimensional, cosine similarity has been proven to be more suitable than the Euclidean distance metric, which suggests modeling text as directional data. In this paper, we propose a novel feature reduction method based on a probabilistic mixture model of directional distributions. Empirical results on various benchmark datasets show that our method performs comparably with latent semantic analysis (LSA), and much better than standard methods such as document frequency (DF) and term contribution (TC).
Keywords :
pattern clustering; text analysis; Euclidean distance metric; cosine similarity; directional data; directional distribution mixture model; document frequency; feature reduction method; latent semantic analysis; probabilistic mixture model; term contribution; text clustering; Automatic control; Data engineering; Euclidean distance; Frequency; Performance analysis; Robot control; Robot vision systems; Robotics and automation; Statistical distributions; Vectors; direction statistics; feature reduction; mixture model
Conference_Title :
Control, Automation, Robotics and Vision, 2008. ICARCV 2008. 10th International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4244-2286-9
Electronic_ISBN :
978-1-4244-2287-6
DOI :
10.1109/ICARCV.2008.4795874