DocumentCode :
3269851
Title :
An outlier-aware data clustering algorithm in mixture models
Author :
Thang, Nguyen Duc ; Chen Lihui ; Keong, Chan Chee
Author_Institution :
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2009
fDate :
8-10 Dec. 2009
Firstpage :
1
Lastpage :
5
Abstract :
A robust mixture model-based clustering algorithm using genetic techniques is proposed in this paper. In many engineering and application domains, data collections often contain noisy samples and outliers, which degrade the performance of data mining methods that do not account for them. Classical probabilistic mixture-based clustering is known to be very sensitive to such situations. To improve its performance, we combine a Genetic Algorithm (GA) with the expectation-maximization (EM) procedure of the classical model. When the trimmed likelihood is used as the fitness function of the GA, highly representative samples are selected and potential outliers are pruned effectively during the learning process. Experiments on both synthetic and real data for different applications show that our approach outperforms the classical mixture model by producing more accurate and reliable results.
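The trimmed likelihood mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details of the GA encoding or operators, so the sketch only shows how a trimmed log-likelihood might score a candidate model, assuming a one-dimensional Gaussian mixture and a hypothetical `keep_fraction` parameter for the share of samples retained. All function and parameter names here are illustrative.

```python
import numpy as np

def gaussian_mixture_log_density(x, weights, means, stds):
    """Per-sample log density of a 1-D Gaussian mixture (assumed model)."""
    x = np.asarray(x, dtype=float)[:, None]           # shape (n, 1)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Log of weight * Normal(x | mean, std) for every sample/component pair.
    log_comp = (np.log(weights)
                - 0.5 * np.log(2.0 * np.pi * stds ** 2)
                - 0.5 * ((x - means) / stds) ** 2)     # shape (n, k)
    # Numerically stable log-sum-exp over the components.
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

def trimmed_log_likelihood(x, weights, means, stds, keep_fraction=0.9):
    """Sum the log densities of the best-fitting samples only.

    keep_fraction is a hypothetical parameter: the share of samples kept.
    Returns the trimmed score and the sorted indices of the kept samples.
    """
    log_dens = gaussian_mixture_log_density(x, weights, means, stds)
    n_keep = max(1, int(np.floor(keep_fraction * log_dens.size)))
    order = np.argsort(log_dens)[::-1]                 # best-fitting first
    kept = np.sort(order[:n_keep])
    return float(log_dens[kept].sum()), kept

# Two tight groups near 0 and 5, plus one gross outlier at 100.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 100.0]
score, kept = trimmed_log_likelihood(data, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
```

In this sketch the sample at 100 has by far the lowest density under the candidate model, so it falls outside the kept set and does not drag the score down. A GA could use such a score as its fitness value, but how the paper couples this with EM is not specified in this record.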
Keywords :
data analysis; data mining; expectation-maximisation algorithm; genetic algorithms; pattern clustering; probability; data collection; data mining; expectation-maximization procedure; genetic algorithm; outlier-aware data clustering algorithm; potential outliers; probabilistic mixture based clustering; robust mixture model; Clustering algorithms; Data analysis; Data engineering; Data mining; Distortion measurement; Genetic algorithms; Genetic engineering; Maximum likelihood estimation; Parameter estimation; Robustness; Robust clustering; genetic algorithm; mixture model; outliers;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information, Communications and Signal Processing, 2009. ICICS 2009. 7th International Conference on
Conference_Location :
Macau
Print_ISBN :
978-1-4244-4656-8
Electronic_ISBN :
978-1-4244-4657-5
Type :
conf
DOI :
10.1109/ICICS.2009.5397571
Filename :
5397571