DocumentCode :
1122362
Title :
A hybrid SEM algorithm for high-dimensional unsupervised learning using a finite generalized Dirichlet mixture
Author :
Bouguila, Nizar ; Ziou, Djemel
Author_Institution :
Inst. for Inf. Syst. Eng., Concordia Univ., Montreal, Que.
Volume :
15
Issue :
9
Year :
2006
Firstpage :
2657
Lastpage :
2668
Abstract :
This paper applies a robust statistical scheme to the problem of unsupervised learning of high-dimensional data. We develop, analyze, and apply a new finite mixture model based on a generalization of the Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. We show that the mathematical properties of this distribution allow high-dimensional modeling without requiring dimensionality reduction and, thus, without a loss of information. This makes the generalized Dirichlet distribution more practical and useful. We propose a hybrid stochastic expectation maximization algorithm (HSEM) to estimate the parameters of the generalized Dirichlet mixture. The algorithm is called stochastic because it contains a step in which the data elements are assigned randomly to components in order to avoid convergence to a saddle point. The adjective "hybrid" is justified by the introduction of a Newton-Raphson step. Moreover, the HSEM algorithm autonomously selects the number of components by the introduction of an agglomerative term. The performance of our method is tested by the classification of several pattern-recognition data sets. The generalized Dirichlet mixture is also applied to the problems of image restoration, image object recognition, and texture image database summarization for efficient retrieval. For the texture image summarization problem, results are reported for the Vistex texture image database from the MIT Media Lab.
Keywords :
Newton-Raphson method; expectation-maximisation algorithm; image restoration; image retrieval; image texture; object recognition; pattern classification; statistical distributions; stochastic processes; unsupervised learning; visual databases; MIT Media Lab; Newton-Raphson step; Vistex texture image database; agglomerative term; asymmetric distribution; finite generalized Dirichlet mixture; finite mixture model; general covariance structure; generalized Dirichlet distribution; high-dimensional unsupervised learning; hybrid SEM algorithm; hybrid stochastic expectation maximization algorithm; image object recognition; image restoration; image retrieval; parameter estimation; pattern-recognition data set classification; robust statistical scheme; symmetric distribution; texture image database summarization; Convergence; Data analysis; Image databases; Image restoration; Mathematical model; Parameter estimation; Robustness; Stochastic processes; Testing; Unsupervised learning; Clustering; SEM; Vistex; correlogram; expectation maximization (EM); finite mixture models; generalized Dirichlet; high-dimensional data; hybrid stochastic expectation maximization algorithm (HSEM); image database summarization; image object recognition; image restoration; maximum likelihood (ML);
Language :
English
Journal_Title :
Image Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1057-7149
Type :
jour
DOI :
10.1109/TIP.2006.877379
Filename :
1673446