Title :
Document Classification with Unsupervised Nonnegative Matrix Factorization and Supervised Perceptron Learning
Author :
Barman, Paresh C. ; Lee, Soo-Young
Author_Institution :
Korea Adv. Inst. of Sci. & Technol., Daejeon
Abstract :
A new hybrid neural network model is proposed for document classification. The NMF-SLP model consists of two layers: the first, a non-negative matrix factorization (NMF) layer, decomposes a document into several clusters, and the second, a single-layer perceptron (SLP) layer, classifies the document based on those clusters. The NMF layer is trained by factorizing the document word-frequency matrix into a feature matrix and a coefficient matrix, and then estimating the pseudo-inverse of the feature matrix. The SLP layer is trained with a standard error-minimization algorithm. Classification performance is investigated as a function of the number of clusters, i.e., the number of hidden neurons, and of the slope of the sigmoidal nonlinearity at the hidden neurons. The proposed model achieves much better classification accuracy than simple NMF and k-NN classifiers, whereas a standard multi-layer perceptron is almost impractical to train properly due to the high-dimensional inputs and the large number of adaptive elements.
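The following is a minimal Python (NumPy/scikit-learn) sketch of the two-stage pipeline outlined in the abstract. The toy data, matrix dimensions, cluster count, sigmoid slope, the use of scikit-learn's NMF solver, and the squared-error gradient descent for the SLP layer are illustrative assumptions, not the paper's exact training procedure.

# Sketch of the NMF-SLP pipeline: V ~= W H, where V is the words x documents
# word-frequency matrix, W the nonnegative feature (cluster) matrix, and H the
# coefficient matrix. Hidden activations come from the pseudo-inverse of W
# passed through a sigmoid of adjustable slope; an SLP on top is trained by
# plain gradient descent on squared error. All sizes and hyperparameters here
# are toy values for illustration.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((5000, 200))            # word-frequency matrix: 5000 terms x 200 docs
labels = rng.integers(0, 4, size=200)  # 4 document classes (toy labels)
Y = np.eye(4)[labels]                  # one-hot targets, shape (200, 4)

# --- Unsupervised NMF layer ---
n_clusters = 20                        # number of clusters = number of hidden neurons
nmf = NMF(n_components=n_clusters, init="nndsvda", max_iter=500)
W = nmf.fit_transform(V)               # feature matrix, shape (5000, n_clusters)
W_pinv = np.linalg.pinv(W)             # pseudo-inverse used as the encoding weights

slope = 1.0                            # slope of the sigmoid at the hidden neurons
H = 1.0 / (1.0 + np.exp(-slope * (W_pinv @ V)))   # hidden activations, (n_clusters, 200)

# --- Supervised SLP layer (squared-error gradient descent) ---
X = H.T                                # (200, n_clusters)
Wo = np.zeros((n_clusters, 4))
lr = 0.1
for _ in range(500):
    out = 1.0 / (1.0 + np.exp(-(X @ Wo)))
    grad = X.T @ ((out - Y) * out * (1.0 - out)) / len(X)
    Wo -= lr * grad

pred = np.argmax(X @ Wo, axis=1)
print("training accuracy:", np.mean(pred == labels))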
Keywords :
document handling; matrix decomposition; matrix inversion; minimisation; pattern classification; perceptrons; unsupervised learning; NMF layer; NMF-SLP model; SLP layer; document classification; document word frequency matrix factorization; error minimization algorithm; feature matrix pseudoinverse estimation; hybrid neural network model; single-layer-perceptron layer; supervised perceptron learning; unsupervised nonnegative matrix factorization; Cities and towns; Clustering algorithms; Feature extraction; Frequency estimation; Matrix decomposition; Neural networks; Neurons; Supervised learning; Transmission line matrix methods; Unsupervised learning;
Conference_Title :
Information Acquisition, 2007. ICIA '07. International Conference on
Conference_Location :
Seogwipo-si
Print_ISBN :
1-4244-1220-X
Electronic_ISBN :
1-4244-1220-X
DOI :
10.1109/ICIA.2007.4295722