Title :
Spectral approach to find number of clusters of short-text documents
Author :
Goyal, Ankur ; Jadon, Mukesh K. ; Pujari, Arun K.
Author_Institution :
LNM Inst. of Inf. Technol., Jaipur, India
Abstract :
We propose a technique for determining the number of clusters in a corpus of short-text documents. A spectral algorithm suited to short texts is used to generate an ensemble of clusterings. The Markov chain induced by the ensemble's co-association matrix is studied over iterations to observe its nearly uncoupled behavior. A large spectral gap, together with the number of eigenvalues close to 1, indicates the number of clusters. We demonstrate the approach through experiments on several datasets.
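The pipeline described in the abstract (ensemble, then co-association matrix, then induced Markov chain, then eigenvalues) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The short-text spectral algorithm that produces the ensemble is not reproduced; the ensemble is assumed to be given as hard label vectors. The Markov chain is assumed to be the row-normalized co-association matrix P = D^-1 C. The iteration over time steps is not modeled. The helper names `co_association` and `estimate_num_clusters` and the threshold `tol` are hypothetical.

```python
import numpy as np


def co_association(labelings):
    """Co-association matrix of an ensemble of hard clusterings.

    labelings: m label vectors (one per ensemble member), each of length n.
    Entry (i, j) is the fraction of members that put documents i and j in
    the same cluster, so the diagonal is always 1.
    """
    L = np.asarray(labelings)
    m, n = L.shape
    C = np.zeros((n, n))
    for labels in L:
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / m


def estimate_num_clusters(C, tol=0.2):
    """Estimate the number of clusters from the Markov chain P = D^-1 C.

    Assumption: P is the row-normalized co-association matrix.
    Returns (k_gap, k_near_one, eigenvalues), where
      k_gap      -- position of the largest gap in the descending spectrum
      k_near_one -- count of eigenvalues >= 1 - tol (tol is a hypothetical threshold)
    """
    d = C.sum(axis=1)  # row sums; strictly positive because C[i, i] == 1
    # P = D^-1 C is similar to the symmetric S = D^-1/2 C D^-1/2, so both have
    # the same real eigenvalues; eigvalsh on S is the numerically stable route.
    S = C / np.sqrt(np.outer(d, d))
    eig = np.sort(np.linalg.eigvalsh(S))[::-1]  # descending; eig[0] == 1
    gaps = eig[:-1] - eig[1:]
    k_gap = int(np.argmax(gaps)) + 1
    k_near_one = int(np.sum(eig >= 1.0 - tol))
    return k_gap, k_near_one, eig


if __name__ == "__main__":
    # Toy ensemble of three clusterings of six documents; the third member
    # disagrees about document 2.
    ensemble = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]]
    k_gap, k_near_one, eig = estimate_num_clusters(co_association(ensemble))
    print(k_gap, k_near_one)  # 2 2
```

In the toy run, the spectrum is roughly 1, 0.84, 0.11, 0, 0, 0. Two eigenvalues sit near 1 and the largest gap follows the second one, so both criteria point to two clusters even though one ensemble member disagrees.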
Keywords :
Markov processes; eigenvalues and eigenfunctions; learning (artificial intelligence); matrix algebra; pattern clustering; text analysis; Markov chain; cluster number determination; coassociation matrix; eigenvectors; ensemble learning; short-text documents; spectral approach; spectral gap; Clustering algorithms; Data mining; Electronic mail; Feature extraction; Visualization; number of clusters; short-texts; spectral method; term-weighting; uncoupling;
Conference_Title :
Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), 2013 Fourth National Conference on
Conference_Location :
Jodhpur
Print_ISBN :
978-1-4799-1586-6
DOI :
10.1109/NCVPRIPG.2013.6776152