DocumentCode
3350767
Title
A novel word clustering algorithm based on latent semantic analysis
Author
Bellegarda, Jerome R. ; Butzberger, John W. ; Chow, Yen-Lu ; Coccaro, Noah B. ; Naik, Devang
Author_Institution
Interactive Media Group, Apple Computer, Inc., Cupertino, CA, USA
Volume
1
fYear
1996
fDate
7-10 May 1996
Firstpage
172
Abstract
A new approach is proposed for the clustering of words in a given vocabulary. The method is based on a paradigm first formulated in the context of information retrieval, called latent semantic analysis. This paradigm leads to a parsimonious vector representation of each word in a suitable vector space, where familiar clustering techniques can be applied. The distance measure selected in this space arises naturally from the problem formulation. Preliminary experiments indicate that the clusters produced are intuitively satisfactory. Because these clusters are semantic in nature, this approach may prove useful as a complement to conventional class-based statistical language modeling techniques.
Keywords
natural languages; speech recognition; class-based statistical language modeling techniques; distance measure; latent semantic analysis; parsimonious vector representation; vector space; word clustering algorithm; Algorithm design and analysis; Clustering algorithms; Computer science; Databases; Extraterrestrial measurements; Natural languages; Probability; Speech recognition; Stochastic processes; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location
Atlanta, GA
ISSN
1520-6149
Print_ISBN
0-7803-3192-3
Type
conf
DOI
10.1109/ICASSP.1996.540318
Filename
540318