DocumentCode :
302295
Title :
Unsupervised topic clustering of switchboard speech messages
Author :
Carlson, Beth A.
Author_Institution :
Lincoln Lab., MIT, Lexington, MA, USA
Volume :
1
fYear :
1996
fDate :
7-10 May 1996
Firstpage :
315
Abstract :
This paper presents a statistical technique that can be used to automatically group speech data records based on the similarity of their content. A tree-based clustering algorithm is used to generate a hierarchical structure for the corpus. This structure can then be used to guide the search for similar material in data from other corpora. The SWITCHBOARD Speech Corpus was used to demonstrate these techniques, since it provides sets of speech files that are nominally on the same topic. Excellent automatic clustering was achieved on the truth text transcripts provided with the SWITCHBOARD corpus, with an average cluster purity of 97.3%. Degraded clustering was achieved using the output transcriptions of a speech recognizer, with a clustering purity of 61.4%.
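The abstract does not state the feature representation, similarity measure, tree-building rule, or exact purity formula. The sketch below is only an illustration of the general idea under loudly stated assumptions: each transcript becomes a bag-of-words count vector, similarity is cosine similarity, the tree is built by average-linkage agglomerative (bottom-up) merging, and "average cluster purity" is taken to be the mean, over clusters, of the fraction of members that share the cluster's majority topic. All function names (`tf_vector`, `cosine`, `agglomerate`, `average_purity`) and the toy documents are hypothetical; this is not the paper's method.

```python
from collections import Counter
import math

# ASSUMPTION: bag-of-words counts, cosine similarity, and average linkage are
# illustrative choices; the paper's actual measures are not given in the abstract.

def tf_vector(text):
    """Bag-of-words term counts for one transcript (lowercased, whitespace-split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def agglomerate(docs, num_clusters):
    """Average-linkage agglomerative clustering (hypothetical helper).

    Starts with one cluster per document and repeatedly merges the two most
    similar clusters; each merge corresponds to one internal node of the
    cluster tree. Stops when num_clusters clusters remain and returns lists
    of document indices.
    """
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average pairwise similarity between members of the two clusters.
                sims = [cosine(vectors[i], vectors[j]) for i in clusters[x] for j in clusters[y]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x
    return clusters

def average_purity(clusters, labels):
    """ASSUMED definition: mean over clusters of the majority-topic fraction."""
    fractions = []
    for members in clusters:
        counts = Counter(labels[i] for i in members)
        fractions.append(counts.most_common(1)[0][1] / len(members))
    return sum(fractions) / len(fractions)

# Toy usage with made-up transcripts and topic labels.
docs = [
    "my dog likes the park and my cat sleeps",
    "our cat and dog are good pets",
    "the federal budget deficit and tax policy",
    "tax rates and the budget in congress",
]
labels = ["pets", "pets", "budget", "budget"]
clusters = agglomerate(docs, num_clusters=2)
print(clusters)                          # [[0, 1], [2, 3]]
print(average_purity(clusters, labels))  # 1.0
```

Average linkage is used here because it is a simple, common way to score similarity between multi-document clusters; a full tree is obtained by continuing the merges until `num_clusters` is 1 and recording each merge.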
Keywords :
pattern classification; speech recognition; statistical analysis; tree data structures; SWITCHBOARD Speech Corpus; automatic clustering; automatically group; average cluster purity; degraded clustering; hierarchical structure; output transcriptions; speech data records; speech files; speech recognizer; statistical technique; switchboard speech messages; tree-based clustering algorithm; truth text transcripts; unsupervised topic clustering; Automatic speech recognition; Clustering algorithms; Communication switching; Databases; Degradation; Electronic mail; Information retrieval; Laboratories; Speech recognition; Tree data structures;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Conference_Location :
Atlanta, GA
ISSN :
1520-6149
Print_ISBN :
0-7803-3192-3
Type :
conf
DOI :
10.1109/ICASSP.1996.541095
Filename :
541095