Title :
Topic extraction with multiple topic-words in broadcast-news speech
Author :
Ohtsuki, K. ; Matsuoka, T. ; Matsunaga, S. ; Furui, S.
Author_Institution :
NTT Human Interface Labs., Yokosuka, Japan
Abstract :
This paper reports on topic extraction from Japanese broadcast-news speech. Using continuous speech recognition, we studied the extraction of several topic-words from broadcast news. A combination of multiple topic-words represents the content of a news story, which is a more detailed and more flexible approach than using a single word or a single category. A topic extraction model gives the degree of relevance between each topic-word and each word in an article. The topic-words with the highest total relevance score, summed over all words in an article, are extracted. We trained the topic extraction model on five years of newspaper text, using the frequencies of topic-words taken from headlines and of words in articles. The degree of relevance between topic-words and words in articles is calculated on the basis of statistical measures, i.e., mutual information or the χ²-value. In topic extraction experiments on recognized broadcast-news speech, we extracted five topic-words from the 10-best hypotheses using a χ²-based model and found that 76.6% of them agreed with the topic-words chosen by subjects.
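The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes document-level co-occurrence counts in a 2x2 contingency table, the standard χ² statistic for that table, and a plain sum of scores over the tokens of an article. The function names `train_chi2_model` and `extract_topic_words`, the toy corpus, and the choice to drop negatively associated pairs are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import product


def train_chi2_model(documents):
    """Build a word -> {topic_word: chi2} table (hypothetical helper).

    documents: list of (headline_topic_words, article_words) pairs.
    Counts are per document, not per token (an assumption of this sketch).
    """
    n = len(documents)
    topic_df, word_df, pair_df = Counter(), Counter(), Counter()
    for topics, words in documents:
        topics, words = set(topics), set(words)
        topic_df.update(topics)                  # headlines containing t
        word_df.update(words)                    # articles containing w
        pair_df.update(product(topics, words))   # documents containing both

    model = defaultdict(dict)
    for (t, w), a in pair_df.items():
        b = topic_df[t] - a      # t in headline, w not in article
        c = word_df[w] - a       # w in article, t not in headline
        d = n - a - b - c        # neither present
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # Assumption: skip degenerate tables and negatively associated pairs.
        if denom == 0 or a * d <= b * c:
            continue
        model[w][t] = n * (a * d - b * c) ** 2 / denom
    return dict(model)


def extract_topic_words(model, article_words, k=5):
    """Return the k topic-words with the highest total chi2 over all tokens."""
    scores = Counter()
    for w in article_words:
        for t, chi2 in model.get(w, {}).items():
            scores[t] += chi2
    return [t for t, _ in scores.most_common(k)]


# Toy corpus, invented purely for illustration.
toy_corpus = [
    (["election"], ["vote", "party", "the"]),
    (["election"], ["vote", "ballot", "the"]),
    (["earthquake"], ["quake", "damage", "the"]),
    (["earthquake"], ["quake", "rescue", "the"]),
]
toy_model = train_chi2_model(toy_corpus)
print(extract_topic_words(toy_model, ["the", "vote", "ballot"], k=1))
```

In this toy setting the word "the" occurs in every article, so its contingency table is degenerate and it contributes nothing to any topic-word, while "vote" and "ballot" both push the score toward "election". For recognized speech, the same scoring could be applied to the words of the N-best hypotheses, although how the paper combines hypotheses is not stated in the abstract.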
Keywords :
broadcasting; feature extraction; information retrieval; information theory; natural languages; speech recognition; statistical analysis; χ²-based model; Japanese broadcast-news speech; articles; continuous speech recognition; experiments; headlines; multiple topic-words; mutual information; news content; newspapers; relevance; statistical measures; topic extraction model; Broadcasting; Content based retrieval; Data mining; Frequency; Hidden Markov models; Indexing; Information retrieval; Mutual information; Natural languages; Speech recognition
Conference_Title :
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7803-4428-6
DOI :
10.1109/ICASSP.1998.674434