• DocumentCode
    310510
  • Title

    Improved topic discrimination of broadcast news using a model of multiple simultaneous topics

  • Author

    Imai, Toru ; Schwartz, Richard ; Kubala, Francis ; Nguyen, Long

  • Author_Institution
    Sci. & Tech. Res. Lab., NHK (Japan Broadcasting Corp.), Tokyo, Japan
  • Volume
    2
  • fYear
    1997
  • fDate
    21-24 Apr 1997
  • Firstpage
    727
  • Abstract
    This paper presents a new method of topic spotting that attempts to retrieve detailed multiple simultaneous topics from broadcast news stories, each of which has about four different topics out of several thousand different topics. A new topic model uses a simple HMM where each state of the HMM represents one topic and the topic state emits topic-dependent keywords probabilistically. The model allows (unobserved) transitions among topics, word by word. These characteristics improve the discriminative ability between keywords and general words in a topic model and decrease the probabilistic overlap among the topic models more than conventional topic models (such as a simple multinomial probability model) do. In addition, the model is not confused by words from multiple topics within one story. We applied the new method to topic spotting from manually transcribed texts of news shows. The new method showed better results in precision and recall rates than the conventional method.
  • Keywords
    computational linguistics; hidden Markov models; probability; HMM; broadcast news; discriminative ability; general words; keywords; manually transcribed text; multiple simultaneous topics; news stories; probabilistic overlap; topic discrimination; topic-dependent keywords; Broadcast technology; Broadcasting; CD-ROMs; Hidden Markov models; Information retrieval; Robustness; State estimation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on
  • Conference_Location
    Munich
  • ISSN
    1520-6149
  • Print_ISBN
    0-8186-7919-0
  • Type

    conf

  • DOI
    10.1109/ICASSP.1997.596011
  • Filename
    596011
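
  • Illustrative sketch (not from the paper)
    The abstract describes an HMM with one state per topic, where each state emits topic-dependent keywords and the topic may change, unobserved, from word to word. The Python sketch below is one possible reading of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the toy emission tables, the extra "general" state for general words, the uniform state-independent transition weights (which reduce the forward recursion to a per-word mixture), the FLOOR constant, and the function names.

```python
import math

# Hypothetical toy emission tables p(word | topic state).
# All values are invented for illustration; none come from the paper.
EMISSIONS = {
    "general": {"the": 0.5, "said": 0.3, "today": 0.2},
    "economy": {"market": 0.5, "stocks": 0.3, "bank": 0.2},
    "weather": {"storm": 0.6, "rain": 0.4},
    "sports": {"game": 0.6, "score": 0.4},
}

# Assumed floor probability for words a state never emits.
FLOOR = 1e-6


def story_log_likelihood(words, topics):
    """Log-likelihood of a story under a one-state-per-topic HMM.

    Assumption: the state for each word is drawn uniformly from the
    candidate topics plus a "general" state, independently of the
    previous state. Under that assumption the HMM forward recursion
    collapses to a product of per-word mixtures.
    """
    states = ["general"] + list(topics)
    weight = 1.0 / len(states)
    total = 0.0
    for word in words:
        # Sum over the unobserved topic state that emitted this word.
        p = sum(weight * EMISSIONS[s].get(word, FLOOR) for s in states)
        total += math.log(p)
    return total


def rank_topic_sets(words, candidate_sets):
    """Order candidate topic sets from most to least likely."""
    return sorted(
        candidate_sets,
        key=lambda topics: story_log_likelihood(words, topics),
        reverse=True,
    )


if __name__ == "__main__":
    story = ["the", "market", "storm", "said", "stocks", "rain"]
    candidates = [
        ("economy",),
        ("weather",),
        ("economy", "weather"),
        ("sports",),
    ]
    for topics in rank_topic_sets(story, candidates):
        print(topics, round(story_log_likelihood(story, topics), 2))
```

    In this toy setting a story that mixes economy and weather keywords scores highest under the two-topic set, because each keyword can be explained by its own topic state instead of being penalized by a single-topic model. A real system would estimate the emission and transition probabilities from training data.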