• DocumentCode
    2547955
  • Title

    Experiment analysis in newspaper topic detection

  • Author

    Brun, Armelle ; Smaili, Kamel ; Haton, Jean-Paul

  • Author_Institution
    LORIA INRIA-Lorraine, Vandoeuvre-les-Nancy, France
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    55
  • Lastpage
    64
  • Abstract
    We present several methods for topic detection in newspaper articles, using either a general vocabulary or topic-specific vocabularies. The specific vocabularies are determined manually or statistically; in both cases, we aim to find the words most representative of a topic. Several methods were tested. The first is based on perplexity and achieves a 100% topic identification rate on large test corpora when the first two propositions are taken into account. The other methods are based on statistical counts and achieve a 94% identification rate on smaller test corpora. The major challenge of this work is to identify topics from only a few words, so that the most adequate language model can be selected during speech recognition.
  • Keywords
    natural languages; speech recognition; vocabulary; experiment analysis; language model; large test corpora; newspaper topic detection; perplexity; representative words; statistical counts; Acoustic testing; Automatic speech recognition; Character recognition; History; Predictive models; Stochastic processes; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on
  • Conference_Location
    A Coruña
  • Print_ISBN
    0-7695-0746-8
  • Type

    conf

  • DOI
    10.1109/SPIRE.2000.878180
  • Filename
    878180