• DocumentCode
    2018075
  • Title
    Adaptive segment model for spoken document retrieval
  • Author
    Chueh, Chuang-Hua ; Chien, Jen-Tzung

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • fYear
    2010
  • fDate
    Nov. 29 2010-Dec. 3 2010
  • Firstpage
    261
  • Lastpage
    264
  • Abstract
    In a robust information retrieval system, documents should be represented in a way that accounts for variations in word distributions across paragraphs or segments. Nonstationary latent Dirichlet allocation (NLDA) was established by incorporating a Markov chain to detect the stylistic segments in a heterogeneous document. Each segment corresponds to a particular style and is generated by a different word distribution. However, NLDA is constrained to a fixed number of segments regardless of document length. This paper presents a new adaptive segment model (ASM) that builds the topic-based document model adaptively, with different segment numbers for different documents. By incorporating a multinomial hidden variable with a Dirichlet prior, the inference procedure for the ASM parameters is built through a variational Bayes EM algorithm. In the experiments, the proposed ASM is evaluated for spoken document retrieval using the TDT2 corpus. ASM achieves better performance than LDA and NLDA.
  • Keywords
    computational linguistics; hidden Markov models; information retrieval; speech recognition; Dirichlet prior; Markov chain; TDT2 corpus; adaptive segment model; information retrieval system; nonstationary latent Dirichlet allocation; paragraphs; spoken document retrieval; topic-based document model; variational Bayes EM algorithm; word distributions; Adaptation model; Biological system modeling; Hidden Markov models; Markov processes; Probability; Speech recognition; Viterbi algorithm; segment model; spoken document retrieval; topic model; variational Bayes;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
  • Conference_Location
    Tainan
  • Print_ISBN
    978-1-4244-6244-5
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2010.5684896
  • Filename
    5684896