DocumentCode :
2018075
Title :
Adaptive segment model for spoken document retrieval
Author :
Chueh, Chuang-Hua ; Chien, Jen-Tzung
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
fYear :
2010
fDate :
Nov. 29 2010-Dec. 3 2010
Firstpage :
261
Lastpage :
264
Abstract :
A robust information retrieval system should represent documents in a way that accounts for variations in word distributions across paragraphs or segments. A nonstationary latent Dirichlet allocation (NLDA) model was established by incorporating a Markov chain to detect the stylistic segments in a heterogeneous document. Each segment corresponds to a particular style and is generated by different word distributions. However, NLDA is constrained to a fixed number of segments regardless of document length. This paper presents a new adaptive segment model (ASM) that builds the topic-based document model adaptively, with a different number of segments for different documents. By incorporating a multinomial hidden variable with a Dirichlet prior, the inference procedure for the ASM parameters is built through a variational Bayes EM algorithm. In the experiments, the proposed ASM is evaluated for spoken document retrieval using the TDT2 corpus. ASM achieves better performance than LDA and NLDA.
Keywords :
computational linguistics; hidden Markov models; information retrieval; speech recognition; Dirichlet prior; Markov chain; TDT2 corpus; adaptive segment model; information retrieval system; nonstationary latent Dirichlet allocation; paragraphs; spoken document retrieval; topic-based document model; variational Bayes EM algorithm; word distributions; Adaptation model; Biological system modeling; Markov processes; Probability; Viterbi algorithm; segment model; topic model; variational Bayes
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Spoken Language Processing (ISCSLP), 2010 7th International Symposium on
Conference_Location :
Tainan
Print_ISBN :
978-1-4244-6244-5
Type :
conf
DOI :
10.1109/ISCSLP.2010.5684896
Filename :
5684896