DocumentCode :
3517627
Title :
Latent Dirichlet learning for document summarization
Author :
Chang, Ying-Lang ; Chien, Jen-Tzung
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
1689
Lastpage :
1692
Abstract :
Automatic summarization extracts representative content or sentences from a large corpus of documents. This paper presents a new hierarchical representation of words, sentences and documents in a corpus, and infers Dirichlet distributions for latent topics at the word level and latent themes at the sentence level. Sentence-based latent Dirichlet allocation (SLDA) is accordingly established for document summarization. Unlike vector-space summarization, SLDA is built to fit the fine structure of text documents and is specifically designed for sentence selection. SLDA acts as a sentence mixture model with a mixture of Dirichlet themes, which are used to generate the latent topics of the observed words. The theme model is inherently suited to distinguishing sentences in a summarization system. In the experiments, the proposed SLDA outperforms other methods for document summarization in terms of precision, recall and F-measure.
Keywords :
learning (artificial intelligence); text analysis; automatic text document summarization; hierarchical sentence representation; hierarchical word representation; latent Dirichlet learning; sentence selection; sentence-based latent Dirichlet allocation; Compaction; Computer science; Convergence; Data mining; Functional analysis; Internet; Linear discriminant analysis; Sampling methods; Speech; Web pages; document summarization; language model; latent Dirichlet allocation; sentence extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4959927
Filename :
4959927