DocumentCode :
1857584
Title :
Improved summarization of Chinese spoken documents by probabilistic latent semantic analysis (PLSA) with further analysis and integrated scoring
Author :
Sheng-yi Kong ; Lin-shan Lee
Author_Institution :
Speech Lab., Nat. Taiwan Univ., Taipei
fYear :
2006
fDate :
10-13 Dec. 2006
Firstpage :
26
Lastpage :
29
Abstract :
In a previous paper [1], two new scoring measures, topic significance (TS) and topic entropy (TE), obtained from probabilistic latent semantic analysis (PLSA), were shown to outperform the very successful baseline significance score (SS) in selecting the important sentences for summarization of spoken documents. In this paper, extensive experiments using ROUGE scores with respect to different parameters at different summarization ratios are analyzed in detail. It was also found that integrating these two scoring measures offered further improvements, and that special consideration of the structure of the Chinese language was also helpful when summarizing Chinese spoken documents.
Keywords :
document handling; natural language processing; speech processing; Chinese language; Chinese spoken documents; ROUGE scores; integrated scoring; probabilistic latent semantic analysis; significance score; summarization; topic entropy; topic significance; Differential equations; Educational institutions; Entropy; Error correction; Humans; Information analysis; Natural languages; Speech analysis; Speech recognition; Tellurium;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop, 2006. IEEE
Conference_Location :
Palm Beach
Print_ISBN :
1-4244-0872-5
Type :
conf
DOI :
10.1109/SLT.2006.326808
Filename :
4123353