Title :
Improved summarization of Chinese spoken documents by probabilistic latent semantic analysis (PLSA) with further analysis and integrated scoring
Author :
Sheng-yi Kong ; Lin-shan Lee
Author_Institution :
Speech Lab., Nat. Taiwan Univ., Taipei
Abstract :
In a previous paper [1], two new scoring measures obtained from probabilistic latent semantic analysis (PLSA), topic significance (TS) and topic entropy (TE), were shown to outperform the very successful baseline significance score (SS) in selecting the important sentences for summarization of spoken documents. In this paper, extensive experiments using ROUGE scores were carried out with respect to different parameters at different summarization ratios, and the results were analyzed in detail. It was also found that integrating these two scoring measures offered further improvements, and that special consideration of the structure of the Chinese language was also helpful when summarizing Chinese spoken documents.
Keywords :
document handling; natural language processing; speech processing; Chinese language; Chinese spoken documents; ROUGE scores; integrated scoring; probabilistic latent semantic analysis; significance score; summarization; topic entropy; topic significance; Differential equations; Educational institutions; Entropy; Error correction; Humans; Information analysis; Natural languages; Speech analysis; Speech recognition; Tellurium
Conference_Title :
Spoken Language Technology Workshop, 2006. IEEE
Conference_Location :
Palm Beach
Print_ISBN :
1-4244-0872-5
DOI :
10.1109/SLT.2006.326808