Title :
Broadcast news story segmentation using latent topics on data manifold
Author :
Xiaoming Lu ; Cheung-Chi Leung ; Lei Xie ; Bin Ma ; Haizhou Li
Author_Institution :
Shaanxi Provincial Key Lab. of Speech & Image Inf. Process., Northwestern Polytech. Univ., Xi'an, China
Abstract :
This paper proposes using Laplacian Probabilistic Latent Semantic Analysis (LapPLSA) for broadcast news story segmentation. The latent topic distributions estimated by LapPLSA replace term frequency vectors as the representation of sentences and are used to measure the cohesive strength between sentences. Subword n-grams are used as the basic term units in the computation, and dynamic programming is used for story boundary detection. LapPLSA projects the data into a low-dimensional semantic topic representation while preserving the intrinsic local geometric structure of the data. This locality-preserving property is intended to make the estimated latent topic distributions more robust to noise from automatic speech recognition (ASR) errors. Experiments are conducted on the ASR transcripts of the TDT2 Mandarin broadcast news corpus. The proposed approach is compared with other approaches that use a dimensionality reduction technique with the locality-preserving property, and with two different topic modeling techniques. Experimental results show that the proposed approach achieves the highest F1-measure, 0.8228, significantly outperforming the best previous approaches.
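The segmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each sentence is already represented by a latent topic distribution (here a hand-made array, not LapPLSA output), uses cosine similarity as a stand-in cohesion measure, and uses a generic dynamic program that minimises within-segment scatter for a number of stories that is assumed to be known. The names `cosine_cohesion`, `segment_cost`, and `dp_segment` are hypothetical.

```python
import numpy as np

def cosine_cohesion(p, q):
    """Cosine similarity between two topic vectors (assumed cohesion measure)."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def segment_cost(X, i, j):
    """Squared scatter of sentences X[i:j] around their mean (lower = more cohesive)."""
    seg = X[i:j]
    return float(((seg - seg.mean(axis=0)) ** 2).sum())

def dp_segment(X, n_segments):
    """Return sorted indices b such that a new story starts at sentence b.

    Assumption: the number of stories is given; the paper's criterion may differ.
    """
    n = len(X)
    inf = float("inf")
    # best[k][j]: minimal cost of splitting the first j sentences into k segments
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = best[k - 1][i] + segment_cost(X, i, j)
                if cost < best[k][j]:
                    best[k][j] = cost
                    back[k][j] = i
    # Trace back the start index of each segment except the first
    bounds, j = [], n
    for k in range(n_segments, 1, -1):
        j = back[k][j]
        bounds.append(j)
    return sorted(bounds)

# Toy topic distributions: three sentences on topic A, then three on topic B
X = np.array([[0.9, 0.1], [0.8, 0.2], [0.9, 0.1],
              [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]])
print(dp_segment(X, 2))
print(cosine_cohesion(X[2], X[3]))
```

On the toy data, the cohesion between sentences 2 and 3 is low compared with neighbouring pairs inside each story, which is the kind of signal a boundary detector relies on.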
Keywords :
broadcasting; dynamic programming; speech recognition; ASR transcripts; F1-measure; LapPLSA; Laplacian probabilistic latent semantic analysis; TDT2 Mandarin broadcast news corpus; automatic speech recognition errors; broadcast news story segmentation; cohesive strength; data manifold; dimensionality reduction technique; intrinsic local geometric structure; locality preserving property; low-dimensional semantic topic representation; story boundary detection; subword n-gram; topic modeling techniques; Frequency measurement; Laplace equations; Manifolds; Probabilistic logic; Semantics; Symmetric matrices; Training data; dimensionality reduction; story segmentation; topic modeling
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639317