• DocumentCode
    1260773
  • Title

    Laplacian Eigenmaps for Automatic Story Segmentation of Broadcast News

  • Author

    Xie, Lei ; Zheng, Lilei ; Liu, Zihan ; Zhang, Yanning

  • Author_Institution
    Sch. of Comput. Sci., Northwestern Polytech. Univ., Xi'an, China
  • Volume
    20
  • Issue
    1
  • fYear
    2012
  • Firstpage
    276
  • Lastpage
    289
  • Abstract
    We propose Laplacian Eigenmaps (LE)-based approaches to automatic story segmentation on speech recognition transcripts of broadcast news. We reinforce story boundaries by applying LE analysis to a sentence connective strength matrix, which reveals the intrinsic geometric structure of stories. Specifically, we construct a Euclidean space in which each sentence is mapped to a vector. As a result, the original inter-sentence connective strength is reflected by the Euclidean distances between the corresponding vectors, and cohesive relations between sentences become geometrically evident. Taking advantage of LE, we present three story segmentation approaches: LE-TextTiling, spectral clustering, and LE-DP. In LE-DP, we formalize story segmentation as a straightforward criterion minimization problem and give a fast dynamic programming solution to it. Extensive story segmentation experiments on three corpora demonstrate that the proposed LE-based approaches achieve superior performance and significantly outperform several state-of-the-art methods. For instance, LE-TextTiling obtains a relative F1-measure increase of 17.8% on the CCTV Mandarin BN corpus as compared to conventional TextTiling; LE-DP achieves a high F1-measure of 0.7460 on the TDT2 Mandarin BN corpus, significantly outperforming a recent CRF-prosody approach with an F1-measure of 0.6783.
  • Keywords
    Laplace equations; dynamic programming; eigenvalues and eigenfunctions; matrix algebra; minimisation; speech recognition; CCTV Mandarin BN corpus; Euclidean distances; Euclidean space; LE-DP; LE-texttiling; Laplacian eigenmaps; automatic story segmentation; broadcast news; criterion minimization problem; intersentence connective strength; intrinsic geometric structure; sentence connective strength matrix; spectral clustering; speech recognition transcripts; vector; Media; Multimedia communication; Robustness; Semantics; Speech recognition; Streaming media; Laplacian Eigenmaps (LE); spoken document retrieval; story segmentation; topic segmentation
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type

    jour

  • DOI
    10.1109/TASL.2011.2160853
  • Filename
    5934585
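
  • Illustration
    The abstract describes mapping each sentence to a vector so that inter-sentence connective strength is reflected by Euclidean distance. The sketch below shows the generic Laplacian Eigenmaps step under stated assumptions; it is not the authors' implementation. The toy matrix W, the function name laplacian_eigenmaps, and the naive "largest adjacent gap" boundary rule are hypothetical and stand in for the paper's connective strength matrix and its LE-TextTiling, spectral clustering, and LE-DP procedures, whose details are not given in this record.

```python
import numpy as np


def laplacian_eigenmaps(W, dim):
    """Embed n items in `dim` dimensions from a symmetric, non-negative
    affinity matrix W by solving L y = lambda D y (textbook formulation)."""
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    if np.any(degrees <= 0):
        raise ValueError("every row of W must have positive total affinity")
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(L_sym)
    # Map back to generalized eigenvectors and skip the trivial first one
    return d_inv_sqrt[:, None] * eigvecs[:, 1:dim + 1]


# Toy affinity matrix (assumption): six sentences forming two stories of
# three sentences each, strongly connected within a story, weakly across.
W = np.full((6, 6), 0.1)
W[:3, :3] = 0.9
W[3:, 3:] = 0.9
np.fill_diagonal(W, 1.0)

Y = laplacian_eigenmaps(W, dim=1)

# Naive boundary rule (assumption): the largest jump between consecutive
# sentence vectors marks the story boundary.
gaps = np.linalg.norm(Y[1:] - Y[:-1], axis=1)
boundary = int(np.argmax(gaps))  # boundary lies after sentence `boundary`
```

    On this toy input the sentences of each story collapse to nearly the same point, so the largest gap falls between the third and fourth sentences. Real transcripts would need a higher embedding dimension and a more robust boundary criterion.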