• DocumentCode
    2659530
  • Title

    Efficient sentence segmentation using syntactic features

  • Author

    Favre, Benoit ; Hakkani-Tür, Dilek ; Petrov, Slav ; Klein, Dan

  • Author_Institution
    Int. Comput. Sci. Inst., Berkeley, CA
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    77
  • Lastpage
    80
  • Abstract
    To enable downstream language processing, automatic speech recognition output must be segmented into its individual sentences. Previous sentence segmentation systems have typically been very local, using low-level prosodic and lexical features to independently decide whether or not to segment at each word boundary position. In this work, we leverage global syntactic information from a syntactic parser, which is better able to capture long-distance dependencies. While some previous work has included syntactic features, ours is the first to do so in a tractable, lattice-based way, which is crucial for scaling up to long-sentence contexts. Specifically, an initial hypothesis lattice is constructed using local features. Candidate sentences are then assigned syntactic language model scores. These global syntactic scores are combined with local low-level scores in a log-linear model. The resulting system significantly outperforms the most popular long-span model for sentence segmentation (the hidden event language model) on both reference text and automatic speech recognizer output from news broadcasts.
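
    The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names, the weights `w_local` and `w_syn`, the `max_len` cap, and the toy scorer are all assumptions made for illustration, and a real syntactic language model score would come from a parser. The sketch searches over candidate sentence spans with dynamic programming and ranks segmentations by a weighted sum of local boundary log-probabilities and a per-sentence syntactic score.

```python
import math
from typing import Callable, List, Sequence, Tuple


def segment(
    words: Sequence[str],
    boundary_logprob: Sequence[Tuple[float, float]],
    syntactic_score: Callable[[Sequence[str]], float],
    w_local: float = 1.0,   # hypothetical weight on local scores
    w_syn: float = 1.0,     # hypothetical weight on syntactic scores
    max_len: int = 50,      # hypothetical cap on candidate sentence length
) -> List[List[str]]:
    """Pick the segmentation with the highest log-linear score.

    boundary_logprob[k] holds (log P(boundary), log P(no boundary))
    for the position right after words[k].
    """
    n = len(words)
    best = [-math.inf] * (n + 1)  # best[j]: best score for words[:j]
    back = [0] * (n + 1)          # back[j]: start of the last sentence
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] == -math.inf:
                continue
            # Local score: no boundary inside the span, boundary at its end.
            local = sum(boundary_logprob[k][1] for k in range(start, end - 1))
            local += boundary_logprob[end - 1][0]
            score = (
                best[start]
                + w_local * local
                + w_syn * syntactic_score(words[start:end])
            )
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end to recover the sentences.
    sentences: List[List[str]] = []
    end = n
    while end > 0:
        start = back[end]
        sentences.append(list(words[start:end]))
        end = start
    sentences.reverse()
    return sentences


# Toy stand-in for a syntactic language model (an assumption, not a parser):
# no penalty if the span contains exactly one verb and ends with it.
VERBS = {"barked", "ran"}


def toy_syntactic_score(span: Sequence[str]) -> float:
    verb_count = sum(1 for w in span if w in VERBS)
    return 0.0 if verb_count == 1 and span[-1] in VERBS else -5.0


words = "the dog barked the cat ran".split()
# Uninformative local model: every position is a 50/50 decision.
local = [(math.log(0.5), math.log(0.5))] * len(words)
result = segment(words, local, toy_syntactic_score)
print(result)
```

    With an uninformative local model, the segmentation is decided entirely by the syntactic term, which is the situation where a global score is expected to help most. In practice the two weights would be tuned on held-out data.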
  • Keywords
    grammars; speech processing; speech recognition; automatic speech recognition; downstream language processing; hypothesis lattice; log-linear model; sentence segmentation; syntactic features; syntactic language model scores; syntactic parser; Automatic speech recognition; Broadcasting; Computer science; Context modeling; Contracts; Lattices; Natural language processing; Natural languages; Speech processing; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
  • Conference_Location
    Goa
  • Print_ISBN
    978-1-4244-3471-8
  • Electronic_ISBN
    978-1-4244-3472-5
  • Type

    conf

  • DOI
    10.1109/SLT.2008.4777844
  • Filename
    4777844