DocumentCode
2659530
Title
Efficient sentence segmentation using syntactic features
Author
Favre, Benoit ; Hakkani-Tür, Dilek ; Petrov, Slav ; Klein, Dan
Author_Institution
Int. Comput. Sci. Inst., Berkeley, CA
fYear
2008
fDate
15-19 Dec. 2008
Firstpage
77
Lastpage
80
Abstract
To enable downstream language processing, automatic speech recognition output must be segmented into its individual sentences. Previous sentence segmentation systems have typically been very local, using low-level prosodic and lexical features to independently decide whether or not to segment at each word boundary position. In this work, we leverage global syntactic information from a syntactic parser, which is better able to capture long distance dependencies. While some previous work has included syntactic features, ours is the first to do so in a tractable, lattice-based way, which is crucial for scaling up to long-sentence contexts. Specifically, an initial hypothesis lattice is constructed using local features. Candidate sentences are then assigned syntactic language model scores. These global syntactic scores are combined with local low-level scores in a log-linear model. The resulting system significantly outperforms the most popular long-span model for sentence segmentation (the hidden event language model) on both reference text and automatic speech recognizer output from news broadcasts.
Keywords
grammars; speech processing; speech recognition; automatic speech recognition; downstream language processing; hypothesis lattice; log-linear model; sentence segmentation; syntactic features; syntactic language model scores; syntactic parser; Broadcasting; Computer science; Context modeling; Contracts; Lattices; Natural language processing; Natural languages; Text recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Spoken Language Technology Workshop, 2008. SLT 2008. IEEE
Conference_Location
Goa
Print_ISBN
978-1-4244-3471-8
Electronic_ISBN
978-1-4244-3472-5
Type
conf
DOI
10.1109/SLT.2008.4777844
Filename
4777844