DocumentCode
3485487
Title
Efficient determinization of tagged word lattices using categorial and lexicographic semirings
Author
Shafran, Izhak ; Sproat, Richard ; Yarmohammadi, Mahsa ; Roark, Brian
Author_Institution
Center for Spoken Language Understanding, USA
fYear
2011
fDate
11-15 Dec. 2011
Firstpage
283
Lastpage
288
Abstract
Speech and language processing systems routinely face the need to apply finite-state operations (e.g., POS tagging) on results from intermediate stages (e.g., ASR output) that are naturally represented in a compact lattice form. Currently, such needs are met by converting the lattices into linear sequences (n-best scoring sequences) before and after applying the finite-state operations. In this paper, we eliminate the need for this conversion by addressing the problem of picking only the single-best scoring output labels for every input sequence. For this purpose, we define a categorial semiring that allows determinization over strings and incorporate it into a 〈Tropical, Categorial〉 lexicographic semiring. Through examples and empirical evaluations, we show how determinization in this lexicographic semiring produces the desired output. The proposed solution is general in nature and can be applied to multi-tape weighted transducers that arise in many applications.
Keywords
natural language processing; sequences; speech recognition; transducers; categorial semiring; language processing; lexicographic semiring; multitape weighted transducer; single best scoring output label; speech processing; tagged word lattice; Acoustics; Complexity theory; Grammar; Lattices; Speech recognition; Tagging; Transducers;
fLanguage
English
Publisher
IEEE
Conference_Titel
Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
Conference_Location
Waikoloa, HI
Print_ISBN
978-1-4673-0365-1
Electronic_ISBN
978-1-4673-0366-8
Type
conf
DOI
10.1109/ASRU.2011.6163945
Filename
6163945