• DocumentCode
    3485487
  • Title

    Efficient determinization of tagged word lattices using categorial and lexicographic semirings

  • Author

    Shafran, Izhak ; Sproat, Richard ; Yarmohammadi, Mahsa ; Roark, Brian

  • Author_Institution
    Center for Spoken Language Understanding, USA
  • fYear
    2011
  • fDate
    11-15 Dec. 2011
  • Firstpage
    283
  • Lastpage
    288
  • Abstract
    Speech and language processing systems routinely face the need to apply finite state operations (e.g., POS tagging) on results from intermediate stages (e.g., ASR output) that are naturally represented in a compact lattice form. Currently, such needs are met by converting the lattices into linear sequences (n-best scoring sequences) before and after applying the finite state operations. In this paper, we eliminate the need for this unnecessary conversion by addressing the problem of picking only the single-best scoring output labels for every input sequence. For this purpose, we define a categorial semiring that allows determinization over strings and incorporate it into a 〈Tropical, Categorial〉 lexicographic semiring. Through examples and empirical evaluations we show how determinization in this lexicographic semiring produces the desired output. The proposed solution is general in nature and can be applied to multi-tape weighted transducers that arise in many applications.
  • Keywords
    natural language processing; sequences; speech recognition; transducers; categorial semiring; language processing; lexicographic semiring; multitape weighted transducer; single best scoring output label; speech processing; tagged word lattice; Acoustics; Complexity theory; Grammar; Lattices; Speech recognition; Tagging; Transducers;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on
  • Conference_Location
    Waikoloa, HI
  • Print_ISBN
    978-1-4673-0365-1
  • Electronic_ISBN
    978-1-4673-0366-8
  • Type

    conf

  • DOI
    10.1109/ASRU.2011.6163945
  • Filename
    6163945
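
The abstract's idea of keeping only the single-best scoring tag sequence for each word sequence can be illustrated with a small lexicographic-pair weight. This is a minimal sketch under stated assumptions, not the paper's implementation: the class `LexWeight` and the function `best_tagging` are hypothetical names, and the second component is a plain tuple of tags standing in for the categorial semiring (the paper's categorial semiring additionally supports the division needed by determinization, which is not modeled here). The first component behaves like a tropical weight: "plus" keeps the smaller cost and "times" adds costs.

```python
from dataclasses import dataclass
from functools import reduce

INF = float("inf")


@dataclass(frozen=True)
class LexWeight:
    """Hypothetical <cost, tags> pair compared lexicographically."""
    cost: float   # tropical-style component: lower is better
    tags: tuple   # plain tag tuple, a stand-in for the categorial component

    def plus(self, other):
        # Keep the pair with the lower cost; break ties on the tag tuple.
        return min(self, other, key=lambda w: (w.cost, w.tags))

    def times(self, other):
        # Infinite cost is treated as the annihilating zero element.
        if self.cost == INF or other.cost == INF:
            return ZERO
        # Otherwise add the costs and concatenate the tag sequences.
        return LexWeight(self.cost + other.cost, self.tags + other.tags)


ZERO = LexWeight(INF, ())   # identity for plus
ONE = LexWeight(0.0, ())    # identity for times


def best_tagging(paths):
    """Combine the arcs of each path with times, then select with plus.

    `paths` is an iterable of arc-weight lists, all assumed to tag the
    same word sequence.
    """
    totals = (reduce(LexWeight.times, arcs, ONE) for arcs in paths)
    return reduce(LexWeight.plus, totals, ZERO)


# Two made-up competing taggings of the same two-word sequence.
path_a = [LexWeight(0.5, ("NN",)), LexWeight(1.0, ("VBZ",))]
path_b = [LexWeight(0.75, ("VB",)), LexWeight(1.25, ("NNS",))]
best = best_tagging([path_a, path_b])
print(best.tags, best.cost)
```

The costs and tags in the usage lines are invented for illustration only. In a real lattice the selection would be carried out by determinizing the weighted automaton rather than by enumerating paths as done here.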