• DocumentCode
    3246264
  • Title
    Experiments in word-reordering and morphological preprocessing for transducer-based statistical machine translation
  • Author
    De Gispert, Adrià ; Mariño, José B.
  • Author_Institution
    TALP Res. Center, Univ. Politecnica de Catalunya, Barcelona, Spain
  • fYear
    2003
  • fDate
    30 Nov.-3 Dec. 2003
  • Firstpage
    634
  • Lastpage
    639
  • Abstract
    Statistical speech translation can be achieved by an integrated search procedure that produces speech recognition and translation at the same time. Based on finite-state transducers and Viterbi search, the approach forces the target language to be rewritten as a set of modified units (each containing zero, one or more original words), which preserves the monotonicity of the search over the speech signal without changing the word order in the target language. In this paper, we analyse the effect of non-monotonic word alignments in two different Spanish-to-English translation tasks (the speech-aimed small-vocabulary Verbmobil task and the text-aimed large-vocabulary European Parliament task), revealing the most frequent cross patterns and experimenting with reordering strategies to improve the transducer probabilities. In addition, some preliminary results are presented on introducing POS-tagging and lemmatization, as well as some preprocessing such as categorization, to help improve the training of the system.
  • Keywords
    language translation; speech recognition; POS-tagging; Spanish/English translation; Viterbi search; categorization; cross patterns; finite-state transducers; integrated search procedure; integrated speech recognition/translation; lemmatization; morphological preprocessing; nonmonotonic word alignments; search monotonicity; speech-aimed small vocabulary Verbmobil task; text-aimed large-vocabulary European Parliament task; transducer-based statistical machine translation; word-reordering; Acoustic transducers; Equations; Natural languages; Pattern analysis; Speech analysis; Speech processing; Speech recognition; Text recognition; Viterbi algorithm; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
  • Print_ISBN
    0-7803-7980-2
  • Type
    conf
  • DOI
    10.1109/ASRU.2003.1318514
  • Filename
    1318514