  • DocumentCode
    1695970
  • Title
    Statistical machine translation based text normalization with crowdsourcing
  • Author
    Schlippe, Tim ; Zhu, Chenfei ; Lemcke, Daniel ; Schultz, Tanja
  • Author_Institution
    Cognitive Syst. Lab., Karlsruhe Inst. of Technol. (KIT), Karlsruhe, Germany
  • fYear
    2013
  • Firstpage
    8406
  • Lastpage
    8410
  • Abstract
    In [1], we proposed systems for text normalization based on statistical machine translation (SMT) methods, constructed with the support of Internet users, and evaluated them on French texts. Internet users normalize text displayed in a web interface in an annotation process, thereby providing a parallel corpus of normalized and non-normalized text. With this corpus, SMT models are generated to translate non-normalized text into normalized text. In this paper, we analyze the efficiency of these systems for other languages. Additionally, we embedded the English annotation process for training data in Amazon Mechanical Turk and compare the quality of texts thoroughly annotated in our lab to that of texts annotated by the Turkers. Finally, we investigate how to reduce the user effort by iteratively applying an SMT system, built from the sentences annotated so far, to the next sentences to be edited.
  • Keywords
    language translation; natural languages; statistical analysis; text analysis; Amazon Mechanical Turk; English annotation process; French texts; Internet users support; crowdsourcing; nonnormalized text; normalized text; parallel corpus; statistical machine translation; text normalization; training data; Computational modeling; Conferences; Internet; Noise measurement; Speech; Training; Training data; rapid language adaptation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1520-6149
  • Type
    conf
  • DOI
    10.1109/ICASSP.2013.6639305
  • Filename
    6639305