DocumentCode
1695970
Title
Statistical machine translation based text normalization with crowdsourcing
Author
Schlippe, Tim ; Zhu, Chenfei ; Lemcke, Daniel ; Schultz, Tanja
Author_Institution
Cognitive Systems Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
fYear
2013
Firstpage
8406
Lastpage
8410
Abstract
In [1], we proposed text normalization systems based on statistical machine translation (SMT) that are built with the support of Internet users, and we evaluated them on French texts. In an annotation process, Internet users normalize text displayed in a web interface, thereby providing a parallel corpus of non-normalized and normalized text. From this corpus, SMT models are trained to translate non-normalized text into normalized text. In this paper, we analyze the efficiency of these systems for other languages. Additionally, we embed the English annotation process for training data in Amazon Mechanical Turk and compare the quality of texts thoroughly annotated in our lab with that of texts annotated by the Turkers. Finally, we investigate how to reduce user effort by iteratively applying an SMT system, built from the sentences annotated so far, to the next sentences to be edited.
Keywords
language translation; natural languages; statistical analysis; text analysis; Amazon Mechanical Turk; English annotation process; French texts; Internet users support; crowdsourcing; nonnormalized text; normalized text; parallel corpus; statistical machine translation; text normalization; training data; Computational modeling; Conferences; Internet; Noise measurement; Speech; Training; Training data; rapid language adaptation
fLanguage
English
Publisher
IEEE
Conference_Titel
2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location
Vancouver, BC
ISSN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2013.6639305
Filename
6639305