Title :
Machine Translation Approach for Vietnamese Diacritic Restoration
Author :
Thi Ngoc Diep Do ; Duy Binh Nguyen ; Dang Khoa Mae ; Do Dat Tran
Author_Institution :
MICA Inst., Hanoi Univ. of Sci. & Technol., Hanoi, Vietnam
Abstract :
Diacritic marks exist in many languages, such as French, German, Slovak, and Vietnamese. For various reasons, however, they are sometimes omitted in writing, which can make a non-diacritic text ambiguous for the reader. Automatic diacritic restoration has been addressed for several languages using character-based, word-based, and point-wise approaches, among others. However, these approaches lean heavily on linguistic information and on the size of the training corpus, and they are sometimes language dependent. In this paper, we present a simple and effective restoration method that uses a machine translation approach as a new solution to this problem. The method has been applied to Vietnamese and integrated into an Android application named VIVA (Vietnamese Voice Assistant), which reads out the content of incoming text messages on a mobile phone. Our experiments show that the proposed restoration method recovers diacritic marks with 99.0% accuracy.
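The machine translation framing in the abstract can be illustrated with a minimal Python sketch. It assumes, since the abstract does not say so, that training data is built by stripping diacritics from ordinary Vietnamese text, giving (non-diacritic source, diacritic target) sentence pairs for a translation system. The function names `strip_diacritics`, `make_parallel_pairs`, and `word_accuracy` are hypothetical, and the token-level accuracy shown is only one plausible reading of the reported accuracy rate, not the paper's stated metric.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritic marks, e.g. 'Tiếng Việt' -> 'Tieng Viet'."""
    # 'đ' and 'Đ' have no canonical decomposition, so map them explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (tone marks, circumflex, breve, horn).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def make_parallel_pairs(sentences):
    """Assumed data preparation: (source without diacritics, target with diacritics)."""
    return [(strip_diacritics(s), s) for s in sentences]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: fraction of whitespace tokens restored exactly."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if not ref_tokens or len(ref_tokens) != len(hyp_tokens):
        raise ValueError("reference and hypothesis must have the same non-zero token count")
    matches = sum(r == h for r, h in zip(ref_tokens, hyp_tokens))
    return matches / len(ref_tokens)
```

Pairs produced this way could be fed to any translation toolkit as source and target sides. Restoration does not change the number of tokens, which is why the sketch compares tokens position by position.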
Keywords :
language translation; natural language processing; Android application; VIVA; Vietnamese Voice Assistant; Vietnamese diacritic restoration; Vietnamese language; automatic diacritic restoration problem; diacritic mark; linguistics information; machine translation; Accuracy; Smart phones; Speech; Training; Training data; Writing; diacritics restoration; statistical machine translation; text message; Vietnamese;
Conference_Titel :
Asian Language Processing (IALP), 2013 International Conference on
Conference_Location :
Urumqi
DOI :
10.1109/IALP.2013.30