• DocumentCode
    2398066
  • Title

    ASR Normalization for Machine Translation

  • Author

    Huang, Heyan ; Feng, Chong ; Wang, Jiande ; Zhang, Xiaofei

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol., Beijing, China
  • Volume
    2
  • fYear
    2010
  • fDate
    26-28 Aug. 2010
  • Firstpage
    91
  • Lastpage
    94
  • Abstract
    Natural spoken language contains many meaningless modal particles and dittographs; furthermore, ASR (automatic speech recognition) output often contains recognition errors and has no punctuation. Therefore, translation quality is rather poor if the ASR results are passed directly to MT (machine translation). In this paper, an ASR normalization approach for machine translation is introduced, based on a maximum entropy sequential labeling model. Before translation, the meaningless modal particles and dittographs are deleted, the recognition errors are corrected, and the ASR results are punctuated. Experiments show that an MT BLEU score of 0.2465 is obtained, an improvement of 17.3% over the MT baseline without normalization. The positive experimental results confirm that ASR normalization is effective in improving translation quality for spoken language machine translation.
  • Keywords
    language translation; maximum entropy methods; speech recognition; ASR normalization approach; automatic speech recognition; maximum entropy sequential labeling model; spoken language machine translation; translation quality; Acoustics; Computational modeling; Decoding; Entropy; Labeling; Natural language processing; Speech recognition; Spoken language; automatic speech recognition; machine translation; maximum entropy model; normalization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2010 2nd International Conference on
  • Conference_Location
    Nanjing, Jiangsu
  • Print_ISBN
    978-1-4244-7869-9
  • Type

    conf

  • DOI
    10.1109/IHMSC.2010.122
  • Filename
    5590723