• DocumentCode
    2350137
  • Title
    Curate a transliteration corpus from transliteration/translation pairs
  • Author
    Wu, Shih-Hung; Li, Yu-Te
  • Author_Institution
    Department of Computer Science & Information Engineering, Chaoyang University of Technology, Taiwan
  • fYear
    2008
  • fDate
    13-15 July 2008
  • Firstpage
    208
  • Lastpage
    213
  • Abstract
    Transliteration of new named entities is important for information retrieval across two or more languages. Rule-based machine transliteration is unsatisfactory because different information sources follow different transliteration standards. To build a statistical machine transliteration module, researchers must curate a transliteration corpus for each language pair of interest. Because a large number of transliteration/translation pairs can be collected from the Web, a large transliteration training corpus can be curated from these pairs. In this paper, we propose a bi-directional approach to classifying transliteration/translation pairs. Our approach combines forward transliteration and backward transliteration to distinguish transliterations from translations. We conduct an experiment on English-Chinese transliteration.
  • Keywords
    Bidirectional control; Chaos; Computer science; Dictionaries; Information retrieval; Natural languages; Probability; Statistics; USA Councils; Unsupervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration, 2008. IRI 2008. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV, USA
  • Print_ISBN
    978-1-4244-2659-1
  • Electronic_ISBN
    978-1-4244-2660-7
  • Type
    conf
  • DOI
    10.1109/IRI.2008.4583031
  • Filename
    4583031