• DocumentCode
    2859653
  • Title
    Mining the Web to Create a Language Model for Mapping between English Names and Phrases and Japanese
  • Author
    Grefenstette, Gregory ; Qu, Yan ; Evans, David A.
  • Author_Institution
    LIC2M/LIST/CEA, France
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    110
  • Lastpage
    116
  • Abstract
    The Web provides the largest exploitable collection of language use. If we can mine the Web to build abstract models of language use, these models may have many applications. Here we present one example of using the implicit intelligence of language use to solve an important problem for machine translation programs and cross-lingual applications: the translation of words written in katakana characters in Japanese. In this paper, we describe techniques for discovering katakana transliterations of English names and for finding English translations of multiword katakana sequences, using implicit language models of English and Japanese found on the Web. These techniques were evaluated against human-constructed English-katakana glosses.
  • Keywords
    Costs; Data mining; Dictionaries; Gold; Information retrieval; Large-scale systems; Machine intelligence; Natural language processing; Natural languages; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type
    conf
  • DOI
    10.1109/WI.2004.10042
  • Filename
    1410791