• DocumentCode
    2348230
  • Title

    Designing effective web mining-based techniques for OOV translation

  • Author

    Yu, Haitao ; Ren, Fuji ; Huang, Degen ; Li, Lishuang

  • Author_Institution
    Fac. of Eng., Univ. of Tokushima, Tokushima, Japan
  • fYear
    2010
  • fDate
    21-23 Aug. 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Because existing bilingual dictionaries have limited coverage, out-of-vocabulary (OOV) terms are often difficult to translate in many natural language processing tasks. In this paper, we propose a general three-step cascade mining technique that leverages the OOV category to optimize the effectiveness of each step. An OOV category-based expansion policy is suggested to retrieve more relevant mixed-language documents. An OOV category-based hybrid extraction approach is suggested to perform robust extraction. A more flexible model combination based on the OOV category is also suggested. Moreover, we conducted experiments to evaluate the effectiveness of each step and the overall performance of the mining technique. The experimental results show a significant performance improvement over existing methods.
  • Keywords
    Internet; data mining; natural language processing; text analysis; OOV category based expansion policy; OOV translation; Web mining-based techniques; bilingual dictionary; out-of-vocabulary terms; Computational modeling; TV; CLIR; OOV category; Out-of-Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6896-6
  • Type

    conf

  • DOI
    10.1109/NLPKE.2010.5587807
  • Filename
    5587807