• DocumentCode
    2632850
  • Title

    A Fuzzy-Rough Set Based Semantic Similarity Measure Between Cross-Lingual Documents

  • Author

    Huang, Hsun-Hui ; Yang, Horng-Chang ; Kuo, Yau-Hwang

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ. Tainan, Tainan
  • fYear
    2008
  • fDate
    18-20 June 2008
  • Firstpage
    82
  • Lastpage
    82
  • Abstract
    As cross-lingual information retrieval attracts increasing attention, tools that measure cross-lingual document similarity become desirable. Because the abstract concepts people convey differ little, if at all, across the languages they use, it is possible to measure semantic similarity between documents in different languages based on the concepts those documents convey. In this paper, a novel fuzzy-rough-set-based method for measuring semantic similarity between cross-lingual (Chinese and English) documents is proposed. Aided by a bilingual dictionary and WordNet, translation is treated as a word sense disambiguation problem, and all the distilled senses are used to construct a fuzzy approximation space using a fuzzy partition algorithm. In the fuzzy approximation space, documents are approximated by their fuzzy upper and lower approximations, and the similarity measure is defined accordingly. The upper and lower approximations correspond to the loose and tight extents of the concepts in their associated document. This method makes it possible to identify documents whose original texts appear dissimilar but whose conveyed concepts are similar.
  • Keywords
    fuzzy set theory; information retrieval; natural language processing; rough set theory; Wordnet; bilingual dictionary; cross-lingual document similarity; cross-lingual information retrieval; fuzzy approximation space documents; fuzzy partition algorithm; fuzzy-rough set; semantic similarity; Approximation algorithms; Computer science; Dictionaries; Extraterrestrial measurements; Fuzzy sets; Information retrieval; Keyword search; Natural languages; Partitioning algorithms; Solid modeling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
  • Conference_Location
    Dalian, Liaoning
  • Print_ISBN
    978-0-7695-3161-8
  • Electronic_ISBN
    978-0-7695-3161-8
  • Type

    conf

  • DOI
    10.1109/ICICIC.2008.33
  • Filename
    4603271
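
The abstract describes approximating each document by fuzzy lower and upper approximations over a fuzzy partition of word senses. A minimal sketch of that idea follows, assuming the standard Dubois-Prade definitions of the fuzzy-rough approximations. The sense labels, membership values, and the averaged fuzzy-Jaccard similarity are illustrative assumptions, not the measure defined in the paper, which the abstract does not spell out.

```python
# Sketch only: Dubois-Prade fuzzy-rough approximations of a document over
# a fuzzy partition of senses. All names and numbers are hypothetical.

def fuzzy_rough_approximations(doc, clusters, universe):
    """Return (lower, upper): one membership value per fuzzy cluster.

    doc      -- dict mapping sense -> membership of the sense in the document
    clusters -- list of dicts mapping sense -> membership in that cluster
    universe -- iterable of all senses
    """
    universe = list(universe)
    # Lower: inf over senses of max(1 - mu_cluster(x), mu_doc(x)).
    lower = [
        min(max(1.0 - c.get(x, 0.0), doc.get(x, 0.0)) for x in universe)
        for c in clusters
    ]
    # Upper: sup over senses of min(mu_cluster(x), mu_doc(x)).
    upper = [
        max(min(c.get(x, 0.0), doc.get(x, 0.0)) for x in universe)
        for c in clusters
    ]
    return lower, upper


def fuzzy_jaccard(a, b):
    """Fuzzy Jaccard index of two membership vectors of equal length."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0


def similarity(doc_a, doc_b, clusters, universe):
    """Assumed measure: mean of the Jaccard scores of lower and upper vectors."""
    la, ua = fuzzy_rough_approximations(doc_a, clusters, universe)
    lb, ub = fuzzy_rough_approximations(doc_b, clusters, universe)
    return 0.5 * (fuzzy_jaccard(la, lb) + fuzzy_jaccard(ua, ub))


# Hypothetical senses and a two-cluster fuzzy partition (finance, nature).
UNIVERSE = ["s_money", "s_loan", "s_river", "s_water"]
CLUSTERS = [
    {"s_money": 1.0, "s_loan": 0.9, "s_water": 0.1},
    {"s_river": 1.0, "s_water": 0.9, "s_loan": 0.1},
]

# Sense memberships, as if obtained after dictionary lookup and disambiguation.
doc_en = {"s_money": 0.9, "s_loan": 0.7}
doc_zh = {"s_money": 0.8, "s_loan": 0.8}
doc_other = {"s_river": 0.9, "s_water": 0.8}

if __name__ == "__main__":
    print(similarity(doc_en, doc_zh, CLUSTERS, UNIVERSE))
    print(similarity(doc_en, doc_other, CLUSTERS, UNIVERSE))
```

In this toy setup the two finance-related documents score much closer to each other than either does to the river-related one, even though they would share no surface words across languages. The lower vector reflects the tight extent of each concept in a document and the upper vector the loose extent.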