• DocumentCode
    1629869
  • Title
    A Sense Based Similarity Measure for Cross-Lingual Documents
  • Author
    Huang, Hsun-Hui; Yang, Horng-Chang; Kuo, Yau-Hwang
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan
  • Volume
    1
  • fYear
    2008
  • Firstpage
    9
  • Lastpage
    13
  • Abstract
    As cross-lingual information retrieval attracts increasing attention, tools that measure cross-lingual document similarity become desirable. Because the way people convey thoughts at the abstract concept level differs little, if at all, across the languages they use, it is possible to measure the semantic similarity of documents in different languages based on the concepts those documents convey. In this paper, we represent documents by their senses to lower the language barrier, adopt fuzzy set functions to cope with the inherent fuzziness among senses, and propose two document similarity measures: one based on Tversky's notion of similarity and the other on a widely used information retrieval criterion. Their performance is compared experimentally. We focus only on documents in English and Chinese, but the proposed approach can easily be extended to documents in other languages.
  • Keywords
    fuzzy set theory; information retrieval; natural languages; text analysis; abstract concept level; cross-lingual document similarity; cross-lingual information retrieval; document representation; fuzzy set function; semantic similarity; sense based similarity measure; Application software; Computer science; Design engineering; Fuzzy sets; Information retrieval; Intelligent systems; Internet; Natural language processing; Natural languages; Web pages; cross-lingual; semantic similarity; sense
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
  • Conference_Location
    Kaohsiung
  • Print_ISBN
    978-0-7695-3382-7
  • Type
    conf
  • DOI
    10.1109/ISDA.2008.284
  • Filename
    4696168