• DocumentCode
    389573
  • Title

    Text mining of multilingual corpora via computing semantic relatedness

  • Author

    Lee, Chung-Hong ; Yang, Hsin-Chang

  • Author_Institution
    Dept. of Inf. Manage., Chang Jung Univ., Tainan, Taiwan
  • Volume
    5
  • fYear
    2002
  • fDate
    6-9 Oct. 2002
  • Abstract
    This paper describes a new application of a text-mining algorithm to bilingual corpora. In the past, most approaches to measuring semantic relatedness were based on edge-counting methods over a semantic network such as WordNet. Such methods are not well suited to specific domains in which standard lexical knowledge bases are not available. In this work, we propose an alternative way to acquire semantic relatedness from text corpora by means of a machine learning technique, namely self-organizing maps (SOM). This paper presents a hybrid approach to discovering a concept-based feature map containing word clusters and document clusters from multilingual text collections. Using SOM-based automatic clustering techniques, we have conducted several experiments to uncover associated documents in Chinese-English bilingual parallel corpora and in a hybrid Chinese-English corpus. In essence, this work provides a method for automatic text clustering that resolves some of the language difficulties in concept discovery and categorization from multilingual text corpora.
  • Keywords
    classification; data mining; learning (artificial intelligence); self-organising feature maps; text analysis; Chinese-English bilingual parallel corpora; automatic clustering techniques; bilingual corpora; concept discovery; document clusters; edge counting methods; experiments; lexical knowledge bases; machine learning technique; multilingual corpora; self-organizing maps; semantic network; semantic relatedness; text mining; word clusters; Data mining; History; Information management; Information systems; Machine learning; Measurement standards; Natural languages; Text categorization; Text mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2002 IEEE International Conference on
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-7437-1
  • Type

    conf

  • DOI
    10.1109/ICSMC.2002.1176326
  • Filename
    1176326
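
The abstract describes SOM-based clustering of documents from a hybrid Chinese-English corpus. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: it trains a small self-organizing map on toy term-count vectors and reports which map unit each document lands on. The vocabulary, the document vectors, the grid size, the decay schedules, and the function names `train_som` and `best_matching_unit` are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM with a Gaussian neighborhood (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = max(rows, cols) / 2.0
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        order = list(vectors)
        rng.shuffle(order)
        for v in order:
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


def normalize(counts):
    """Scale a count vector to unit length so document size does not dominate."""
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


# Hypothetical mixed-language vocabulary: ["market", "市场", "gene", "基因"]
docs = {
    "finance_a": [2, 1, 0, 0],
    "finance_b": [1, 2, 0, 0],
    "biology_a": [0, 0, 2, 1],
    "biology_b": [0, 0, 1, 2],
}
vectors = {name: normalize(counts) for name, counts in docs.items()}

weights = train_som(list(vectors.values()), rows=2, cols=2)
placement = {name: best_matching_unit(weights, v) for name, v in vectors.items()}

for name, unit in placement.items():
    print(name, "-> map unit", unit)
```

Documents that share vocabulary, regardless of the language of the individual terms, tend to land on the same or neighboring map units. A real corpus would need word segmentation for the Chinese text and a far larger vocabulary and map.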