• DocumentCode
    243537
  • Title

    Automatic Learning Common Definitional Patterns from Multi-domain Wikipedia Pages

  • Author

    Jingsong Zhang ; Yinglin Wang ; Dingyu Yang

  • Author_Institution
    Dept. of CSE, Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2014
  • fDate
    14-14 Dec. 2014
  • Firstpage
    251
  • Lastpage
    258
  • Abstract
    Automatic definition extraction has attracted wide interest in the NLP domain and in knowledge-based applications. One primary task of definition extraction is mining patterns from definitional sentences. Existing methods for extracting definitional patterns either rely on manual extraction guided by intuition or observation, or aim to mine intricate definitional patterns automatically. The manual method requires substantial human effort to identify definitional patterns because lexico-syntactic structures are so diverse, and it inevitably performs poorly, especially when extracting from cross-domain corpora. The automatic method mainly targets precision in definition extraction, at the cost of lower recall of definitions. Both are unsuitable for cross-domain definition extraction. To address these issues, this paper proposes a solution for automatically extracting definitional patterns from multi-domain definitional sentences in Wikipedia. Our method, FIND-SS, is a modification of the FIND-S algorithm and addresses the definition extraction problems of cross-domain corpora. FIND-SS adopts a "the more similar, the higher the priority" scheme to improve learning performance. It can accommodate some noisy information and does not require any pattern seeds for pattern learning. The experimental results indicate that our approach is significantly superior to previous methods.
  • Keywords
    Web sites; data mining; feature extraction; knowledge based systems; learning (artificial intelligence); natural language processing; NLP; Wikipedia page; automatic definition extraction; cross-domain corpora; knowledge-based application; lexico-syntactic structure; natural language processing; pattern learning; pattern mining; Electronic publishing; Encyclopedias; Internet; Training; Upper bound; Vectors; FIND-S algorithm; definition extraction; definitional pattern; frequent pattern; similarity priority;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-4799-4275-6
  • Type

    conf

  • DOI
    10.1109/ICDMW.2014.107
  • Filename
    7022605
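
The abstract describes FIND-SS as a modification of the FIND-S algorithm. For orientation, the sketch below shows the classic textbook FIND-S procedure only, applied to token sequences; it is not the paper's FIND-SS and does not implement its similarity-priority scheme or noise handling. The function name `find_s`, the `WILDCARD` symbol, the equal-length token representation, and the sample sentences are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence

WILDCARD = "?"  # hypothetical symbol meaning "any token is accepted here"


def find_s(positives: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """Classic FIND-S: return the most specific pattern covering all positives.

    Assumes every example is a token sequence of the same length.
    Returns None when no positive examples are given.
    """
    hypothesis: Optional[List[str]] = None
    for example in positives:
        if hypothesis is None:
            # Start from the most specific hypothesis: the first example itself.
            hypothesis = list(example)
            continue
        if len(example) != len(hypothesis):
            raise ValueError("this sketch assumes equal-length examples")
        # Minimally generalize: keep matching tokens, relax mismatches.
        hypothesis = [
            h if h == e else WILDCARD for h, e in zip(hypothesis, example)
        ]
    return hypothesis


if __name__ == "__main__":
    # Hypothetical, pre-tokenized definitional sentences.
    sentences = [
        ["TERM", "is", "a", "NOUN"],
        ["TERM", "is", "an", "NOUN"],
    ]
    print(find_s(sentences))
```

Classic FIND-S only ever generalizes and ignores negative examples, so a single noisy positive can over-generalize the pattern; the abstract states that FIND-SS is designed to tolerate some noise, which this sketch does not attempt.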