DocumentCode
243537
Title
Automatic Learning Common Definitional Patterns from Multi-domain Wikipedia Pages
Author
Jingsong Zhang ; Yinglin Wang ; Dingyu Yang
Author_Institution
Dept. of CSE, Shanghai Jiao Tong Univ., Shanghai, China
fYear
2014
fDate
14-14 Dec. 2014
Firstpage
251
Lastpage
258
Abstract
Automatic definition extraction has attracted wide interest in NLP and knowledge-based applications. One primary task of definition extraction is mining patterns from definitional sentences. Existing methods for extracting definitional patterns either rely on manual extraction guided by intuition or observation, or aim to mine intricate definitional patterns automatically. The manual approach requires substantial human effort to identify definitional patterns because lexico-syntactic structures are diverse, and it inevitably performs poorly, especially when extracting from cross-domain corpora. The automatic approach mainly targets precision in definition extraction, at the cost of lower recall. Both are therefore unsuitable for cross-domain definition extraction. To address these issues, this paper proposes a method for automatically extracting definitional patterns from multi-domain definitional sentences in Wikipedia. Our method, FIND-SS, is a modification of the FIND-S algorithm and addresses the problems of definition extraction from cross-domain corpora. FIND-SS adopts a "the more similar, the higher the priority" scheme to improve learning performance. It can tolerate some noisy information and does not require any pattern seeds for pattern learning. The experimental results indicate that our approach is significantly superior to the previous method.
Keywords
Web sites; data mining; feature extraction; knowledge based systems; learning (artificial intelligence); natural language processing; NLP; Wikipedia page; automatic definition extraction; cross-domain corpora; knowledge-based application; lexico-syntactic structure; natural language processing; pattern learning; pattern mining; Electronic publishing; Encyclopedias; Internet; Training; Upper bound; Vectors; FIND-S algorithm; definition extraction; definitional pattern; frequent pattern; similarity priority;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshop (ICDMW), 2014 IEEE International Conference on
Conference_Location
Shenzhen
Print_ISBN
978-1-4799-4275-6
Type
conf
DOI
10.1109/ICDMW.2014.107
Filename
7022605