DocumentCode :
1787405
Title :
Building Distant Supervised Relation Extractors
Author :
Nunes, Thiago ; Schwabe, Daniel
Author_Institution :
Dept. of Inf., Pontifical Catholic Univ. of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil
fYear :
2014
fDate :
16-18 June 2014
Firstpage :
44
Lastpage :
51
Abstract :
A well-known obstacle to building machine-learning semantic relation detectors for natural language is the shortage of high-quality training instances for the target relations across multiple languages. Even when good results are achieved, the datasets used by state-of-the-art approaches are rarely published. To address these problems, this work presents an automatic approach for building multilingual semantic relation detectors through distant supervision, combining two of the largest resources of structured and unstructured content available on the Web: DBpedia and Wikipedia. We map the DBpedia ontology back to the Wikipedia text to extract more than 100,000 training instances for more than 90 DBpedia relations in English and Portuguese, without human intervention. First, we mine the Wikipedia articles to find candidate instances for relations described in the DBpedia ontology. Second, we preprocess and normalize the data, filtering out irrelevant instances. Finally, we use the normalized data to construct regularized logistic regression detectors that achieve an F-measure above 80% for both English and Portuguese. We also compare the impact of different types of features on the accuracy of the trained detectors, demonstrating significant performance improvements when lexical, syntactic, and semantic features are combined. Both the datasets and the code used in this research are available online.
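The first step described in the abstract, finding candidate instances by matching known facts against article text, can be illustrated with a minimal sketch. It is not the authors' implementation: the triples, the sample text, and the helper names `split_sentences` and `label_sentences` are all hypothetical, and the plain substring matching stands in for whatever entity matching the paper actually uses.

```python
import re

# Hypothetical (subject, relation, object) triples standing in for DBpedia facts.
TRIPLES = [
    ("Rio de Janeiro", "country", "Brazil"),
    ("Lisbon", "country", "Portugal"),
]


def split_sentences(text):
    """Naive sentence splitter; a real pipeline would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label_sentences(text, triples):
    """Distant supervision heuristic: a sentence that mentions both entities
    of a known triple becomes a candidate training instance for its relation."""
    instances = []
    for sentence in split_sentences(text):
        for subj, rel, obj in triples:
            if subj in sentence and obj in sentence:
                instances.append((sentence, subj, rel, obj))
    return instances


# Hypothetical article text standing in for Wikipedia content.
ARTICLE = (
    "Rio de Janeiro is a city in Brazil. "
    "It hosted the 2016 Summer Olympics. "
    "Lisbon is the capital of Portugal."
)

instances = label_sentences(ARTICLE, TRIPLES)
for instance in instances:
    print(instance)
```

Sentences that mention only one entity of a triple produce no instance, which is also why this heuristic yields noisy labels and why the filtering step mentioned in the abstract matters.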
Keywords :
Web sites; data mining; natural language processing; ontologies (artificial intelligence); text analysis; DBpedia ontology; DBpedia relations; English languages; F-measure; Portuguese languages; Web; Wikipedia articles; Wikipedia text; data filtering; distant supervised relation extractors; lexical feature; machine learning semantic relation detectors; multilingual semantic relation detectors; natural language; semantic feature; syntactic feature; Electronic publishing; Encyclopedias; Feature extraction; Internet; Ontologies; Semantics; DBpedia; Distant Supervision; Information Extraction; Relation Extraction; Wikipedia;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Semantic Computing (ICSC), 2014 IEEE International Conference on
Conference_Location :
Newport Beach, CA
Print_ISBN :
978-1-4799-4002-8
Type :
conf
DOI :
10.1109/ICSC.2014.15
Filename :
6882000