WikiOnto: A System for Semi-automatic Extraction and Modeling of Ontologies Using Wikipedia XML Corpus

Author

De Silva, L.N. ; Jayaratne, Lakshman

Author_Institution

Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka

fYear

2009

fDate

14-16 Sept. 2009

Firstpage

571

Lastpage

576

Abstract

This paper introduces WikiOnto, a system that assists in the semi-automatic extraction and modeling of topic ontologies using a preprocessed document corpus of one of the largest knowledge bases in the world, Wikipedia. Based on the Wikipedia XML Corpus, we present a three-tiered framework for quickly extracting topic ontologies and a modeling environment for refining them. Using Natural Language Processing (NLP) and other Machine Learning (ML) techniques along with a very rich document corpus, the system proposes a solution to a task that is generally considered extremely cumbersome. Initial results from the prototype suggest that the system has strong potential for ontology extraction and modeling, and they motivate further research on extracting ontologies from other semi-structured document corpora.

Keywords

XML; learning (artificial intelligence); natural language processing; ontologies (artificial intelligence); text analysis; WikiOnto; Wikipedia XML corpus; knowledge bases; machine learning; semi-automatic ontology extraction; semi-automatic ontology modeling; semi-structured document corpora; Buildings; Data mining; Ontologies; Prototypes; Relational databases; Semantic Web; Wikipedia; Ontology; Ontology Extraction; Ontology Modeling

fLanguage

English

Publisher

IEEE

Conference_Titel

Semantic Computing, 2009. ICSC '09. IEEE International Conference on

Conference_Location

Berkeley, CA

Print_ISBN

978-1-4244-4962-0

Electronic_ISBN

978-0-7695-3800-6

Type

conf

DOI

10.1109/ICSC.2009.93

Filename

5298539