Extracting Domain-Relevant Term Using Wikipedia Based on Random Walk Model

Author

Wu, Wenjuan ; Liu, Tao ; Hu, He ; Du, Xiaoyong

Author_Institution

Key Labs. of Data Eng. & Knowledge Eng., China

fYear

2012

fDate

20-23 Sept. 2012

Firstpage

68

Lastpage

75

Abstract

In this paper we present a new approach for automatically identifying the concepts and entities relevant to a given domain, using the category and page structures of Wikipedia in a language-independent way. By applying a Markov random walk algorithm to the weighted Wikipedia link graph, our approach can identify large quantities of domain-relevant concepts and entities with very little human effort. Experimental results show that our method achieves high accuracy and acceptable efficiency in domain-relevant term extraction.
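
The abstract does not give the details of the walk, so the following is only a minimal sketch of one common variant, a random walk with restart from a set of seed pages over a weighted link graph. The function name `random_walk_scores`, the `restart` probability, the iteration count, the toy graph, and its link weights are all assumptions made for illustration and are not taken from the paper.

```python
def random_walk_scores(graph, seeds, restart=0.15, iterations=50):
    """Score nodes by a random walk with restart from the seed nodes.

    graph: dict mapping node -> {neighbor: link weight} (hypothetical format).
    seeds: nodes assumed to belong to the target domain.
    """
    seeds = set(seeds)
    nodes = set(graph)
    for neighbors in graph.values():
        nodes.update(neighbors)

    # The walker starts, and restarts, uniformly over the seed nodes.
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(seed_mass)

    for _ in range(iterations):
        nxt = {n: restart * seed_mass[n] for n in nodes}
        for node in nodes:
            out = graph.get(node, {})
            total = sum(out.values())
            if total == 0:
                # Node without outgoing links: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * scores[node] / len(seeds)
                continue
            # Follow each outgoing link in proportion to its weight.
            for neighbor, weight in out.items():
                nxt[neighbor] += (1 - restart) * scores[node] * weight / total
        scores = nxt
    return scores


# Toy link graph with made-up weights (assumption, not data from the paper).
graph = {
    "Database": {"SQL": 3.0, "Relational model": 2.0},
    "SQL": {"Database": 3.0, "Relational model": 1.0},
    "Relational model": {"Database": 2.0},
    "Football": {"Stadium": 1.0},
    "Stadium": {"Football": 1.0},
}
scores = random_walk_scores(graph, seeds={"Database"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, pages that the walker can reach from the seed through heavily weighted links accumulate score, while pages in a disconnected part of the graph receive none, so ranking by score gives a rough ordering of candidate domain-relevant terms.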

Keywords

Markov processes; Web sites; graph theory; information retrieval; Markov random walk algorithm; domain-relevant concept automatic identification; domain-relevant entity automatic identification; domain-relevant term extraction; weighted Wikipedia link graph; Biological system modeling; Electronic publishing; Encyclopedias; Internet; Ontologies; Semantics; Domain-relevant Concepts; Link Graph; Markov Chain; Random Walk; Wikipedia

fLanguage

English

Publisher

IEEE

Conference_Titel

ChinaGrid Annual Conference (ChinaGrid), 2012 Seventh

Conference_Location

Beijing

Print_ISBN

978-1-4673-2623-0

Electronic_ISBN

978-0-7695-4816-6

Type

conf

DOI

10.1109/ChinaGrid.2012.20

Filename

6337278