Regional Information Center for Science and Technology - Using Wikipedia as a Reference for Extracting Semantic Information from a Text

DocumentCode :

1962908

Title :

Using Wikipedia as a Reference for Extracting Semantic Information from a Text

Author :

Prato, Andrea ; Ronchetti, Marco

Author_Institution :

Dipt. di Ing. e Scienza dell'Inf., Univ. di Trento, Povo di Trento, Italy

fYear :

2009

fDate :

11-16 Oct. 2009

Firstpage :

Lastpage :

Abstract :

In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines the full text of Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks and also introduces an analysis based on multi-words. Our algorithm outperforms current methods in that its output contains far fewer false positives. We were also able to identify which (structural) part of the texts provides most of the semantic information extracted by the algorithm.

Keywords :

Web sites; data mining; information retrieval; text analysis; Wikipedia; clustering; multiword analysis; semantic information extraction; text mining; Clustering algorithms; Encyclopedias; History; Humans; Information analysis; Logic; Ontologies; Statistics; Semantic analysis; multi-words

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advances in Semantic Processing, 2009. SEMAPRO '09. Third International Conference on

Conference_Location :

Sliema

Print_ISBN :

978-1-4244-5044-2

Electronic_ISBN :

978-0-7695-3833-4

Type :

conf

DOI :

10.1109/SEMAPRO.2009.24

Filename :

5291534

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1962908