Title :
Using Wikipedia as a Reference for Extracting Semantic Information from a Text
Author :
Prato, Andrea ; Ronchetti, Marco
Author_Institution :
Dipt. di Ing. e Scienza dell'Inf., Univ. di Trento, Povo di Trento, Italy
Abstract :
In this paper we present an algorithm that uses Wikipedia as a reference to extract semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines the full text of Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an analysis based on multi-words. Our algorithm outperforms current methods in that its output contains far fewer false positives. We were also able to determine which (structural) part of the texts provides most of the semantic information extracted by the algorithm.
Keywords :
Web sites; data mining; information retrieval; text analysis; Wikipedia; clustering; multiword analysis; semantic information extraction; text mining; Clustering algorithms; Encyclopedias; History; Humans; Information analysis; Logic; Ontologies; Statistics; Semantic analysis; multi-words;
Conference_Title :
Advances in Semantic Processing, 2009. SEMAPRO '09. Third International Conference on
Conference_Location :
Sliema
Print_ISBN :
978-1-4244-5044-2
Electronic_ISBN :
978-0-7695-3833-4
DOI :
10.1109/SEMAPRO.2009.24