DocumentCode :
1962908
Title :
Using Wikipedia as a Reference for Extracting Semantic Information from a Text
Author :
Prato, Andrea ; Ronchetti, Marco
Author_Institution :
Dipt. di Ing. e Scienza dell'Inf., Univ. di Trento, Povo di Trento, Italy
fYear :
2009
fDate :
11-16 Oct. 2009
Firstpage :
56
Lastpage :
61
Abstract :
In this paper we present an algorithm that, using Wikipedia as a reference, extracts semantic information from an arbitrary text. Our algorithm refines a procedure proposed by others, which mines all the text contained in the whole of Wikipedia. Our refinement, based on a clustering approach, exploits the semantic information contained in certain types of Wikipedia hyperlinks, and also introduces an analysis based on multi-words. Our algorithm outperforms current methods in that its output contains far fewer false positives. We were also able to determine which (structural) part of a text provides most of the semantic information extracted by the algorithm.
Keywords :
Web sites; data mining; information retrieval; text analysis; Wikipedia; clustering; multiword analysis; semantic information extraction; text mining; Clustering algorithms; Encyclopedias; History; Humans; Information analysis; Logic; Ontologies; Statistics; Semantic analysis; multi-words
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Semantic Processing, 2009. SEMAPRO '09. Third International Conference on
Conference_Location :
Sliema
Print_ISBN :
978-1-4244-5044-2
Electronic_ISBN :
978-0-7695-3833-4
Type :
conf
DOI :
10.1109/SEMAPRO.2009.24
Filename :
5291534