DocumentCode :
2053906
Title :
Extracting Geospatial Entities from Wikipedia
Author :
Witmer, Jeremy ; Kalita, Jugal
Author_Institution :
Colorado Springs Comput. Sci. Dept., Univ. of Colorado, Colorado Springs, CO, USA
fYear :
2009
fDate :
14-16 Sept. 2009
Firstpage :
450
Lastpage :
457
Abstract :
This paper addresses the challenge of extracting geospatial data from the article text of the English Wikipedia. In the first phase of our work, we create a training corpus and select a set of word-based features to train a Support Vector Machine (SVM) for the task of geospatial named entity recognition. For testing, we target a corpus of Wikipedia articles about battles and wars, as these have a high incidence of geospatial content. The SVM recognizes place names in the corpus with very high recall, close to 100%, and acceptable precision. The set of geospatial named entities (NEs) is then fed into a geocoding and resolution process, whose goal is to determine the correct coordinates for each place name. As many place names are ambiguous and do not immediately geocode to a single location, we present a data structure and algorithm that resolve ambiguity based on sentence and article context, so that the correct coordinates can be selected. We achieve an F-measure of 82% and create a set of geospatial entities for each article, combining the place names, spatial locations, and an assumed point geometry. These entities can enable geospatial search on, and geovisualization of, Wikipedia.
Keywords :
Web sites; geographic information systems; natural language processing; Wikipedia; assumed point geometry; geospatial data; geospatial entities; geospatial named entity recognition; geospatial search; geovisualization; place names; spatial location; support vector machine; training corpus; word-based features; Computer science; Data mining; Databases; Hidden Markov models; Internet; Open source software; Springs; Support vector machine classification; Support vector machines; NER; Wikipedia extraction; geospatial entity recognition; geospatial extraction; location extraction
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Semantic Computing, 2009. ICSC '09. IEEE International Conference on
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-4962-0
Electronic_ISBN :
978-0-7695-3800-6
Type :
conf
DOI :
10.1109/ICSC.2009.62
Filename :
5298641