• DocumentCode
    83851
  • Title

    Spatially Aware Term Selection for Geotagging

  • Author

    Van Laere, Olivier ; Quinn, J. ; Schockaert, S. ; Dhoedt, Bart

  • Author_Institution
    Dept. of Inf. Technol., Ghent Univ., Ghent, Belgium
  • Volume
    26
  • Issue
    1
  • fYear
    2014
  • fDate
    Jan. 2014
  • Firstpage
    221
  • Lastpage
    234
  • Abstract
    The task of assigning geographic coordinates to textual resources plays an increasingly central role in geographic information retrieval. The ability to select those terms from a given collection that are most indicative of geographic location is of key importance in successfully addressing this task. However, this process of selecting spatially relevant terms is at present not well understood, and the majority of current systems are based on standard term selection techniques, such as χ² or information gain, and thus fail to exploit the spatial nature of the domain. In this paper, we propose two classes of term selection techniques based on standard geostatistical methods. First, to implement the idea of spatial smoothing of term occurrences, we investigate the use of kernel density estimation (KDE) to model each term as a two-dimensional probability distribution over the surface of the Earth. The second class of term selection methods we consider is based on Ripley's K statistic, which measures the deviation of a point set from spatial homogeneity. We provide experimental results which compare these classes of methods against existing baseline techniques on the tasks of assigning coordinates to Flickr photos and to Wikipedia articles, revealing marked improvements in cases where only a relatively small number of terms can be selected.
  • Keywords
    geography; information retrieval; statistical distributions; Earth surface; Flickr photos; KDE; Ripley K statistics; Wikipedia articles; geographic coordinates assignment; geographic information retrieval; geographic location; geotagging; kernel density estimation; spatial homogeneity; spatially aware term selection; standard geostatistical methods; standard term selection techniques; term occurrence spatial smoothing; textual resources; two-dimensional probability distribution; Context; Electronic publishing; Encyclopedias; Estimation; Internet; Standards; Information search and retrieval; artificial intelligence; classification; feature extraction; geographic information retrieval; knowledge management; metadata; semi-structured data; text mining;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2013.42
  • Filename
    6475942
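
The abstract describes Ripley's K statistic as a measure of how far a point set deviates from spatial homogeneity. The following is a minimal sketch of the textbook estimator only, not the paper's implementation: it assumes planar coordinates in a study region of known area, applies no edge correction, and uses a hypothetical function name (`ripley_k`). Under complete spatial randomness K(r) is approximately πr², so a value well above that suggests the occurrences of a term are clustered.

```python
import math


def ripley_k(points, radius, area):
    """Naive Ripley's K estimate for 2-D points (no edge correction).

    points: list of (x, y) tuples in planar coordinates (an assumption;
            the paper works on the surface of the Earth).
    radius: distance threshold r.
    area:   area of the study region, in the same units squared.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, lying within `radius` of each other.
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                pairs += 1
    return area * pairs / (n * (n - 1))


# Hypothetical occurrences of two terms inside a unit square.
clustered = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.51, 0.51)]
spread = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

r = 0.1
k_clustered = ripley_k(clustered, r, area=1.0)
k_spread = ripley_k(spread, r, area=1.0)
csr_reference = math.pi * r ** 2  # expected K(r) under spatial homogeneity
```

In this toy setting the clustered term scores far above the homogeneous reference and the spread-out term scores below it, which is the kind of contrast a spatially aware term selection method could rank on.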