• DocumentCode
    3706751
  • Title

    Spatial Join Query Processing in Cloud: Analyzing Design Choices and Performance Comparisons

  • Author

    Simin You;Jianting Zhang;Le Gruenwald

  • Author_Institution
    Dept. of Comput. Sci., CUNY Grad. Center, New York, NY, USA
  • fYear
    2015
  • Firstpage
    90
  • Lastpage
    97
  • Abstract
    Data volumes of GPS-recorded locations and many other types of geospatial data are increasing rapidly. Processing large-scale spatial joins in the Cloud for performance and scalability is becoming increasingly popular. In this study, we compare three leading Cloud-based spatial data management systems, namely Hadoop GIS, Spatial Hadoop and Spatial Spark, both conceptually, through analysis of their design choices, and empirically, through experiments using real-world datasets. Using both a workstation serving as a single-node cluster and Amazon EC2 clusters of up to 10 nodes, the results show that a combination of factors, including Cloud platforms, data access models and the underlying geometry libraries, has a significant impact on the systems' realized performance. While Spatial Hadoop generally wins on robustness, Spatial Spark is the clear winner on efficiency due to in-memory processing.
  • Keywords
    "Distributed databases","Spatial databases","Sparks","Cloud computing","Spatial indexes","Geospatial analysis","Geometry"
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing Workshops (ICPPW), 2015 44th International Conference on
  • ISSN
    1530-2016
  • Type

    conf

  • DOI
    10.1109/ICPPW.2015.41
  • Filename
    7349899