• DocumentCode
    1981081
  • Title
    Efficient Similarity Joins on Massive High-Dimensional Datasets Using MapReduce
  • Author
    Luo, Wuman ; Tan, Haoyu ; Mao, Huajian ; Ni, Lionel M.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Hong Kong Univ. of Sci. & Technol., Hong Kong, China
  • fYear
    2012
  • fDate
    23-26 July 2012
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    High-dimensional similarity join (HDSJ) is critical for many novel applications in the domain of mobile data management. Performing HDSJs efficiently now faces two challenges. First, the scale of datasets is increasing rapidly, making parallel computing on a scalable platform a must. Second, the dimensionality of the data can reach hundreds or even thousands, which gives rise to the curse of dimensionality. In this paper, we address these challenges and study how to perform parallel HDSJs efficiently in the MapReduce paradigm. In particular, we propose a cost model to demonstrate that it is important to take both communication and computation costs into account as dimensionality and data volume increase. To this end, we propose DAA (Dimension Aggregation Approximation), an efficient compression approach that can significantly reduce both of these costs when performing parallel HDSJs. Moreover, we design DAA-based parallel HDSJ algorithms that can scale up to massive data sizes and very high dimensionality. We perform extensive experiments using both synthetic and real datasets to evaluate the speedup and the scaleup of our algorithms.
  • Keywords
    data compression; mobile computing; parallel processing; DAA; HDSJ; MapReduce; compression approach; dimension aggregation approximation; high-dimensional datasets; high-dimensional similarity join; mobile data management; parallel computing; scalable platform; Algorithm design and analysis; Approximation algorithms; Approximation methods; Computational modeling; Data models; Time series analysis; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Mobile Data Management (MDM), 2012 IEEE 13th International Conference on
  • Conference_Location
    Bengaluru, Karnataka
  • Print_ISBN
    978-1-4673-1796-2
  • Electronic_ISBN
    978-0-7695-4713-8
  • Type
    conf
  • DOI
    10.1109/MDM.2012.25
  • Filename
    6341368