• DocumentCode
    56700
  • Title

    Efficient Skyline Computation on Big Data

  • Author

    Xixian Han ; Jianzhong Li ; Donghua Yang ; Jinbao Wang

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • Volume
    25
  • Issue
    11
  • fYear
    2013
  • fDate
    Nov. 2013
  • Firstpage
    2521
  • Lastpage
    2535
  • Abstract
    Skyline is an important operation in many applications to return a set of interesting points from a potentially huge data space. Given a table, the operation finds all tuples that are not dominated by any other tuples. It is found that the existing algorithms cannot process skyline on big data efficiently. This paper presents a novel skyline algorithm SSPL on big data. SSPL utilizes sorted positional index lists which require low space overhead to reduce I/O cost significantly. The sorted positional index list Lj is constructed for each attribute Aj and is arranged in ascending order of Aj. SSPL consists of two phases. In phase 1, SSPL computes scan depth of the involved sorted positional index lists. During retrieving the lists in a round-robin fashion, SSPL performs pruning on any candidate positional index to discard the candidate whose corresponding tuple is not skyline result. Phase 1 ends when there is a candidate positional index seen in all of the involved lists. In phase 2, SSPL exploits the obtained candidate positional indexes to get skyline results by a selective and sequential scan on the table. The experimental results on synthetic and real data sets show that SSPL has a significant advantage over the existing skyline algorithms.
  • Keywords
    algorithm theory; sorting; I/O cost reduction; SSPL; big data; data space; list retrieval; low space overhead; skyline algorithm; skyline computation; sorted positional index list; tuples; Algorithm design and analysis; Data storage systems; Indexes; Information management; Merging; Partitioning algorithms; Big data; SSPL; pruning; skyline;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2012.203
  • Filename
    6331489