• DocumentCode
    3739209
  • Title

    Mining Massive Vector Data on Single Instruction Multiple Data Microarchitectures

  • Author

    B?hm;Claudia Plant

  • Author_Institution
    Univ. of Munich, Munich, Germany
  • fYear
    2015
  • Firstpage
    597
  • Lastpage
    606
  • Abstract
    Current microarchitectures are equipped with SIMD instruction sets enabling massive data parallelism within each core. Instruction sets like AVX or SSE operate on large reserved registers and support a wide range of parallel arithmetic or logical operations enabling up to 16 double precision floating point operations per clock cycle. Current data mining applications are usually far from fully exploiting the ubiquitous computing power of SIMD instruction sets. Auto-vectorization as performed by most compilers can only be effective if the logical flow of the algorithm supports data parallelism. We consider the wide-spread algorithm K-means as a simple but highly relevant use case for mining vector data. The algorithm SIMD-K-means is a completely re-engineered version of K-means for current microarchitectures with two key ingredients enabling effective vectorization: (1) data-sensitive strategies in order to minimize the information to be transferred to the registers and (2) jointly coding the minimum distance and the best cluster representative which enables a K-means algorithm without any branching. Combined with efficient collection of sufficient statistics for the update of the cluster centers and space-and time-efficient calculation of Euclidean distances by the scalar product, these two ideas provide a substantial speed-up factor of about 5 over standard K-means. Intra-core SIMD parallelism can be combined with approaches focusing on multi-core and distributed environments. Moreover, our general ideas are not limited to K-means but support a wide range of data mining tasks on vector data including classification and outlier detection.
  • Keywords
    "Registers","Data mining","Instruction sets","Parallel processing","Clocks","Algorithm design and analysis","Clustering algorithms"
  • Publisher
    ieee
  • Conference_Titel
    Data Mining Workshop (ICDMW), 2015 IEEE International Conference on
  • Electronic_ISBN
    2375-9259
  • Type

    conf

  • DOI
    10.1109/ICDMW.2015.85
  • Filename
    7395722