• DocumentCode
    729419
  • Title
    Simulation of parallel similarity measure computations for large data sets
  • Author
    Czarnul, Pawel ; Rosciszewski, Pawel ; Matuszek, Mariusz ; Szymanski, Julian
  • Author_Institution
    Telecommun. & Inf., Gdansk Univ. of Technol., Gdansk, Poland
  • fYear
    2015
  • fDate
    24-26 June 2015
  • Firstpage
    472
  • Lastpage
    477
  • Abstract
    The paper presents our approach to implementing a similarity measure for big data analysis in a parallel environment. We describe the algorithm used to parallelise the computations. We provide results from a real MPI application that computes similarity measures, as well as results obtained with our simulation software. The simulation environment allows us to model parallel systems of various sizes with various components such as CPUs, GPUs, and network interconnects, and to model parallel applications in a meta language. The simulations allow us to determine in detail how computations will be performed on particular hardware. They also allow us to predict the shapes of time curves beyond the range in which empirical results can be obtained, which is bounded by limited computational resources such as memory capacity.
  • Keywords
    Big Data; data analysis; digital simulation; message passing; parallel processing; Big Data analysis; MPI application; parallel similarity measure; parallelisation algorithm; simulation software; Algorithm design and analysis; Big data; Clustering algorithms; Computational modeling; Data models; Hardware; big data analysis; distance based categorisation; simulation of parallelisation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on
  • Conference_Location
    Gdynia
  • Print_ISBN
    978-1-4799-8320-9
  • Type
    conf
  • DOI
    10.1109/CYBConf.2015.7175980
  • Filename
    7175980