• DocumentCode
    3200468
  • Title
    CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems
  • Author
    You, Yang ; Demmel, James ; Czechowski, Kenneth ; Song, Le ; Vuduc, Richard
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • fYear
    2015
  • fDate
    25-29 May 2015
  • Firstpage
    847
  • Lastpage
    859
  • Abstract
    We consider the problem of how to design and implement communication-efficient versions of parallel support vector machines, a widely used classifier in statistical machine learning, for distributed-memory clusters and supercomputers. The main computational bottleneck is the training phase, in which a statistical model is built from an input data set. Prior to our study, the parallel isoefficiency of a state-of-the-art implementation scaled as W = Omega(P^3), where W is the problem size and P the number of processors; this scaling is worse than even that of a one-dimensional block-row dense matrix-vector multiplication, which has W = Omega(P^2). This study considers a series of algorithmic refinements, leading ultimately to a Communication-Avoiding SVM (CA-SVM) method that improves the isoefficiency to nearly W = Omega(P). We evaluate these methods on 96 to 1536 processors and show speedups of 3-16x (7x on average) over Dis-SMO, and a 95% weak-scaling efficiency on six real-world datasets, with only modest losses in overall classification accuracy. The source code can be downloaded at https://github.com/fastalgo/casvm.
  • Keywords
    learning (artificial intelligence); parallel machines; parallel processing; statistical analysis; support vector machines; CA-SVM; communication-avoiding SVM method; communication-avoiding support vector machines; communication-efficient versions; dense matrix vector multiplication; distributed memory clusters; distributed systems; parallel support vector machines; statistical machine learning; statistical model; supercomputers; Accuracy; Kernel; Mathematical model; Partitioning algorithms; Program processors; Support vector machines; Training; communication avoidance; distributed memory algorithms; statistical machine learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
  • Conference_Location
    Hyderabad, India
  • ISSN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2015.117
  • Filename
    7161571
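The abstract's isoefficiency comparison can be checked with a small back-of-the-envelope sketch. An isoefficiency function W = Omega(P^k) says the problem size W must grow at least like P^k to keep parallel efficiency constant as the processor count P grows. The `required_growth` helper below is hypothetical, written only to illustrate the scaling claims; the 96- and 1536-processor endpoints are taken from the abstract's stated evaluation range:

```python
def required_growth(p_small: int, p_large: int, exponent: int) -> float:
    """Factor by which the problem size W must grow when scaling from
    p_small to p_large processors, under W = Omega(P^exponent).
    (Hypothetical helper for illustration, not code from the paper.)"""
    return (p_large / p_small) ** exponent

# Scaling from 96 to 1536 processors is a 16x increase in P, so:
for name, k in [("Dis-SMO-like scaling, W = Omega(P^3)", 3),
                ("1-D block-row mat-vec, W = Omega(P^2)", 2),
                ("CA-SVM, nearly W = Omega(P)", 1)]:
    print(f"{name}: W must grow {required_growth(96, 1536, k):.0f}x")
# prints growth factors of 4096x, 256x, and 16x respectively
```

The 4096x-versus-16x gap is why the abstract calls the Omega(P^3) scaling worse than even a one-dimensional block-row matrix-vector multiply: the required problem-size growth to stay efficient is two powers of P larger.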