• DocumentCode
    2984595
  • Title
    Distributed Matrix Completion
  • Author
    Teflioudi, C.; Makari, F.; Gemulla, R.
  • Author_Institution
    Max-Planck-Inst. für Inf., Saarbrücken, Germany
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    655
  • Lastpage
    664
  • Abstract
    We discuss parallel and distributed algorithms for large-scale matrix completion on problems with millions of rows, millions of columns, and billions of revealed entries. We focus on in-memory algorithms that run on a small cluster of commodity nodes; even very large problems can be handled effectively in such a setup. Our DALS, ASGD, and DSGD++ algorithms are novel variants of the popular alternating least squares and stochastic gradient descent algorithms; they exploit thread-level parallelism, in-memory processing, and asynchronous communication. We provide some guidance on the asymptotic performance of each algorithm and investigate the performance of both our algorithms and previously proposed MapReduce algorithms in large-scale experiments. We found that DSGD++ outperforms competing methods in terms of overall runtime, memory consumption, and scalability. Using DSGD++, we can factor a matrix with 10B entries on 16 compute nodes in around 40 minutes.
  • Keywords
    data mining; gradient methods; least squares approximations; parallel algorithms; ASGD algorithm; DALS algorithm; DSGD++ algorithm; MapReduce algorithm; alternating least squares algorithm; asymptotic performance; asynchronous communication; commodity node; distributed algorithm; distributed matrix completion; in-memory algorithm; in-memory processing; large-scale matrix completion; memory consumption; parallel algorithm; stochastic gradient descent algorithm; thread-level parallelism; Algorithm design and analysis; Clustering algorithms; Convergence; Distributed algorithms; Instruction sets; Schedules; Training; ALS; parallel and distributed matrix factorization; recommender systems; stochastic gradient descent
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2012.120
  • Filename
    6413862