• DocumentCode
    245104
  • Title

    A Parallel and Efficient Algorithm for Learning to Match

  • Author

    Jingbo Shang ; Tianqi Chen ; Hang Li ; Zhengdong Lu ; Yong Yu

  • Author_Institution
    Univ. of Illinois at Urbana Champaign, Champaign, IL, USA
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    971
  • Lastpage
    976
  • Abstract
    Many tasks in data mining and related fields can be formalized as matching between objects in two heterogeneous domains, including collaborative filtering, link prediction, image tagging, and web search. Machine learning techniques, referred to as learning-to-match in this paper, have been successfully applied to these problems. Among them, a class of state-of-the-art methods, named feature-based matrix factorization, formalizes the task as an extension to matrix factorization by incorporating auxiliary features into the model. Unfortunately, making those algorithms scale to real-world problems is challenging, and simple parallelization strategies fail due to the complex cross-talking patterns between sub-tasks. In this paper, we tackle this challenge with a novel parallel and efficient algorithm. Our algorithm, based on coordinate descent, can easily handle hundreds of millions of instances and features on a single machine. The key recipe of this algorithm is an iterative relaxation of the objective to facilitate parallel updates of parameters, with guaranteed convergence on minimizing the original objective function. Experimental results demonstrate that the proposed method is effective on a wide range of matching problems, with efficiency significantly improved over the baselines while accuracy remains unchanged.
  • Keywords
    convergence; iterative methods; learning (artificial intelligence); matrix decomposition; parallel algorithms; pattern matching; Web search; auxiliary feature; collaborative filtering; complex cross talking patterns; convergence; coordinate descent; data mining; efficient algorithm; feature-based matrix factorization; heterogeneous domain; image tagging; iterative relaxation; learning-to-match; link prediction; machine learning techniques; matching problems; parallel algorithm; parallelization strategy; state-of-the-art method; Algorithm design and analysis; Collaboration; Convergence; Parallel algorithms; Prediction algorithms; Time complexity; Training; collaborative filtering; learning to match; parallel matrix factorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type

    conf

  • DOI
    10.1109/ICDM.2014.71
  • Filename
    7023432