• DocumentCode
    47165
  • Title
    Runtime Optimizations for Tree-Based Machine Learning Models
  • Author
    Asadi, Nima ; Lin, James ; de Vries, Arjen P.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Maryland, College Park, MD, USA
  • Volume
    26
  • Issue
    9
  • fYear
    2014
  • fDate
    Sept. 2014
  • Firstpage
    2281
  • Lastpage
    2292
  • Abstract
    Tree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression trees for learning to rank. Although exceedingly simple conceptually, most implementations of tree-based models do not efficiently utilize modern superscalar processors. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures. Experiments on synthetic data and on three standard learning-to-rank datasets show that our approach is significantly faster than standard implementations.
  • Keywords
    Internet; cache storage; information retrieval; learning (artificial intelligence); regression analysis; tree data structures; Web ranking; cache-conscious fashion; data structures; execution flow; gradient-boosted regression trees; learning-to-rank datasets; microbatching predictions; predication technique; processor architectures; runtime performance optimizations; superscalar processors; synthetic data; tree-based machine learning models; vectorization technique; Arrays; Indexes; Optimization; Predictive models; Program processors; Regression tree analysis; General; Information Storage and Retrieval; Information Technology and Systems; Learning to Rank; Scalability and Efficiency; Web Search; Web search; general information storage and retrieval; information technology and systems; learning to rank; regression trees; scalability and efficiency;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2013.73
  • Filename
    6513227