• DocumentCode
    2773610
  • Title
    HOCT: A Highly Scalable Algorithm for Training Linear CRF on Modern Hardware
  • Author
    Chen, Tianyuan ; Chang, Lei ; Ma, Jianqing ; Zhang, Wei ; Gao, Feng
  • Author_Institution
    Fudan Univ., Shanghai, China
  • fYear
    2009
  • fDate
    6-6 Dec. 2009
  • Firstpage
    276
  • Lastpage
    281
  • Abstract
    Conditional Random Fields (CRFs) are widely used in machine learning and natural language processing. A number of methods have been developed for CRF training. However, even with state-of-the-art algorithms, CRF training is still very time- and space-consuming. This makes it infeasible to use CRFs in large-scale data analysis tasks. This paper proposes an efficient algorithm, HOCT, for CRF training on modern computer architectures. First, software prefetching techniques are utilized to hide cache miss latency. Second, we exploit SIMD to process data in parallel. Third, when dealing with large data sets, we let HOCT, rather than the operating system, manage swapping operations. Our experiments on various real data sets show that HOCT yields a fourfold speedup when the data fit in memory, and over a 30-fold speedup when the memory requirement exceeds the physical memory.
  • Keywords
    data analysis; learning (artificial intelligence); natural language processing; parallel processing; storage management; CRF training; HOCT algorithm; SIMD process; conditional random fields; large-scale data analysis tasks; machine learning; natural language processing; parallel processing; software prefetching techniques; Computer architecture; Data analysis; Delay; Hardware; Large-scale systems; Machine learning; Machine learning algorithms; Management training; Natural language processing; Prefetching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4244-5384-9
  • Electronic_ISBN
    978-0-7695-3902-7
  • Type
    conf
  • DOI
    10.1109/ICDMW.2009.69
  • Filename
    5360418