DocumentCode
2773610
Title
HOCT: A Highly Scalable Algorithm for Training Linear CRF on Modern Hardware
Author
Chen, Tianyuan ; Chang, Lei ; Ma, Jianqing ; Zhang, Wei ; Gao, Feng
Author_Institution
Fudan Univ., Shanghai, China
fYear
2009
fDate
6-6 Dec. 2009
Firstpage
276
Lastpage
281
Abstract
Conditional Random Fields (CRFs) are widely used in machine learning and natural language processing. A number of methods have been developed for CRF training. However, even with state-of-the-art algorithms, CRF training remains very time- and space-consuming, which makes it infeasible to use CRFs in large-scale data analysis tasks. This paper proposes HOCT, an efficient algorithm for CRF training on modern computer architectures. First, software prefetching techniques are used to hide cache miss latency. Second, we exploit SIMD instructions to process data in parallel. Third, when dealing with large data sets, HOCT, rather than the operating system, manages swapping operations. Our experiments on various real data sets show that HOCT yields a fourfold speedup when the data fit in memory, and more than a 30-fold speedup when the memory requirement exceeds the physical memory.
Keywords
data analysis; learning (artificial intelligence); natural language processing; parallel processing; storage management; CRF training; HOCT algorithm; SIMD process; conditional random fields; large-scale data analysis tasks; machine learning; natural language processing; parallel processing; software prefetching techniques; Computer architecture; Data analysis; Delay; Hardware; Large-scale systems; Machine learning; Machine learning algorithms; Management training; Natural language processing; Prefetching;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location
Miami, FL
Print_ISBN
978-1-4244-5384-9
Electronic_ISBN
978-0-7695-3902-7
Type
conf
DOI
10.1109/ICDMW.2009.69
Filename
5360418