DocumentCode :
3600789
Title :
Hadoop Recognition of Biomedical Named Entity Using Conditional Random Fields
Author :
Kenli Li ; Wei Ai ; Fan Zhang ; Lingang Jiang ; Keqin Li ; Kai Hwang
Author_Institution :
Coll. of Inf. Sci. & Eng., Hunan Univ., Changsha, China
Volume :
26
Issue :
11
fYear :
2015
Firstpage :
3040
Lastpage :
3051
Abstract :
Processing large volumes of data has become a challenging problem, particularly in data-redundant systems. As one of the best-known models, the conditional random fields (CRF) model has been widely applied in biomedical named entity recognition (Bio-NER). Because of the model's inherently sequential nature, improving its performance is nontrivial and calls for new parallel solutions. In this paper, we propose a parallel CRF algorithm called MapReduce CRF (MRCRF), which combines and parallelizes the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and Viterbi algorithms. MRCRF contains two parallel sub-algorithms that handle the two time-consuming steps of the CRF model. The MapReduce L-BFGS (MRLB) algorithm leverages the MapReduce framework to enhance parameter estimation. The MapReduce Viterbi (MRVtb) algorithm infers the most likely state sequence by extending the Viterbi algorithm with an additional MapReduce job. Experimental results show that MRCRF outperforms competing methods, significantly improving time efficiency while preserving a guaranteed level of correctness.
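The inference step described in the abstract (Viterbi decoding distributed with MapReduce) can be illustrated with a minimal, in-process sketch. It assumes that sentences are decoded independently, so a map step decodes each sentence and a reduce step gathers the results. This is not the paper's MRVtb implementation and does not use the Hadoop API: `mapper` and `reducer` are plain Python functions standing in for Hadoop tasks, and the BIO labels, the `toy_emit` function, and the `TRANS`/`START` weights are hypothetical toy values, not learned CRF parameters. The L-BFGS training side (MRLB) is not sketched.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple

Label = str
EmitFn = Callable[[str, Label], float]


def viterbi(tokens: List[str], labels: List[Label], emit: EmitFn,
            trans: Dict[Tuple[Label, Label], float],
            start: Dict[Label, float]) -> List[Label]:
    """Most likely label sequence for one sentence under additive (log-space) scores."""
    if not tokens:
        return []
    # best[i][y]: score of the best path that ends in label y at position i
    best = [{y: start.get(y, 0.0) + emit(tokens[0], y) for y in labels}]
    back: List[Dict[Label, Label]] = [{}]
    for i in range(1, len(tokens)):
        scores, pointers = {}, {}
        for y in labels:
            # Predecessor p maximizing best[i-1][p] + transition score (p -> y)
            prev = max(labels, key=lambda p: best[i - 1][p] + trans.get((p, y), 0.0))
            scores[y] = best[i - 1][prev] + trans.get((prev, y), 0.0) + emit(tokens[i], y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the highest-scoring final label
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model: BIO labels with hand-picked scores (not trained weights)
LABELS = ["B", "I", "O"]
EMIT = {("p53", "B"): 2.0, ("IL-2", "B"): 2.0, ("protein", "I"): 1.5}
TRANS = {("O", "I"): -5.0, ("B", "I"): 0.5}  # discourage O -> I, favor B -> I
START = {"I": -5.0}                          # discourage starting with I


def toy_emit(token: str, label: Label) -> float:
    # Unlisted (token, label) pairs get a small bias toward "O"
    return EMIT.get((token, label), 1.0 if label == "O" else 0.0)


def mapper(record: Tuple[str, List[str]]) -> Tuple[str, List[Label]]:
    # Map step: decode one sentence independently of all others
    sent_id, tokens = record
    return sent_id, viterbi(tokens, LABELS, toy_emit, TRANS, START)


def reducer(acc: Dict[str, List[Label]], pair: Tuple[str, List[Label]]) -> Dict[str, List[Label]]:
    # Reduce step: collect decoded sequences keyed by sentence id
    sent_id, path = pair
    acc[sent_id] = path
    return acc


corpus = {"s1": ["the", "p53", "protein", "binds"], "s2": ["IL-2", "is", "secreted"]}
tagged = reduce(reducer, map(mapper, corpus.items()), {})
print(tagged)  # {'s1': ['O', 'B', 'I', 'O'], 's2': ['B', 'O', 'O']}
```

Because each sentence is decoded independently in this sketch, the map step parallelizes trivially; on a real cluster the builtin `map` would be replaced by distributed mapper tasks, and the reducer would only merge outputs.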
Keywords :
data analysis; maximum likelihood estimation; medical information systems; parallel processing; random processes; Hadoop recognition; L-BFGS; MRLB algorithm; MRVtb algorithm; MapReduce CRF model; MapReduce L-BFGS algorithm; MapReduce Viterbi algorithm; biomedical named entity recognition; conditional random field; data processing; data redundant system; limited-memory Broyden-Fletcher-Goldfarb-Shanno; sequential feature; Biological system modeling; Hidden Markov models; Inference algorithms; Training; Training data; Vectors; Viterbi algorithm; Biomedical named entity recognition; MapReduce; conditional random fields; parallel algorithm;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2014.2368568
Filename :
6949632