Title :
A Two-Phase Bio-NER System Based on Integrated Classifiers and Multiagent Strategy
Author :
Lishuang Li ; Wenting Fan ; Degen Huang
Author_Institution :
Coll. of Comput. Sci. & Technol., Dalian Univ. of Technol., Dalian, China
Abstract :
Biomedical named entity recognition (Bio-NER) is a fundamental step in biomedical text mining. This paper presents a two-phase Bio-NER model targeting the JNLPBA task. The two-phase method divides the task into two subtasks: named entity detection (NED) and named entity classification (NEC). In the first phase, the NED subtask distinguishes named entities (NEs) from non-named entities (NNEs) in the biomedical literature without identifying their types. For this phase, six classifiers are constructed with four toolkits (CRF++, YamCha, maximum entropy, Mallet) using different training methods and are integrated by a two-layer stacking method. In the second phase, for the NEC subtask, a multiagent strategy is introduced to determine the correct entity type for the entities identified in the first phase. The experimental results show that the presented approach achieves an F-score of 76.06 percent, which outperforms most state-of-the-art systems.
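The two-layer stacking idea used for the NED phase can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three base label sequences stand in for the outputs of the six toolkit-trained classifiers, the B/I/O tag set and the lookup-table meta-learner are simple stand-ins (the abstract does not say which second-layer learner the authors use), and the function names `train_meta`, `predict_meta`, and `extract_spans` are hypothetical.

```python
from collections import Counter, defaultdict

def train_meta(base_outputs, gold):
    """Layer 2 (assumed stand-in): for each combination of base-classifier
    labels, remember the gold label most often seen with it."""
    counts = defaultdict(Counter)
    for combo, label in zip(zip(*base_outputs), gold):
        counts[combo][label] += 1
    return {combo: c.most_common(1)[0][0] for combo, c in counts.items()}

def predict_meta(table, base_outputs):
    """Combine base predictions token by token using the learned table."""
    preds = []
    for combo in zip(*base_outputs):
        if combo in table:
            preds.append(table[combo])
        else:
            # Unseen combination: fall back to a majority vote.
            preds.append(Counter(combo).most_common(1)[0][0])
    return preds

def extract_spans(labels):
    """Turn B/I/O labels into untyped (start, end) spans, end exclusive."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == "B" or (lab == "I" and start is None):
            if start is not None:
                spans.append((start, i))
            start = i
        elif lab == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans

# Layer 1 outputs on held-out tokens (made-up toy data, three base classifiers).
gold  = ["B", "I", "O", "O", "B", "O"]
clf_a = ["B", "I", "O", "O", "B", "O"]
clf_b = ["B", "O", "O", "O", "B", "O"]
clf_c = ["O", "I", "O", "B", "B", "O"]
table = train_meta([clf_a, clf_b, clf_c], gold)

# Layer 1 outputs on a new sentence, combined by the layer 2 table.
new_a = ["O", "B", "I", "O"]
new_b = ["O", "B", "O", "O"]
new_c = ["B", "O", "I", "O"]
preds = predict_meta(table, [new_a, new_b, new_c])
spans = extract_spans(preds)
```

The detected spans carry no entity type; in the two-phase design described above, assigning the type is left to the second (NEC) phase.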
Keywords :
bioinformatics; classification; data mining; maximum entropy methods; medical computing; multi-agent systems; text analysis; CRF++ toolkit; F-score; JNLPBA task; Mallet toolkit; NEC subtask; NED subtask; YamCha toolkit; biomedical literature; biomedical named entity recognition system; biomedical text mining; correct entity type determination; integrated classifier; maximum entropy toolkit; multiagent strategy; named entity classification subtask; named entity detection subtask; toolkit training method; two-layer stacking method; two-phase Bio-NER model targeting; two-phase Bio-NER system; Biological system modeling; Computational modeling; Hidden Markov models; Proteins; RNA; Stacking; Training; Named entity recognition and classification; multiagent
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2013.106