Title of article :
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Author/Authors :
Akkasi, A. (Department of Computer Engineering, Bandar Abbas Branch, Islamic Azad University, Bandar Abbas, Iran); Varoglu, E. (Computer Engineering Department, Eastern Mediterranean University, Famagusta, North Cyprus, via Mersin 10, Turkey)
Pages :
9
From page :
311
To page :
319
Abstract :
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improving the performance of such systems can therefore affect the quality of these subsequent tasks. The chemical text from which NER data are extracted is naturally imbalanced, since chemical entities are fewer than the other segments of the text. In this work, the class imbalance problem in chemical NER is studied, and an adapted version of random under-sampling for NER data is used to generate a pool of classifiers. To keep the class distribution balanced within each sentence, the well-known random under-sampling method is modified into a sentence-based version, in which samples are randomly removed within each sentence instead of across the dataset as a whole. Furthermore, to take advantage of combining a set of diverse predictors, an ensemble is built from classifiers trained on the different training sets produced by sentence-based under-sampling. The proposed approach is developed and tested on the ChemDNER corpus released by BioCreative IV. The results show that the proposed method improves the classification performance of the baseline classifiers, mainly through an increase in recall. Moreover, the combination of high-performance classifiers trained on the under-sampled training data outperforms each of the best single classifiers as well as the combination of classifiers trained on the full data.
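The two ideas in the abstract, sentence-based random under-sampling and combining classifiers trained on the resulting data sets, can be illustrated with a minimal sketch. The abstract does not give the exact removal rule or the combination scheme, so everything below is an assumption made for illustration: each sentence is taken to be a list of (token, BIO label) pairs with "O" as the majority (non-entity) class; the hypothetical `ratio` parameter caps how many "O" tokens survive per entity token in each sentence; sentences with no entity tokens are left unchanged; and plain per-token majority voting stands in for the combination step. The function names `undersample_sentence`, `make_training_pool` and `majority_vote` are hypothetical, and training the individual classifiers is not shown.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, BIO label); "O" marks the majority (non-entity) class


def undersample_sentence(sentence: Sequence[Token], ratio: float,
                         rng: random.Random) -> List[Token]:
    """Randomly drop 'O' tokens inside one sentence, keeping every entity token.

    Hypothetical rule: at most ceil(ratio * n_entity) 'O' tokens survive.
    Sentences without any entity token are returned unchanged (an assumption).
    """
    entity_idx = [i for i, (_, label) in enumerate(sentence) if label != "O"]
    outside_idx = [i for i, (_, label) in enumerate(sentence) if label == "O"]
    if not entity_idx:
        return list(sentence)
    n_keep = min(len(outside_idx), math.ceil(ratio * len(entity_idx)))
    kept = set(entity_idx) | set(rng.sample(outside_idx, n_keep))
    # Preserve the original order of the surviving tokens.
    return [sentence[i] for i in sorted(kept)]


def make_training_pool(corpus: Sequence[Sequence[Token]], ratio: float,
                       n_sets: int, seed: int = 0) -> List[List[List[Token]]]:
    """Build n_sets differently under-sampled copies of the corpus, one per classifier."""
    rng = random.Random(seed)
    return [[undersample_sentence(s, ratio, rng) for s in corpus]
            for _ in range(n_sets)]


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine aligned per-token label sequences predicted by several classifiers.

    Ties go to the label seen first among the predictors, because
    Counter.most_common orders equal counts by first occurrence.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


if __name__ == "__main__":
    sentence = [("Aspirin", "B-CHEM"), ("is", "O"), ("an", "O"),
                ("analgesic", "O"), (".", "O")]
    pool = make_training_pool([sentence], ratio=1.0, n_sets=3, seed=42)
    for training_set in pool:
        print(training_set[0])  # "Aspirin" plus one randomly kept "O" token
    print(majority_vote([["B-CHEM", "O", "O"],
                         ["B-CHEM", "B-CHEM", "O"],
                         ["O", "B-CHEM", "O"]]))
```

Under-sampling is applied only to the training data; at prediction time every classifier in the pool labels the full, unmodified sentences, which is what keeps their outputs aligned token by token for voting. Varying the random seed is what makes the training sets, and hence the classifiers, diverse.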
Keywords :
Classifier Combination , Random Under-Sampling , Class Imbalance Problem , Chemical Named Entity Recognition
Journal title :
Astroparticle Physics
Serial Year :
2019
Record number :
2452975