DocumentCode :
2871876
Title :
Classification of medical documents according to diseases
Author :
Parlak, Bekir ; Uysal, Alper Kursat
Author_Institution :
Department of Computer Engineering, Anadolu Univ., Eskişehir, Turkey
fYear :
2015
fDate :
16-19 May 2015
Firstpage :
1635
Lastpage :
1638
Abstract :
Medical text classification remains one of the popular research problems in the text classification domain. Apart from some text data compiled from hospital records, most researchers in this field evaluate their classification methodologies on documents from the MEDLINE database. When all documents in the database are taken into consideration, MEDLINE is a multi-class and multi-label database. In this study, a dataset containing a small subset of MEDLINE documents belonging to disease categories is constructed. It is a multi-class but single-label dataset. Because the class distribution of this dataset is highly unbalanced, only documents belonging to the top 10 disease categories are used in the experiments. The performance of three different pattern classifiers is analyzed on the disease classification problem using this dataset. These three classifiers are Bayesian network, C4.5 decision tree, and Random Forest. Experiments are carried out for two cases: with and without the stemming preprocessing step. Experimental results show that the Bayesian network is the most successful of the three classifiers. Also, the best performance is obtained without applying stemming.
Keywords :
belief networks; decision trees; diseases; medical information systems; pattern classification; random processes; text analysis; Bayesian network classifier; C4.5 decision tree; MEDLINE database; MEDLINE documents; classification methodology; disease category; disease classification; diseases; hospital record; medical document classification; medical text classification; multiclass database; multilabel database; pattern classifier; random forest tree; text classification domain; Bayes methods; Classification algorithms; Databases; Diseases; Internet; Knowledge based systems; Text categorization; MeSH headings; Text classification; disease classification; medical documents;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal Processing and Communications Applications Conference (SIU), 2015 23rd
Conference_Location :
Malatya
Type :
conf
DOI :
10.1109/SIU.2015.7130164
Filename :
7130164