DocumentCode
3290044
Title
Machine Learning Methods for Medical Text Categorization
Author
Zhang, Qirui ; Tan, Jinghua ; Zhou, Huaying ; Tao, Weiye ; He, Kejing
Author_Institution
Coll. of Med. Inf. Eng., Guangdong Pharm. Univ., Guangzhou, China
fYear
2009
fDate
16-17 May 2009
Firstpage
494
Lastpage
497
Abstract
This paper reports a comparative study of four machine learning methods for medical text categorization: k-nearest neighbor (kNN), support vector machines (SVM), naive Bayes (NB), and a clonal selection algorithm based on antibody density (CSABAD). CSABAD is an improved immune algorithm that we propose. Following the clonal selection principle and a density control mechanism, only cells with higher affinity and lower density are selected to proliferate. In addition, we propose an improved approach to computing term weights in document indexing, called term frequency, inverted document frequency and inverted entropy (TFIDFIE). It considers how the documents in which a term occurs are distributed across the training set. Our experiments show that SVM and CSABAD significantly outperform kNN and naive Bayes, and that TFIDFIE is more effective than TFIDF on the OHSCAL data set.
Keywords
document handling; indexing; learning (artificial intelligence); medical computing; support vector machines; text analysis; antibody density; clonal selection algorithm; density control mechanism; document indexing; improved immune algorithm; inverted document frequency; inverted entropy; k nearest neighbor; machine learning methods; medical text categorization; naive Bayes; term frequency; Entropy; Frequency; Immune system; Learning systems; Machine learning algorithms; Nearest neighbor searches; Niobium; Text categorization; immune algorithm; machine learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Circuits, Communications and Systems, 2009. PACCS '09. Pacific-Asia Conference on
Conference_Location
Chengdu
Print_ISBN
978-0-7695-3614-9
Type
conf
DOI
10.1109/PACCS.2009.156
Filename
5232395