• DocumentCode
    3290044
  • Title
    Machine Learning Methods for Medical Text Categorization
  • Author
    Zhang, Qirui ; Tan, Jinghua ; Zhou, Huaying ; Tao, Weiye ; He, Kejing
  • Author_Institution
    Coll. of Med. Inf. Eng., Guangdong Pharm. Univ., Guangzhou, China
  • fYear
    2009
  • fDate
    16-17 May 2009
  • Firstpage
    494
  • Lastpage
    497
  • Abstract
    This paper reports a comparative study of four machine learning methods for medical text categorization: k-nearest neighbor (kNN), support vector machines (SVM), naive Bayes (NB), and a clonal selection algorithm based on antibody density (CSABAD). CSABAD is an improved immune algorithm that we propose. Following the clonal selection principle and a density control mechanism, it selects only those cells with higher affinity and lower density to proliferate. In addition, we propose an improved approach to computing term weights in document indexing, called term frequency, inverted document frequency and inverted entropy (TFIDFIE). It takes into account the distribution of the training-set documents in which a term occurs. Our experiments show that SVM and CSABAD significantly outperform kNN and naive Bayes, and that TFIDFIE is more effective than TFIDF on the OHSCAL data set.
  • Keywords
    document handling; indexing; learning (artificial intelligence); medical computing; support vector machines; text analysis; antibody density; clonal selection algorithm; density control mechanism; document indexing; improved immune algorithm; inverted document frequency; inverted entropy; k nearest neighbor; machine learning methods; medical text categorization; naive Bayes; support vector machines; term frequency; Entropy; Frequency; Immune system; Indexing; Learning systems; Machine learning algorithms; Nearest neighbor searches; Niobium; Support vector machines; Text categorization; document indexing; immune algorithm; machine learning; medical text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Circuits, Communications and Systems, 2009. PACCS '09. Pacific-Asia Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-0-7695-3614-9
  • Type
    conf
  • DOI
    10.1109/PACCS.2009.156
  • Filename
    5232395