Title :
Semantic Smoothing the Multinomial Naive Bayes for Biomedical Literature Classification
Author :
Wen, Jian ; Li, Zhoujun
Author_Institution :
Nat. Univ. of Defence Technol., Changsha
Abstract :
The rapid growth of the biomedical literature poses new challenges for text classification; in particular, efficiency and data sparseness have attracted the attention of many researchers. The recent success of language modeling in information retrieval has led us to reconsider multinomial Naive Bayes for text classification. In this paper, we propose a semantic smoothing method for the Naive Bayes model. Biomedical documents are indexed by UMLS concepts, and context-sensitive concept pairs are extracted as topic signatures; the translation probabilities from concept pairs to concepts are estimated with the EM algorithm. The classification model is then estimated as a mixture model that incorporates this semantic smoothing method. The ontology-based document representation handles synonyms and reduces the size of the concept vector, and the semantic smoothing method partly alleviates data sparseness. We evaluate our method on the OHSUMED and Genomics Track collections and obtain satisfactory results. The semantic smoothing method attains better results than other simple smoothing methods, and it is also notable for its simplicity and comprehensibility.
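The mixture model described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mixing weight `lam`, the Laplace constant `alpha`, the function names, and the toy concepts and translation table are all assumptions made for illustration. In particular, the translation probabilities are set by hand here, whereas the abstract states that they are estimated with the EM algorithm.

```python
import math
from collections import Counter


def smoothed_concept_probs(concepts, signatures, translation, vocab, lam=0.4, alpha=1.0):
    """Estimate p(concept | class) as a mixture of a simple and a translation model.

    lam and alpha are hypothetical defaults, not values taken from the paper.
    """
    counts = Counter(concepts)
    total = sum(counts.values()) + alpha * len(vocab)
    # Simple model: Laplace-smoothed relative frequency of each concept.
    simple = {w: (counts[w] + alpha) / total for w in vocab}

    # Keep only topic signatures that have a translation row.
    sig_counts = Counter(s for s in signatures if s in translation)
    sig_total = sum(sig_counts.values())
    if sig_total == 0:
        return simple  # no usable signatures: fall back to the simple model

    # Translation model: sum over signatures t of p(w | t) * p(t | class).
    trans = dict.fromkeys(vocab, 0.0)
    for sig, n in sig_counts.items():
        for w, p in translation[sig].items():
            if w in trans:
                trans[w] += p * n / sig_total

    return {w: (1 - lam) * simple[w] + lam * trans[w] for w in vocab}


def classify(doc_concepts, models, priors):
    """Return the class with the highest multinomial Naive Bayes log score."""
    def score(label):
        return math.log(priors[label]) + sum(
            math.log(models[label][w]) for w in doc_concepts if w in models[label]
        )
    return max(models, key=score)


# Toy data (hypothetical): concepts stand in for UMLS concepts,
# tuples stand in for concept-pair topic signatures.
vocab = ["tumor", "chemotherapy", "gene", "mutation"]
translation = {
    ("tumor", "chemotherapy"): {"tumor": 0.5, "chemotherapy": 0.4, "mutation": 0.1},
    ("gene", "mutation"): {"gene": 0.5, "mutation": 0.4, "tumor": 0.1},
}
training = {
    "oncology": (["tumor", "chemotherapy", "tumor"], [("tumor", "chemotherapy")]),
    "genetics": (["gene", "mutation", "gene"], [("gene", "mutation")]),
}
models = {
    label: smoothed_concept_probs(concepts, sigs, translation, vocab)
    for label, (concepts, sigs) in training.items()
}
priors = {"oncology": 0.5, "genetics": 0.5}
```

With these toy inputs, `classify(["tumor", "chemotherapy"], models, priors)` selects the class whose smoothed concept distribution best explains the document. A concept that never occurs in a class's training documents can still receive probability mass through the translation model, which is how this kind of smoothing addresses sparseness.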
Keywords :
Bayes methods; expectation-maximisation algorithm; information retrieval; medical information systems; ontologies (artificial intelligence); smoothing methods; text analysis; EM algorithm; biomedical document indexing; biomedical literature classification; information retrieval; language modeling; multinomial naive Bayes model; ontology-based document representation; semantic smoothing method; text classification; unified medical language system; Bioinformatics; Biomedical computing; Data mining; Genomics; Information retrieval; Smoothing methods; Support vector machine classification; Support vector machines; Text categorization; Unified modeling language;
Conference_Title :
Granular Computing, 2007. GRC 2007. IEEE International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3032-1
DOI :
10.1109/GrC.2007.98