DocumentCode :
1800425
Title :
An ontology-based dimensionality reduction algorithm for biomedical literature classification
Author :
Jing Wang ; Gongqing Wu ; Xuegang Hu
Author_Institution :
School of Computer Science and Information Engineering, Hefei University of Technology, China, 230009
fYear :
2013
fDate :
1-8 Jan. 2013
Firstpage :
1
Lastpage :
5
Abstract :
Dimension reduction is an important component of automatic text categorization, especially biomedical literature classification. Many studies have shown that statistics-based dimension reduction algorithms, such as Information Gain (IG), are very effective in document categorization. However, these algorithms still suffer from two major drawbacks: they tend to use all words as features, and they cannot capture the semantic information that underlies the lexical words. To overcome these drawbacks, this paper presents a novel algorithm for reducing the dimensionality of biomedical literature. First, an ontology-based entity extraction technique yields a biomedical concept set that serves as the feature space. Second, semantic relatedness information is incorporated by mapping some of the original features to “Least-Max-Cover” features, according to the structure of the domain ontology. We demonstrate our method on the problem of classifying MEDLINE-indexed journal abstracts, using C4.5 as the base classifier. The experimental results show that, on average, our method achieves a significant improvement in F-value (3.5%) and recall (5.25%) compared with other state-of-the-art dimensionality reduction algorithms such as IG, CHI, One-R and LARS.
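For readers unfamiliar with the statistical baseline named in the abstract, the following is a minimal sketch of Information Gain (IG) feature scoring. It is not the paper's ontology-based method, and it does not reproduce the paper's experimental setup. It assumes each document is a set of tokens and that a term's IG is the reduction in class-label entropy obtained by splitting documents on the presence of that term. The function names `entropy`, `information_gain`, and `top_k_terms` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG of a term: H(labels) minus H(labels | term present or absent).

    Assumption: each element of `docs` is a set of tokens.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary, key=lambda t: (-information_gain(docs, labels, t), t)
    )
    return ranked[:k]
```

A scorer of this kind ranks every surface word independently, which illustrates the two drawbacks the abstract raises: the candidate feature space is the whole vocabulary, and no relationship between terms is taken into account.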
Keywords :
Classification algorithms; Educational institutions; Feature extraction; Ontologies; Prediction algorithms; Semantics; Text categorization; “Least-Max-Cover” strategy; automatic text categorization; dimension reduction; ontology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Conference Anthology, IEEE
Conference_Location :
China
Type :
conf
DOI :
10.1109/ANTHOLOGY.2013.6784753
Filename :
6784753