DocumentCode :
83780
Title :
Text Categorization of Biomedical Data Sets Using Graph Kernels and a Controlled Vocabulary
Author :
Bleik, Said ; Mishra, Mahesh K. ; Huan, Jun ; Song, Min
Author_Institution :
Inf. Syst. Dept., Univ. Heights, Newark, NJ, USA
Volume :
10
Issue :
5
fYear :
2013
fDate :
Sept.-Oct. 2013
Firstpage :
1211
Lastpage :
1217
Abstract :
Recently, graph representations of text have been showing improved performance over conventional bag-of-words representations in text categorization applications. In this paper, we present a graph-based representation for biomedical articles and use graph kernels to classify those articles into high-level categories. In our representation, common biomedical concepts and semantic relationships are identified with the help of an existing ontology and are used to build a rich graph structure that provides a consistent feature set and preserves additional semantic information that could improve a classifier's performance. We attempt to classify the graphs using both a set-based graph kernel that is capable of dealing with the disconnected nature of the graphs and a simple linear kernel. Finally, we report the results comparing the classification performance of the kernel classifiers to common text-based classifiers.
Keywords :
classification; graph grammars; medical computing; ontologies (artificial intelligence); semantic networks; text analysis; vocabulary; article classification; biomedical articles; biomedical data sets; classification performance; classifier performance; common biomedical concepts; common text-based classifiers; consistent feature set; controlled vocabulary; conventional bag-of-words representations; graph disconnected nature; graph-based representation; high-level categories; kernel classifier; ontology; rich graph structure; semantic information; semantic relationship; set-based graph kernel; simple linear kernel; text categorization applications; text graph representations; Graph representations; Kernel; Semantics; Support vector machine classification; Text categorization; Unified modeling language; Text categorization; biomedical ontologies; classifier design and evaluation; graph kernels; graph representations; mining methods and algorithms; modeling structured, textual and multimedia data; text mining
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.16
Filename :
6475935