Title :
CGM: A biomedical text categorization approach using concept graph mining
Author :
Bleik, Said ; Song, Min ; Smalter, Aaron ; Huan, Jun ; Lushington, Gerald
Author_Institution :
Dept. of Inf. Syst., New Jersey Inst. of Technol., Newark, NJ, USA
Abstract :
Text categorization is used to organize and manage biomedical text databases that are growing at an exponential rate. The feature representation of documents is a crucial factor in text categorization performance. Most successful existing techniques use a vector representation based on key entities extracted from the text. In this paper, we investigate a new direction in which we represent a document as a graph. In this representation, we identify high-level concepts and build a rich graph structure that contains additional concepts and relationships. We then use graph kernel techniques to perform text categorization. The results show a significant improvement in accuracy compared to categorization based only on the extracted concepts.
Keywords :
data mining; medical computing; text analysis; biomedical text categorization approach; biomedical text databases; concept graph mining; graph kernel techniques; Data mining; Engineering management; Information retrieval; Kernel; Management information systems; Spatial databases; Technology management; Text categorization; Unified modeling language; User-generated content
Conference_Title :
Bioinformatics and Biomedicine Workshop, 2009. BIBMW 2009. IEEE International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-5121-0
DOI :
10.1109/BIBMW.2009.5332134