DocumentCode :
2341190
Title :
A multi-level text mining method to extract biological relationships
Author :
Palakal, Mathew ; Stephens, Matthew ; Mukhopadhyay, Snehasis ; Raje, Rajeev ; Rhodes, Simon
Author_Institution :
Dept. of Comput. & Inf. Sci., Indiana Univ., Indianapolis, IN, USA
fYear :
2002
fDate :
2002
Firstpage :
97
Lastpage :
108
Abstract :
Accurate and computationally efficient approaches to discovering relationships between biological objects in text documents are important for biologists developing biological models. This paper presents a novel approach to extracting relationships between multiple biological objects that are present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov models (HMMs), dictionaries, and N-Gram models are used to set the framework to tackle the complex task of extracting object-object relationships. Experiments were carried out using a corpus of one thousand Medline abstracts. Intermediate results were obtained for the object identification process, synonym discovery, and finally the relationship extraction. From this corpus, 53 relationships were extracted, of which 43 were correct, giving a specificity of 81%. The approach is both adaptable and scalable to new problems, as opposed to rule-based methods.
Keywords :
bibliographic systems; biology computing; data mining; dictionaries; hidden Markov models; scientific information systems; text analysis; Medline; N-Gram models; bibliographic database; biological models; biological relationships; experiments; multi-level text mining method; object identification; object-object relationships; ontology; reference resolution; synonym discovery; Abstracts; Bioinformatics; Biological system modeling; Humans; Proteins; Text mining
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
Print_ISBN :
0-7695-1653-X
Type :
conf
DOI :
10.1109/CSB.2002.1039333
Filename :
1039333