DocumentCode :
2702331
Title :
An improved relation-based information retrieval technique for bioinformatics
Author :
Li, Yan ; Wen, Jian ; Li, Zhoujun
Author_Institution :
Sch. of Comput., Nat. Univ. of Defense Technol., Changsha
fYear :
2008
fDate :
20-23 June 2008
Firstpage :
1536
Lastpage :
1541
Abstract :
One limitation of current relation-based IR models is that a relation is often recorded in binary form, such as R(Term1, Term2), which captures only the general information that two terms are semantically and syntactically related to each other. To address this problem, this paper defines a triple: a data structure that integrates a pair of concepts with a verb phrase, or sometimes a special noun, extracted from the sentence as the relation between the two concepts. We apply an advanced ontology-based approach that uses both UMLS and WordNet to extract generic concepts and relations, and we implement a new approach to ranking passages retrieved from documents, in line with the system performance measures specified in the TREC 2007 Genomics Track. We built a new version (IRIRS) of the relation-based IR system (RIRS) developed by the DM & Bioinformatics Lab of Drexel University in 2004. We use IRIRS to search for answers in English reading comprehension tests and to improve the retrieval results of all official runs in the TREC 2004 Genomics Track. Experiments on these two collections show that IRIRS performs better than RIRS. The character-based MAP, which measures passage-level retrieval performance, is significantly raised for 64 topics from the first collection, from 64.44% (RIRS) to 74.28%. The MAP (Mean Average Precision) for 50 topics from the second collection is raised from 21.71% (TREC) and 37.58% (RIRS) to 40.14%.
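The contrast between the binary form R(Term1, Term2) and the triple can be sketched as follows. This is only an illustrative sketch, not the IRIRS implementation: the class names, the `to_binary` and `matches` helpers, and the example terms are all hypothetical, and exact string equality stands in for whatever ontology-based matching the system actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryRelation:
    """Binary form R(Term1, Term2): records only that two terms are related."""
    term1: str
    term2: str


@dataclass(frozen=True)
class Triple:
    """A concept pair plus the verb phrase (or special noun) relating them."""
    concept1: str
    relation: str
    concept2: str


def to_binary(triple: Triple) -> BinaryRelation:
    """Drop the relation, keeping only the concept pair (hypothetical helper)."""
    return BinaryRelation(triple.concept1, triple.concept2)


def matches(query: Triple, candidate: Triple) -> bool:
    """Assumed matching rule: both concepts and the relation must be equal."""
    return query == candidate


# Hypothetical example: same concept pair, different relations.
activates = Triple("gene A", "activates", "protein B")
inhibits = Triple("gene A", "inhibits", "protein B")

# The binary form cannot tell the two statements apart; the triple can.
same_binary = to_binary(activates) == to_binary(inhibits)
same_triple = matches(activates, inhibits)
```

Under these assumptions, two sentences that state opposite relations between the same concepts collapse to one binary relation but remain distinct as triples, which is the limitation the abstract describes.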
Keywords :
biology computing; data structures; information retrieval; medical information systems; ontologies (artificial intelligence); DM & Bioinformatics Lab; Drexel University; IRIRS; TREC 2007 Genomics Track; UMLS; WordNet; advanced ontology-based approach; bioinformatics; character-based MAP; data structure; English reading comprehension; generic concepts; information retrieval technique; mean average precision; passage-level retrieval performance; Bioinformatics; Data mining; Data structures; Delta modulation; Genomics; Information retrieval; Ontologies; System performance; Testing; Unified modeling language;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information and Automation, 2008. ICIA 2008. International Conference on
Conference_Location :
Changsha
Print_ISBN :
978-1-4244-2183-1
Electronic_ISBN :
978-1-4244-2184-8
Type :
conf
DOI :
10.1109/ICINFA.2008.4608247
Filename :
4608247