DocumentCode :
1992518
Title :
Combining Semantics, Context, and Statistical Evidence in Genomics Literature Search
Author :
Urbain, Jan ; Goharian, Nazli ; Frieder, Ophir
Author_Institution :
Illinois Inst. of Technol., Chicago
fYear :
2007
fDate :
14-17 Oct. 2007
Firstpage :
1313
Lastpage :
1317
Abstract :
We present an information retrieval model for combining evidence from concept-based semantics, term statistics, and context for improving search precision of genomics literature by accurately identifying concise, variable length passages of text to answer a user query. The system combines a dimensional data model for indexing scientific literature at multiple levels of document structure and context with a rule-based query processing algorithm. The query processing algorithm uses an iterative information extraction technique to identify query concepts, and a retrieval function for systematically combining concepts with term statistics at multiple levels of context. We define context by variable length passages of text and different levels of document lexical structure including terms, sentences, paragraphs, and entire documents. Our results demonstrate improved search results in the presence of varying levels of semantic evidence, and higher performance using retrieval functions that combine document as well as sentence and passage level information versus using document, sentence or passage level information alone. Initial results are promising. When ranking documents based on the most relevant extracted passages, the results exceed the state-of-the-art by 13.89% as assessed by the TREC 2005 Genomics track collection of 4.5 million MEDLINE citations.
Keywords :
DNA; cellular biophysics; content-based retrieval; genetic algorithms; indexing; information retrieval systems; medical information systems; molecular biophysics; programming language semantics; MEDLINE citations; TREC 2005 genomics track collection; concept-based semantics; context; dimensional data model; document lexical structure; document structure multiple levels; genomics literature search; information retrieval model; iterative information extraction; paragraphs; rule-based query processing algorithm; scientific literature indexing; search precision; sentences; term statistics; text variable length passages; user query; Bioinformatics; Context modeling; Data mining; Data models; Genomics; Indexing; Information retrieval; Iterative algorithms; Query processing; Statistics; H.3.1 [Information Storage and Retrieval]: Context Analysis and Indexing-linguistic processing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval-search process; I.2.7 [Artificial Intelligence]: Natural Language Processing-text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
Type :
conf
DOI :
10.1109/BIBE.2007.4375738
Filename :
4375738