• DocumentCode
    3321860
  • Title

    BioSeek: exploiting source-capability information for integrated access to multiple bioinformatics data sources

  • Author

    Liu, Ling ; Buttler, David ; Critchlow, Terence ; Han, Wei ; Paques, Henrique ; Pu, Calton ; Rocco, Dan

  • Author_Institution
    Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2003
  • fDate
    10-12 March 2003
  • Firstpage
    263
  • Lastpage
    271
  • Abstract
    Modern bioinformatics data sources are widely used by molecular biologists for homology searching and new drug discovery. User-friendly yet responsive access is one of the most desirable properties for integrated access to the rapidly growing, heterogeneous, and distributed collection of data sources. The increasing volume and diversity of digital information related to bioinformatics (such as genomes, protein sequences, and protein structures) have led to a growing problem that conventional data management systems do not face, namely finding which information sources, out of many candidates, are the most relevant and most accessible for answering a given user query. We refer to this problem as the query routing problem. In this paper we introduce the notation and issues of query routing, and present a practical solution for designing a scalable query routing system based on multi-level progressive pruning strategies. The key idea is to create and maintain source capability profiles independently, and to provide algorithms that can dynamically discover relevant information sources for a given query through the smart use of source profiles. Compared to the keyword-based indexing techniques adopted in most search engines and software, our approach offers fine-grained interest matching, and is thus more powerful and effective for handling queries with complex conditions.
  • Keywords
    medical computing; molecular biophysics; patient treatment; query processing; BioSeek; Modern Bioinformatics data sources; complex conditions; conventional data management systems; digital information; genomes; homology searching; keyword-based indexing techniques; multilevel progressive pruning strategies; multiple bioinformatics data sources; new drug discovery; protein sequences; protein structures; queries handling; scalable query routing system; source profiles; source-capability information; user-friendly responsive access; Bioinformatics; Biomedical engineering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2003. Proceedings. Third IEEE Symposium on
  • Print_ISBN
    0-7695-1907-5
  • Type

    conf

  • DOI
    10.1109/BIBE.2003.1188961
  • Filename
    1188961