• DocumentCode
    1348445
  • Title
    Gene Alert: a sequence search results keyword parser
  • Author
    Huang, H.; Garner, H.R.
  • Author_Institution
    Texas Univ. Southwestern Medical Center, TX, USA
  • Volume
    17
  • Issue
    2
  • fYear
    1998
  • Firstpage
    119
  • Lastpage
    122
  • Abstract
    Similarity searching is an important tool for many biological scientists. Scientists use various computer implementations (BLAST, FASTA, Smith-Waterman) to analyze their sequences of interest and to identify identities (perfect matches) or similarities (statistically significant matches) between their query sequences and large databases such as GenBank. Search engines currently return brief annotations and alignments ranked in order of statistical significance or raw similarity score. However, it is frequently not the top-scoring similarities that bring important new information to the investigating scientist, but the content of the annotation or similarity "hits" at any significant score. The Gene Alert algorithm applies additional filtering and a user-weighted keyword search to the BLAST output, parsing it into a form customized to the user. The Gene Alert implementation, as it currently operates, has three components: an organized file structure, a BLAST engine, and a parser written in the PERL scripting language. The file structure was designed to place code and database components in logical positions and to facilitate future complete automation of the Gene Alert and similarity search system. Shown here is the file structure within the UNIX environment.
  • Keywords
    biology computing; genetics; grammars; BLAST; Gene Alert; PERL scripting language; UNIX environment; annotation; database components; logical positions; organized file structure; sequence search results keyword parser; significant score; similarity score; similarity search system; Automation; Bioinformatics; Biological information theory; Biology computing; Computer science education; Couplings; Databases; Genomics; History; Search engines; Algorithms; Base Sequence; Computational Biology; Databases, Factual; Genes; Humans; Sequence Homology, Nucleic Acid; Software
  • fLanguage
    English
  • Journal_Title
    Engineering in Medicine and Biology Magazine, IEEE
  • Publisher
    IEEE
  • ISSN
    0739-5175
  • Type
    jour
  • DOI
    10.1109/51.664040
  • Filename
    664040
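
The abstract describes filtering BLAST output and re-ranking hits by a user-weighted keyword search. The article's parser is written in PERL and its details are not given in this record, so the following Python is only a minimal sketch of the general idea. The `Hit` fields, the `rank_hits` function, the E-value cutoff, the example hits, and the keyword weights are all hypothetical and are not taken from Gene Alert.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One similarity hit, reduced to the fields this sketch needs (hypothetical)."""
    description: str  # annotation text returned with the hit
    bit_score: float  # similarity score reported by the search engine
    e_value: float    # statistical significance (smaller is more significant)


def rank_hits(hits, keyword_weights, max_e_value=1e-5):
    """Keep significant hits whose annotation matches a keyword, ranked by weight.

    Assumed scoring rule: a hit's weight is the sum of the weights of all
    keywords found (case-insensitively) in its description.
    """
    ranked = []
    for hit in hits:
        # Filtering step: drop hits that are not statistically significant.
        if hit.e_value > max_e_value:
            continue
        text = hit.description.lower()
        weight = sum(w for kw, w in keyword_weights.items() if kw.lower() in text)
        # Keep only hits that match at least one user keyword.
        if weight > 0:
            ranked.append((weight, hit))
    # Highest keyword weight first; ties broken by higher similarity score.
    ranked.sort(key=lambda pair: (-pair[0], -pair[1].bit_score))
    return ranked


# Made-up example data, for illustration only.
hits = [
    Hit("Homo sapiens zinc finger protein mRNA", 310.0, 1e-80),
    Hit("Mus musculus protein kinase C mRNA", 120.0, 1e-20),
    Hit("Human tumor suppressor p53 gene", 95.0, 1e-12),
    Hit("Rattus norvegicus kinase fragment", 40.0, 0.5),
]
weights = {"kinase": 2.0, "tumor suppressor": 5.0}

ranked = rank_hits(hits, weights)
for weight, hit in ranked:
    print(f"{weight:4.1f}  {hit.bit_score:6.1f}  {hit.description}")
```

In this sketch the top-scoring zinc finger hit is dropped because it matches no keyword, while a lower-scoring hit rises to the top because its annotation matches the most heavily weighted keyword, which is the behavior the abstract motivates.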