• DocumentCode
    3460986
  • Title
    GeneIndex: An Open Source Parallel Program for Enumerating and Locating Words in a Genome
  • Author
    Li, Huian ; Hart, David ; Mueller, Matthias ; Markwardt, Ulf ; Stewart, Craig
  • Author_Institution
    Univ. IT Services, Indiana Univ., Indianapolis, IN, USA
  • fYear
    2009
  • fDate
    3-5 Aug. 2009
  • Firstpage
    98
  • Lastpage
    102
  • Abstract
    GeneIndex is an open-source program that locates every word of a user-specified length k in a sequence. It is useful for understanding the structure of entire genomes or very large sets of genetic sequences, particularly for finding highly repeated words and words that occur with low frequency. GeneIndex accepts DNA sequences as FASTA-format input files and performs both computation and input/output in parallel. It has been implemented on Linux, IBM AIX, and NEC SX-8, and is available with test data sets (the entire genomes of Drosophila melanogaster and Homo sapiens). The performance of the program scales well with processor count: as the number of processors increases, the required processing time decreases proportionally.
  • Keywords
    DNA; biology computing; genetics; genomics; input-output programs; parallel programming; public domain software; sequences; DNA sequence; FASTA format input file; GeneIndex; genetic sequence; genome structure; input-output program; open-source parallel program; Bioinformatics; Concurrent computing; DNA computing; Frequency; Genetics; Genomics; Linux; National electric code; Open source software; Sequences; frequencies; gene sequence; locations; parallel processing; words;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics, Systems Biology and Intelligent Computing, 2009. IJCBS '09. International Joint Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3739-9
  • Type
    conf
  • DOI
    10.1109/IJCBS.2009.127
  • Filename
    5260731
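
The abstract describes locating every word of length k in FASTA-format DNA sequences. The following is a minimal serial sketch of that idea, not GeneIndex's actual implementation: the function names `read_fasta` and `index_words` are hypothetical, the parallel computation and I/O described in the abstract are not reproduced, and skipping words that contain characters other than A, C, G, T is an assumption made here for illustration.

```python
from collections import defaultdict
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def index_words(records, k):
    """Map each word of length k to a list of (header, 0-based offset) hits."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    alphabet = set("ACGT")
    index = defaultdict(list)
    for header, seq in records:
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Assumption: words with ambiguous bases (e.g. N) are skipped.
            if set(word) <= alphabet:
                index[word].append((header, i))
    return index


# Tiny in-memory FASTA input standing in for a genome file.
fasta = StringIO(">seq1\nACGTAC\nGT\n>seq2\nNACG\n")
index = index_words(read_fasta(fasta), 3)

# Word frequencies, most repeated first.
counts = sorted(((len(hits), word) for word, hits in index.items()), reverse=True)
```

Here `index["ACG"]` lists every location of the word ACG across both sequences, and `counts` ranks words by frequency, which is the kind of query (highly repeated versus rare words) the abstract mentions. A parallel version would need to partition the sequence with overlaps of k - 1 bases so that words spanning a partition boundary are not lost.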