• DocumentCode
    3576227
  • Title

    A bio-sequence k-mer frequency counter (kFC)

  • Author

    Biji, C.L. ; Nair, Achuthsankar S. ; Madhu, Manu K. ; Vijayakumar, R.

  • Author_Institution
    Dept. of Comput. Biol. & Bioinf., Univ. of Kerala, Thiruvananthapuram, India
  • fYear
    2014
  • Firstpage
    353
  • Lastpage
    356
  • Abstract
    High-throughput sequencing data from next-generation sequencing technologies create a need to identify over-represented k-mers for de novo sequencing and assembly. Although generating the k-mer frequency distribution of a sequence seems a simple task, memory usage and running time are two important concerns, especially for higher-order k-mers. This paper proposes a method to count over-represented k-mers in bio-sequences. The approach uses a hash table with an open addressing scheme for estimating the frequency count of k-mers. The algorithm supports both overlapping and non-overlapping patterns for nucleotide sequences, amino acid sequences, and next-generation sequencing read sequences. Moreover, it accepts nucleotide sequences from the extended alphabet set, in contrast to traditional k-mer tools, which accept only the standard alphabet.
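    A minimal sketch of the idea summarized in the abstract, not the authors' kFC implementation: k-mers are counted in a hash table that resolves collisions by open addressing. Linear probing, the function name `count_kmers`, the `capacity` parameter, and the table-sizing rule are all assumptions made for illustration; the record does not specify them. Because k-mers are stored as plain strings, any alphabet is accepted, including extended nucleotide codes such as `N`.

```python
def count_kmers(seq, k, overlapping=True, capacity=1024):
    """Count k-mers with a linear-probing (open addressing) hash table.

    Hypothetical helper for illustration; not taken from the kFC paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")

    # Overlapping windows advance by 1, non-overlapping windows by k.
    step = 1 if overlapping else k
    n_windows = max(0, (len(seq) - k) // step + 1)

    # Assumption: keep the load factor at or below 0.5 so that
    # probing always reaches an empty slot.
    while capacity < 2 * n_windows:
        capacity *= 2

    keys = [None] * capacity   # stored k-mers
    counts = [0] * capacity    # count for the k-mer in the same slot

    for start in range(0, len(seq) - k + 1, step):
        kmer = seq[start:start + k]
        slot = hash(kmer) % capacity
        # Linear probing: walk forward until the k-mer or a free slot is found.
        while keys[slot] is not None and keys[slot] != kmer:
            slot = (slot + 1) % capacity
        keys[slot] = kmer
        counts[slot] += 1

    return {key: c for key, c in zip(keys, counts) if key is not None}


if __name__ == "__main__":
    print(count_kmers("ACGTACGT", 2))
    print(count_kmers("ACGTACGT", 2, overlapping=False))
```

    Passing `overlapping=False` switches from a sliding window to consecutive, disjoint windows; a trailing fragment shorter than k is ignored in both modes.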
  • Keywords
    biology computing; data analysis; organic compounds; sequences; alphabet set; amino acid sequences; bio-sequence k-mer frequency counter; de novo assembled; de novo sequenced; hash table; high-throughput sequencing data; k-mer frequency distribution; k-mer tool; kFC; memory time; memory usage; next generation sequencing technologies; nonoverlapping pattern; nucleotide sequences; open address scheme; read data analysis; read sequences; Algorithm design and analysis; Animals; Arrays; Bioinformatics; Genomics; Random access memory; Sequential analysis; Biosequence analysis; Read data analysis; k-mer counter;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Circuits, Communication, Control and Computing (I4C), 2014 International Conference on
  • Print_ISBN
    978-1-4799-6545-8
  • Type

    conf

  • DOI
    10.1109/CIMCA.2014.7057822
  • Filename
    7057822