• DocumentCode
    3677879
  • Title
    Genome Data Analysis Using MapReduce Paradigm
  • Author
    Mayank Pahadia;Akash Srivastava;Divyang Srivastava;Nagamma Patil
  • Author_Institution
    Dept. of Inf. Technol., Nat. Inst. of Technol., Surathkal, India
  • fYear
    2015
  • fDate
    5/1/2015 12:00:00 AM
  • Firstpage
    556
  • Lastpage
    559
  • Abstract
    Counting the number of occurrences of a substring in a string is a problem in many applications. This paper suggests a fast and efficient solution for the field of bioinformatics. A k-mer is a k-length substring of a biological sequence. K-mer counting is defined as counting the number of occurrences of all the possible k-mers in a biological sequence. K-mer counting is used in applications including error correction of sequencing reads, genome assembly, disease prediction, and feature extraction. The current k-mer counting tools are costly in both time and space. We provide a solution which uses MapReduce and Hadoop to reduce the time complexity. After applying the algorithms on real genome datasets, we concluded that the algorithm using Hadoop and the MapReduce paradigm runs more efficiently and reduces the time complexity significantly.
  • Keywords
    "Bioinformatics","Genomics","Diseases","DNA","Big data","Microorganisms"
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computing and Communication Engineering (ICACCE), 2015 Second International Conference on
  • Print_ISBN
    978-1-4799-1733-4
  • Type
    conf
  • DOI
    10.1109/ICACCE.2015.68
  • Filename
    7306746