• DocumentCode
    3754008
  • Title
    Big data proteogenomics and high performance computing: Challenges and opportunities
  • Author
    Fahad Saeed
  • Author_Institution
    Department of Electrical and Computer Engineering, Department of Computer Science, Western Michigan University, Kalamazoo, MI 49008-5466
  • fYear
    2015
  • Firstpage
    141
  • Lastpage
    145
  • Abstract
    Proteogenomics is an emerging field of systems biology research at the intersection of proteomics and genomics. Two high-throughput technologies, Mass Spectrometry (MS) for proteomics and Next Generation Sequencing (NGS) machines for genomics, are required to conduct proteogenomics studies. Independently, both MS and NGS technologies are afflicted by a data deluge that creates problems of storage, transfer, analysis and visualization. Integrating these big data sets (NGS+MS) for proteogenomics studies compounds all of the associated computational problems. Existing sequential algorithms for analyzing these proteogenomics datasets are inadequate for big data, and high performance computing (HPC) solutions are almost non-existent. The purpose of this paper is to introduce the big data problem of proteogenomics and the associated challenges in analyzing, storing and transferring these data sets. Further, opportunities for the high performance computing research community are identified and possible future directions are discussed.
  • Keywords
    "Genomics","Bioinformatics","Proteins","Big data","Proteomics","Databases","Protocols"
  • Publisher
    ieee
  • Conference_Titel
    Signal and Information Processing (GlobalSIP), 2015 IEEE Global Conference on
  • Type
    conf
  • DOI
    10.1109/GlobalSIP.2015.7418173
  • Filename
    7418173