DocumentCode :
1824927
Title :
A high performance algorithm for clustering of large-scale protein mass spectrometry data using multi-core architectures
Author :
Saeed, Fahad ; Hoffert, Jason D. ; Knepper, Mark A.
Author_Institution :
Epithelial Syst. Biol. Lab., Nat. Inst. of Health (NIH), Bethesda, MD, USA
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
923
Lastpage :
930
Abstract :
High-throughput mass spectrometers can produce thousands of peptide spectra from a single complex protein sample in a short amount of time. These data sets contain a substantial amount of redundancy, because the same peptide may be selected and identified multiple times in a single liquid chromatography mass spectrometry (LC-MS/MS) experiment. The data also contain a substantial number of spectra that have a low signal-to-noise (S/N) ratio and may not be interpreted because of their poor quality. Recently, we presented a graph-theoretic algorithm, CAMS (Clustering Algorithm for Mass Spectra), for clustering mass spectrometry data. CAMS uses a novel metric, called an F-set, that identifies similar spectra with much higher accuracy and sensitivity than single-peak comparisons. In this paper we present a multithreaded algorithm, called P-CAMS, for clustering mass spectral data on multicore machines. The algorithm relies on intelligent matrix completion for graph construction and a load-balancing scheme for substantial speedups. We study the scalability of the proposed parallel algorithm on a multicore machine using synthetically generated spectra with parameters carefully chosen to mimic real-world mass spectrometry datasets. Real experimental datasets were also generated to assess the quality of the clustering results. The results show that the proposed algorithm has scalable runtime performance and gives clustering results similar to those of the serial algorithm. The study also provides insight into the design of high-performance algorithms for irregular problems in proteomics on many-core architectures.
Keywords :
biology computing; graph theory; mass spectroscopy; multi-threading; multiprocessing systems; parallel algorithms; pattern clustering; proteomics; resource allocation; P-CAMS; clustering algorithm for mass spectra; graph construction; high performance algorithm; intelligent matrix completion; large-scale protein mass spectrometry data; load-balancing scheme; mass spectral data clustering; mass spectrometry datasets; multicore architectures; multicore machines; multithreaded algorithm; parallel algorithm; quality assessment; scalability performance; synthetically generated spectra; Algorithm design and analysis; Arrays; Clustering algorithms; Mass spectroscopy; Peptides; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on
Conference_Location :
Niagara Falls, ON
Type :
conf
Filename :
6785810