DocumentCode
583276
Title
An efficient algorithm for clustering of large-scale mass spectrometry data
Author
Saeed, Fahad ; Pisitkun, Trairak ; Knepper, Mark A. ; Hoffert, Jason D.
Author_Institution
Epithelial Syst. Biol. Lab., Nat. Heart Lung & Blood Inst. (NHLBI), Bethesda, MD, USA
fYear
2012
fDate
4-7 Oct. 2012
Firstpage
1
Lastpage
4
Abstract
High-throughput spectrometers can produce data sets containing thousands of spectra for a single biological sample. These data sets contain substantial redundancy from peptides that may be selected multiple times in an LC-MS/MS experiment. In this paper, we present an efficient algorithm, CAMS (Clustering Algorithm for Mass Spectra), for clustering mass spectrometry data that increases both the sensitivity and the confidence of spectral assignment. CAMS uses a novel metric, called F-set, that allows accurate identification of similar spectra. A graph-theoretic framework is defined that allows the F-set metric to be used efficiently for accurate cluster identification. The accuracy of the algorithm is tested on real HCD and CID data sets containing varying numbers of peptides. Our experiments show that the proposed algorithm clusters spectra with very high accuracy in a reasonable amount of time for large spectral data sets. Thus, the algorithm decreases computational time by compressing the data sets, while increasing data throughput through the interpretation of low-S/N spectra.
Keywords
bioinformatics; biological techniques; chemistry computing; chromatography; data analysis; data mining; mass spectroscopic chemical analysis; molecular biophysics; pattern clustering; CAMS algorithm; CID data sets; Clustering Algorithm for Mass Spectra; F-set metric; HCD data sets; LC-MS-MS experiment; biological sample; data redundancy; graph theoretic framework; high throughput spectrometers; large scale mass spectrometry data clustering; liquid chromatography-tandem mass spectrometry; spectral assignment confidence; spectral assignment sensitivity; Accuracy; Bioinformatics; Clustering algorithms; Mass spectroscopy; Peptides; Proteomics; Clustering; Efficient Algorithms; Graph Theory; Mass spectrometry;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on
Conference_Location
Philadelphia, PA
Print_ISBN
978-1-4673-2559-2
Electronic_ISBN
978-1-4673-2558-5
Type
conf
DOI
10.1109/BIBM.2012.6392738
Filename
6392738