DocumentCode
3104030
Title
Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression
Author
Beagley, Nathaniel ; Scherrer, Chad ; Shi, Yan ; Clowers, Brian H. ; Danielson, William F. ; Shah, Anuj R.
Author_Institution
Comput. Math. Group, Pacific Northwest Nat. Lab., Richland, WA, USA
fYear
2009
fDate
9-11 Dec. 2009
Firstpage
66
Lastpage
71
Abstract
The massive data sets produced by the high-throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems but at the cost of less efficient data access, which directly impacts the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the benefits of size and speed from compression while increasing analysis efficiency by allowing extraction of segments of uncompressed data from a file without having to uncompress the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.
Keywords
biology computing; data analysis; data compression; proteins; data storage; indexed compression; multidimensional mass spectrometry; proteomics; Costs; Data acquisition; Data analysis; Data compression; Instruments; Mass spectroscopy; Memory; Multidimensional systems; Proteomics; Throughput;
fLanguage
English
Publisher
IEEE
Conference_Titel
e-Science, 2009. e-Science '09. Fifth IEEE International Conference on
Conference_Location
Oxford
Print_ISBN
978-0-7695-3877-8
Type
conf
DOI
10.1109/e-Science.2009.18
Filename
5380883