• DocumentCode
    3104030
  • Title

    Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression

  • Author

    Beagley, Nathaniel ; Scherrer, Chad ; Shi, Yan ; Clowers, Brian H. ; Danielson, William F. ; Shah, Anuj R.

  • Author_Institution
    Comput. Math. Group, Pacific Northwest Nat. Lab., Richland, WA, USA
  • fYear
    2009
  • fDate
    9-11 Dec. 2009
  • Firstpage
    66
  • Lastpage
    71
  • Abstract
    The massive data sets produced by the high-throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems but at the cost of less efficient data access, which directly impacts the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the benefits of size and speed from compression while increasing analysis efficiency by allowing extraction of segments of uncompressed data from a file without having to uncompress the entire file. This paper describes our compression algorithm, presents comparative metrics of compression size and speed, and explores approaches for applying the algorithm to a generalized data set.
  • Keywords
    biology computing; data analysis; data compression; proteins; data storage; indexed compression; multidimensional mass spectrometry; proteomics; Costs; Data acquisition; Data analysis; Data compression; Instruments; Mass spectroscopy; Memory; Multidimensional systems; Proteomics; Throughput
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    e-Science, 2009. e-Science '09. Fifth IEEE International Conference on
  • Conference_Location
    Oxford
  • Print_ISBN
    978-0-7695-3877-8
  • Type

    conf

  • DOI
    10.1109/e-Science.2009.18
  • Filename
    5380883