• DocumentCode
    2236346
  • Title
    Exploring I/O Strategies for Parallel Sequence-Search Tools with S3aSim
  • Author
    Ching, Avery; Feng, Wu-chun; Lin, Heshan; Ma, Xiaosong; Choudhary, Alok
  • Author_Institution
    Dept. of Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    229
  • Lastpage
    240
  • Abstract
    Parallel sequence-search tools are rising in popularity among computational biologists. With the rapid growth of sequence databases, database segmentation is the trend of the future for such search tools. While I/O is currently not a significant bottleneck for parallel sequence-search tools, future developments, including faster processors, customized computational hardware such as FPGAs, improved search algorithms, and exponentially growing databases, point to an increasing need for efficient parallel I/O in these tools. This paper examines different I/O strategies for such future tools on a modern parallel file system (PVFS2). Because implementing and comparing various I/O algorithms in every search tool is labor-intensive and time-consuming, we introduce S3aSim, a general simulation framework for sequence search that allows us to quickly implement, test, and profile various I/O strategies. We examine a variety of I/O strategies (e.g., master-writing and various worker-writing strategies using individual and collective I/O methods) for storing result data in sequence-search tools such as mpiBLAST, pioBLAST, and parallel HMMer. Our experiments fully detail the interaction of computing and I/O within a full application simulation, as opposed to typical I/O-only benchmarks.
  • Keywords
    biology computing; database management systems; parallel processing; search problems; S3aSim framework; computational biologist; master-writing strategy; mpiBLAST tool; parallel HMMer tool; parallel I/O strategy; parallel file system; parallel sequence-search tool; pioBLAST tool; search algorithm; sequence database segmentation; worker-writing strategy; Biology computing; Computer science; Concurrent computing; DNA; Databases; Field programmable gate arrays; File systems; Hardware; Hidden Markov models; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Distributed Computing, 2006 15th IEEE International Symposium on
  • Conference_Location
    Paris
  • ISSN
    1082-8907
  • Print_ISBN
    1-4244-0307-3
  • Type
    conf
  • DOI
    10.1109/HPDC.2006.1652154
  • Filename
    1652154