• DocumentCode
    595569
  • Title

    A scalable search index for binary files

  • Author

    Jin, Weiwei ; Hines, C. ; Cohen, C. ; Narasimhan, Priya

  • Author_Institution
    CMU, Pittsburgh, PA, USA
  • fYear
    2012
  • fDate
    16-18 Oct. 2012
  • Firstpage
    94
  • Lastpage
    103
  • Abstract
    The ability to locate specific byte-sequences in large collections of binary files is important in many applications, especially malware analysis. However, it can be a time-consuming process. Researchers and analysts, such as those at CERT, often have to search terabytes of data for characteristic patterns and signatures, which can take upwards of days to complete. Although many search systems, designed specifically to expedite text and metadata queries, exist, these tools are unsuitable for searching files containing arbitrary bytes. By using probabilistic techniques to pre-filter likely search candidates, we present a scalable architecture for searching and indexing terabyte-size collections of binary files. Our implementation performs searches in minutes that would require days to complete using iterative techniques. It also reduces storage costs by balancing the amount of data indexed with the total time required to conduct and verify a query.
  • Keywords
    indexing; information filtering; invasive software; meta data; probability; query processing; text analysis; CERT; binary files; byte-sequences; characteristic patterns; iterative techniques; malware analysis; metadata query; probabilistic techniques; scalable search index; search candidate prefiltering; signatures; text query; Bioinformatics; Data structures; Encoding; Indexes; Malware; Open source software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Malicious and Unwanted Software (MALWARE), 2012 7th International Conference on
  • Conference_Location
    Fajardo, PR
  • Print_ISBN
    978-1-4673-4880-5
  • Type

    conf

  • DOI
    10.1109/MALWARE.2012.6461014
  • Filename
    6461014