• DocumentCode
    680079
  • Title
    Max-hashing fragments for large data sets detection
  • Author
    David, Jean Pierre
  • Author_Institution
    Ecole Polytech. de Montreal, Montreal, QC, Canada
  • fYear
    2013
  • fDate
    9-11 Dec. 2013
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The standard way to detect known digital objects inside a stream of bytes consists of using a string matching algorithm initialized with a dictionary containing the objects to detect. Depending on the application, the algorithm may be implemented in software or in dedicated hardware to speed up processing. However, such an approach requires an automaton whose complexity is linear in the size of the dictionary. Large dictionaries result in large automata that must be stored in high-latency memories, which limits the processing speed. We propose a fast algorithm, tailored to FPGA implementation, that detects the transfer of known digital objects from their fragments. For illustrative purposes, the algorithm is applied to the detection of more than 100 000 known JPEG files by inspecting only the IP packets captured during an FTP transfer. Results demonstrate excellent true/false positive rates, with nearly no limit on either the number of objects to detect or the transfer rate.
  • Keywords
    IP networks; dictionaries; field programmable gate arrays; object detection; protocols; storage management; storage management chips; string matching; FPGA; FTP transfer; IP packet inspection; JPEG files detection; SMA; bytes streaming; data sets detection; dictionary; digital object detection; digital object transfer; high latency memory; max hashing fragments; string matching algorithm; Automata; Computer architecture; Data mining; Dictionaries; Entropy; Field programmable gate arrays; Transform coding;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reconfigurable Computing and FPGAs (ReConFig), 2013 International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4799-2078-5
  • Type
    conf
  • DOI
    10.1109/ReConFig.2013.6732307
  • Filename
    6732307