Title :
Max-hashing fragments for large data sets detection
Author :
David, Jean Pierre
Author_Institution :
Ecole Polytech. de Montreal, Montreal, QC, Canada
Abstract :
The standard way to detect known digital objects inside a stream of bytes is to use a string matching algorithm initialized with a dictionary containing the objects to detect. Depending on the application, the algorithm may be implemented in software or in dedicated hardware to speed up the processing. Nevertheless, such an approach requires an automaton whose complexity is linear in the size of the dictionary. Large dictionaries result in large automata that must be stored in high-latency memories, which limits the processing speed. We propose a fast algorithm, tailored to FPGA implementation, that detects the transfer of known digital objects from their fragments. For illustrative purposes, the algorithm is applied to the detection of more than 100 000 known JPEG files by inspecting only the IP packets captured during an FTP transfer. Results demonstrate excellent true/false positive rates with almost no limit on the number of objects to detect or on the transfer rate.
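The abstract does not spell out the algorithm, so the following is only a minimal software sketch of the general idea the title suggests: fingerprint each fragment by the largest hash found among its fixed-size byte windows, and look that fingerprint up in a table built from the known objects. The window size, the block size, the use of BLAKE2b, and all function names (`window_hash`, `max_hash`, `build_index`, `detect`) are assumptions made for illustration and are not taken from the paper; a hardware design would likely use a much cheaper hash.

```python
import hashlib
from typing import Dict, Optional

WINDOW = 16   # bytes per hashed window (assumed value)
BLOCK = 512   # reference fragment size in bytes (assumed value)


def window_hash(window: bytes) -> int:
    """64-bit hash of one window; BLAKE2b is a stand-in, not the paper's hash."""
    return int.from_bytes(hashlib.blake2b(window, digest_size=8).digest(), "big")


def max_hash(fragment: bytes, window: int = WINDOW) -> Optional[int]:
    """Return the largest window hash in the fragment, or None if it is too short."""
    if len(fragment) < window:
        return None
    return max(
        window_hash(fragment[i:i + window])
        for i in range(len(fragment) - window + 1)
    )


def build_index(objects: Dict[str, bytes], block: int = BLOCK) -> Dict[int, str]:
    """Map the max-hash of every reference block to the name of its object."""
    index: Dict[int, str] = {}
    for name, data in objects.items():
        for offset in range(0, len(data), block):
            fingerprint = max_hash(data[offset:offset + block])
            if fingerprint is not None:
                index[fingerprint] = name
    return index


def detect(packet: bytes, index: Dict[int, str]) -> Optional[str]:
    """Return the name of the known object the packet appears to come from, if any."""
    fingerprint = max_hash(packet)
    return index.get(fingerprint) if fingerprint is not None else None
```

In this sketch the lookup cost per packet is a single table access, independent of how many objects are indexed, which is the property that contrasts with a dictionary automaton. The sketch does not handle packets whose boundaries are misaligned with the reference blocks, and it makes no claim about the detection rates reported in the paper.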
Keywords :
IP networks; dictionaries; field programmable gate arrays; object detection; protocols; storage management; storage management chips; string matching; FPGA; FTP transfer; IP packet inspection; JPEG files detection; SMA; bytes streaming; data sets detection; dictionary; digital object detection; digital object transfer; high latency memory; max hashing fragments; string matching algorithm; Automata; Computer architecture; Data mining; Dictionaries; Entropy; Field programmable gate arrays; Transform coding;
Conference_Title :
Reconfigurable Computing and FPGAs (ReConFig), 2013 International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4799-2078-5
DOI :
10.1109/ReConFig.2013.6732307