DocumentCode
680079
Title
Max-hashing fragments for large data sets detection
Author
David, Jean Pierre
Author_Institution
Ecole Polytech. de Montreal, Montreal, QC, Canada
fYear
2013
fDate
9-11 Dec. 2013
Firstpage
1
Lastpage
6
Abstract
The standard way to detect known digital objects inside a stream of bytes is to use a string matching algorithm initialized with a dictionary containing the objects to detect. Depending on the application, the algorithm may be implemented in software or in dedicated hardware to speed up processing. Nevertheless, such an approach requires an automaton whose complexity is linear in the size of the dictionary. Large dictionaries result in large automata that must be stored in high-latency memories, which limits the processing speed. We propose a fast algorithm, tailored to FPGA implementation, that detects the transfer of known digital objects from their fragments. For illustrative purposes, the algorithm is applied to the detection of more than 100 000 known JPEG files by inspecting only the IP packets captured during an FTP transfer. Results demonstrate excellent true/false positive rates with nearly no limit on the number of objects to detect or on the transfer rate.
Keywords
IP networks; dictionaries; field programmable gate arrays; object detection; protocols; storage management; storage management chips; string matching; FPGA; FTP transfer; IP packet inspection; JPEG files detection; SMA; bytes streaming; data sets detection; dictionary; digital object detection; digital object transfer; high latency memory; max hashing fragments; string matching algorithm; Automata; Computer architecture; Data mining; Dictionaries; Entropy; Field programmable gate arrays; Transform coding;
fLanguage
English
Publisher
IEEE
Conference_Titel
Reconfigurable Computing and FPGAs (ReConFig), 2013 International Conference on
Conference_Location
Cancun
Print_ISBN
978-1-4799-2078-5
Type
conf
DOI
10.1109/ReConFig.2013.6732307
Filename
6732307