DocumentCode :
3444829
Title :
Feature selection based file type identification algorithm
Author :
Cao, Ding ; Luo, Junyong ; Yin, Meijuan ; Yang, Huijie
Author_Institution :
Zhengzhou Inf. Sci. & Technol. Inst., Zhengzhou, China
Volume :
3
fYear :
2010
fDate :
29-31 Oct. 2010
Firstpage :
58
Lastpage :
62
Abstract :
Identifying the true type of an arbitrary file is very important in information security. Methods based on file extensions or magic numbers are easily spoofed; a more reliable approach is to analyze the file's binary content. We propose an algorithm that uses n-gram analysis of the binary contents of a set of known input files to generate a model for each file type, design a novel feature selection evaluation function to extract signatures from these models, and then use the signatures to recognize the true type of unknown files. We deliberately avoid relying on the structure or keywords of any specific file type, so that the approach can be applied to file types in general. Experiments show that the proposed approach is promising, especially when the feature selection evaluation function is applied.
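The pipeline the abstract outlines (per-type byte n-gram frequency models, signature extraction, then matching an unknown file against the signatures) can be illustrated with a minimal Python 3.9+ sketch. It is not the authors' implementation. The abstract does not define their feature selection evaluation function, so `select_signature` uses a hypothetical stand-in score: how much more frequent a gram is in one type than in whichever other type uses it most. The root-mean-square nearest-signature rule in `identify` is likewise an assumption, and all function names are illustrative.

```python
from collections import Counter
from math import inf, sqrt


def ngram_freqs(data: bytes, n: int = 1) -> dict[bytes, float]:
    """Relative frequency of every byte n-gram in `data`."""
    total = len(data) - n + 1
    if total <= 0:
        return {}
    counts = Counter(data[i:i + n] for i in range(total))
    return {gram: c / total for gram, c in counts.items()}


def build_model(samples: list[bytes], n: int = 1) -> dict[bytes, float]:
    """Model of one file type: mean n-gram frequency over its known samples."""
    if not samples:
        raise ValueError("need at least one sample file per type")
    summed: Counter = Counter()
    for data in samples:
        summed.update(ngram_freqs(data, n))  # adds frequencies gram by gram
    return {gram: f / len(samples) for gram, f in summed.items()}


def select_signature(models: dict[str, dict[bytes, float]], ftype: str,
                     k: int = 32) -> dict[bytes, float]:
    """HYPOTHETICAL stand-in for the paper's evaluation function:
    score a gram by how much more frequent it is in `ftype` than in the
    other type that uses it most, then keep the top-k positive scorers."""
    own = models[ftype]
    others = [m for t, m in models.items() if t != ftype]

    def score(gram: bytes) -> float:
        return own[gram] - max((m.get(gram, 0.0) for m in others), default=0.0)

    ranked = sorted(own, key=score, reverse=True)
    return {g: own[g] for g in ranked[:k] if score(g) > 0}


def distance(freqs: dict[bytes, float], signature: dict[bytes, float]) -> float:
    """Root-mean-square difference, computed over the signature's grams only."""
    if not signature:
        return inf  # a type with no usable signature never wins
    sq = sum((freqs.get(g, 0.0) - f) ** 2 for g, f in signature.items())
    return sqrt(sq / len(signature))


def identify(data: bytes, signatures: dict[str, dict[bytes, float]],
             n: int = 1) -> str:
    """Assign the unknown file to the type whose signature is nearest."""
    freqs = ngram_freqs(data, n)
    return min(signatures, key=lambda t: distance(freqs, signatures[t]))
```

For example, with n = 1, training a "text" model on samples made of ASCII `a`/`b` bytes and a "bin" model on samples made of `0x00`/`0xFF` bytes, then calling `identify(b"abba", signatures)`, selects "text". Comparing only the selected grams, rather than the full 256-entry (or larger) distribution, is what lets a well-chosen signature suppress bytes that are common to many file types.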
Keywords :
file organisation; security of data; arbitrary file; feature selection based file type identification algorithm; feature selection evaluation function; file extensions; files binary content; information security; magic numbers; n-gram analysis; Accuracy; Forensics; Security; Stability analysis; feature selection; file type identification; gram frequency distribution
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Computing and Intelligent Systems (ICIS), 2010 IEEE International Conference on
Conference_Location :
Xiamen
Print_ISBN :
978-1-4244-6582-8
Type :
conf
DOI :
10.1109/ICICISYS.2010.5658559
Filename :
5658559