DocumentCode :
3115807
Title :
Content based file type detection algorithms
Author :
McDaniel, Mason ; Heydari, M. Hossain
Author_Institution :
Dept. of Comput. Sci., James Madison Univ., Harrisonburg, VA, USA
fYear :
2003
fDate :
6-9 Jan. 2003
Abstract :
Identifying the true type of a computer file can be a difficult problem. Previous methods of file type recognition include fixed file extensions, fixed "magic numbers" stored with the files, and proprietary descriptive file wrappers. All of these methods have significant limitations. This paper proposes algorithms for automatically generating "fingerprints" of file types based on a set of known input files, then using the fingerprints to recognize the true type of unknown files based on their content rather than on metadata associated with them. Recognition is performed by three different algorithms based on byte frequency analysis, byte frequency cross-correlation analysis, and file header/trailer analysis. Tests were run to measure the accuracy of these algorithms. The accuracy varied from 23% to 96% depending on which algorithm was used. These algorithms could be used by virus scanning packages, firewalls, intrusion detection systems, forensic analyses of computer hard drives, Web browsers, or any other program that needs to identify the types of files for proper operation. File type detection is also important to operating systems for correct identification and handling of files regardless of file extension.
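The byte frequency idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so everything below is an assumption made for illustration: the peak-based normalization, the plain averaging of histograms, the similarity score, and the function names `byte_frequencies`, `build_fingerprint`, `score`, and `classify` are all hypothetical.

```python
from collections import Counter


def byte_frequencies(data: bytes) -> list[float]:
    """Return a 256-entry histogram scaled so the most frequent byte is 1.0."""
    counts = Counter(data)
    peak = max(counts.values(), default=0)
    if peak == 0:
        return [0.0] * 256  # empty input: all-zero histogram
    return [counts.get(b, 0) / peak for b in range(256)]


def build_fingerprint(samples: list[bytes]) -> list[float]:
    """Average the histograms of known samples of one file type (assumed scheme)."""
    if not samples:
        raise ValueError("at least one sample is required")
    histograms = [byte_frequencies(s) for s in samples]
    return [sum(column) / len(histograms) for column in zip(*histograms)]


def score(data: bytes, fingerprint: list[float]) -> float:
    """Similarity in [0, 1]: one minus the mean absolute histogram difference."""
    freqs = byte_frequencies(data)
    return 1.0 - sum(abs(a - b) for a, b in zip(freqs, fingerprint)) / 256


def classify(data: bytes, fingerprints: dict[str, list[float]]) -> str:
    """Return the name of the fingerprint that scores highest for the data."""
    return max(fingerprints, key=lambda name: score(data, fingerprints[name]))


# Toy usage with in-memory byte strings standing in for real files.
fingerprints = {
    "text": build_fingerprint([b"hello world", b"plain ascii text"]),
    "binary": build_fingerprint([bytes(range(256)), bytes(range(0, 256, 2))]),
}
guess = classify(b"some more words", fingerprints)
```

In this sketch the decision depends only on the content bytes, never on a file name or extension, which is the property the abstract emphasizes. The cross-correlation and header/trailer analyses are not covered here.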
Keywords :
authorisation; file organisation; online front-ends; operating systems (computers); Web browsers; byte frequency analysis; byte frequency cross-correlation analysis; computer file; computer hard drives; content based file type detection algorithms; file extension; file handling; file header analysis; file trailer analysis; file true type; file type fingerprints; file type recognition; firewalls; fixed file extensions; fixed magic numbers; forensic analysis; intrusion detection systems; metadata; operating systems; proprietary descriptive file wrappers; software packages; virus scan; Algorithm design and analysis; Detection algorithms; Drives; Fingerprint recognition; Forensics; Frequency; Intrusion detection; Packaging; Performance analysis; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
System Sciences, 2003. Proceedings of the 36th Annual Hawaii International Conference on
Print_ISBN :
0-7695-1874-5
Type :
conf
DOI :
10.1109/HICSS.2003.1174905
Filename :
1174905