DocumentCode :
485445
Title :
An improved classification method for the common OLE file by N-gram analysis and vector space model
Author :
Hong-Rong Yang ; Ming Xu ; Ning Zheng
Author_Institution :
Inst. of Comput. Applic. Technol., Hangzhou Dianzi Univ., Hangzhou
fYear :
2007
fDate :
12-14 Dec. 2007
Firstpage :
983
Lastpage :
986
Abstract :
Identifying file type by file extension is fallible. The magic bytes method may also fail to distinguish one type from another when files share similar header information, as the commonly used MS Office OLE files do. In this paper, an efficient classification method for the common OLE files is proposed. To overcome the shortcoming of the original N-gram analysis technique, which cannot easily tell ambiguous file types apart, N-gram analysis and the vector space model are combined to identify the common OLE files. The characteristic items are extracted from the most frequent byte values of each file class, and the cosine value of two vectors is then used to categorize ambiguous file types. The experimental results demonstrate that the mechanism is effective in identifying Office OLE files and performs better than the common N-gram method.
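The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes, not the authors' method. It assumes 1-gram (single-byte) frequencies, a feature space formed by the union of each class's k most frequent byte values, and one averaged vector per class; the names `byte_frequencies`, `build_model`, `classify` and the parameter `k` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def byte_frequencies(data):
    """Return the normalized single-byte (1-gram) frequencies of a byte string."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero on empty input
    return {value: n / total for value, n in counts.items()}


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_model(training, k=8):
    """Build (items, centroids) from {label: [bytes, ...]}.

    Assumption: the characteristic items are the union of the k most
    frequent byte values of each class; each class is represented by the
    mean frequency vector of its samples over those items.
    """
    items = set()
    for samples in training.values():
        totals = Counter()
        for sample in samples:
            totals.update(sample)
        items.update(value for value, _ in totals.most_common(k))
    items = sorted(items)

    centroids = {}
    for label, samples in training.items():
        vectors = []
        for sample in samples:
            freqs = byte_frequencies(sample)
            vectors.append([freqs.get(value, 0.0) for value in items])
        centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return items, centroids


def classify(data, model):
    """Return the label whose class vector has the highest cosine with the file's vector."""
    items, centroids = model
    freqs = byte_frequencies(data)
    vector = [freqs.get(value, 0.0) for value in items]
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))
```

In practice `training` would map each file class (for example Word, Excel, PowerPoint) to the raw bytes of sample files, and `k` would need tuning; both choices are assumptions of this sketch rather than values reported in the record.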
Keywords :
file organisation; pattern classification; vectors; MS Office OLE file; N-gram analysis; ambiguous file types cataloguing; classification method; cosine value; file extension; file type identification; magic bytes method; vector space model; N-gram; OLE file
fLanguage :
English
Publisher :
IET
Conference_Titel :
Wireless, Mobile and Sensor Networks, 2007. (CCWMSN07). IET Conference on
Conference_Location :
Shanghai
ISSN :
0537-9989
Print_ISBN :
978-0-86341-836-5
Type :
conf
Filename :
4786369