Title :
A SVM-based compound-word recognition method in information security
Author :
Shixian Li ; Lei Zhang ; Bo Han ; Tingrui Lei ; Qing Wang ; Tao Peng ; Peng Cao
Author_Institution :
Security Evaluation Center, China Inf. Technol., Beijing, China
Abstract :
With the emergence of the mobile Internet, the Internet of Things, and cloud computing, the field of information security is developing rapidly. As a result, a constant stream of compound-words describing new concepts and technologies has arisen. However, existing dictionaries do not collect these new compound-words in time, so they cannot be identified correctly. To solve this problem, this paper presents an SVM-based compound-word recognition method for information security. The method builds on the output of an existing word segmentation system. It constructs an adjacent atom-word digraph according to statistical co-occurrence features and lexical rules. Next, it produces a compound-word candidate set through a depth-first traversal of the digraph under the longest-match principle. It then filters the candidate set using an SVM classifier, with the help of a domain contrast corpus and a computer dictionary. We use this method to identify new compound-words in 2200 vulnerability description texts; it achieves a precision of 82.25% and a recall of 77.44%. The results show that the method can effectively identify new information-security compound-words in a large-scale corpus.
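The candidate-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_count` threshold, and the toy sentences are assumptions made for illustration, raw adjacent-pair counts stand in for the paper's statistical co-occurrence features, and the lexical rules and the SVM filtering stage are left out.

```python
from collections import Counter, defaultdict


def build_digraph(sentences, min_count=2):
    """Link atom-word a -> b when b directly follows a at least min_count times.

    `sentences` is a list of already segmented sentences (lists of atom-words).
    The min_count threshold is a hypothetical stand-in for the paper's
    statistical co-occurrence features.
    """
    pair_counts = Counter()
    for words in sentences:
        pair_counts.update(zip(words, words[1:]))
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph[a].add(b)
    return graph


def longest_paths(graph):
    """Depth-first traversal that keeps only maximal paths (longest match).

    Traversal starts from nodes without incoming edges, so a cycle that has
    no entry point is skipped in this simplified version.
    """
    targets = {b for successors in graph.values() for b in successors}
    starts = sorted(a for a in graph if a not in targets)
    candidates = []

    def dfs(path):
        # Successors already on the path are skipped to avoid looping.
        nexts = [b for b in sorted(graph.get(path[-1], ())) if b not in path]
        if not nexts:
            if len(path) > 1:
                candidates.append(" ".join(path))
            return
        for b in nexts:
            dfs(path + [b])

    for a in starts:
        dfs([a])
    return candidates


# Toy segmented input (invented for illustration, not from the paper's corpus).
sentences = [
    ["cross", "site", "scripting", "attack"],
    ["cross", "site", "scripting", "flaw"],
    ["cross", "site", "request", "forgery"],
    ["cross", "site", "request", "forgery"],
]
print(longest_paths(build_digraph(sentences)))
# ['cross site request forgery', 'cross site scripting']
```

In the method as described, candidates such as these would then be passed to the SVM classifier, which uses the domain contrast corpus and the computer dictionary to decide which ones are kept.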
Keywords :
directed graphs; pattern classification; security of data; statistical analysis; support vector machines; word processing; Internet of things; SVM classifier; SVM-based compound-word recognition method; atom-word digraph; cloud computing; computer dictionary; domain contrast corpus; information security; lexical rules; longest match principle; mobile Internet; statistical co-occurrence features; vulnerability description texts; word segmentation system; Compounds; Computers; Dictionaries; Feature extraction; Filtering; SVM; compound-word; depth first traversal
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2013 10th International Conference on
Conference_Location :
Shenyang
DOI :
10.1109/FSKD.2013.6816310