Title :
A Study on IT-security Vocabulary for Domain Document Classification
Author :
Liping, Qian ; Lidong, Wang
Author_Institution :
Sch. of Inf., Renmin Univ. of China, Beijing, China
Abstract :
The volume of published scientific literature available on the Internet has been increasing exponentially, and some of it reflects the latest achievements in specific research domains. In recent years, many projects have been funded to mine online scientific literature, especially in biomedical research. Scientific literature covers most of the hot topics in a research field and has a very large domain-specific vocabulary. Exploiting domain knowledge and specialized vocabulary can dramatically improve the results of literature text processing. In this paper, we build a large-scale annotated corpus from the abstracts of IT security literature, from which we construct a domain vocabulary with a TF/IDF-like schema, and we present a quantitative analysis of the differences between features constructed from the positively and negatively annotated corpora. We evaluate the effect of the vocabulary on document similarity computation and classification. The experimental results show that the domain vocabulary effectively improves accuracy.
Keywords :
Internet; document handling; information technology; literature; pattern classification; security of data; text analysis; vocabulary; IT-security literature; IT-security vocabulary; TF-IDF-like schema; abstract content; biomedical research; document similarity computing; domain document classification; domain knowledge exploitation; large scale annotated corpus; literature text processing; online scientific literature mining; published scientific literature; quantitative analysis; very large domain-specific vocabulary; Indexes; Security; Semantics; Text categorization; Text processing; IT security; TF/IDF; document classification
Conference_Title :
Computational Intelligence and Security (CIS), 2011 Seventh International Conference on
Conference_Location :
Hainan
Print_ISBN :
978-1-4577-2008-6
DOI :
10.1109/CIS.2011.121