Title :
Features extraction for illicit web pages identifications using independent component analysis
Author :
Sam, Lee Zhi ; Bin Maarof, Mohd Aizaini ; Selamat, Ali ; Shamsuddin, Siti Mariyam
Author_Institution :
Fac. of Comput. Sci. & Inf. Syst., Univ. Teknol. Malaysia, Skudai
Abstract :
Illicit Web content such as pornography, violence, and gambling has greatly polluted the minds of immature web users. Pornography is perhaps one of the biggest threats to the healthy mental life of today's children and teenagers. An efficient way to identify illicit web pages is highly desired. In this paper, we analyze the textual content of web pages on pornography, gynecology, sex education, and general business news using the independent component analysis (ICA) algorithm. We establish three similar models for comparison: a principal component analysis (PCA) model, an ICA model, and a PCA-ICA model. We evaluate the effectiveness of the proposed models using information retrieval measures such as precision, recall, F1, and accuracy. Our experimental results show that the PCA and PCA-ICA models are capable of identifying illicit web pages correctly, with overall performance above 90%. This research offers researchers insight into textual content-based web page categorization.
Keywords :
Internet; feature extraction; independent component analysis; information retrieval; mathematics computing; features extraction; illicit Web pages identifications; information retrieval measurement; pornography; principal component analysis; Information filtering; Information filters; Machine learning; Pediatrics; Uniform resource locators; Web pages; artificial neural network; illicit web pages identification; textual content analysis
Conference_Title :
Intelligent and Advanced Systems, 2007. ICIAS 2007. International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-1-4244-1355-3
Electronic_ISBN :
978-1-4244-1356-0
DOI :
10.1109/ICIAS.2007.4658363