Title :
URL classification using non negative matrix factorization
Author :
Khare, Shreya ; Bhandari, Akshay ; Murthy, Hema A.
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Madras, Chennai, India
Date :
Feb. 28, 2014 to March 2, 2014
Abstract :
Internet availability on a campus is not metered. Internet link bandwidths are vulnerable because they can be misused. Moreover, websites blacklist campuses for misuse. In particular, blacklisting by academic websites such as IEEE and ACM can lead to serious researchers being denied access to information. The objective of this paper is to proactively classify anomalous accesses. This will enable campus ISPs to deny access to users who misuse the Internet. Specifically, URLs are classified using the short snippets (metadata) that are available. New features, namely random walk term weights and within-class popularity, in tandem with non-negative matrix factorization, show a lot of promise for classifying URLs. The classification accuracy is as high as 92.96% on 10 gigabytes of proxy data.
Keywords :
Internet; Web sites; matrix decomposition; pattern classification; Internet availability; URL classification; academic websites; class popularity; nonnegative matrix factorization; random walk term weights; Bandwidth; Feature extraction; Heating; Vectors; Web pages; Non Negative Matrix Factorization; Web Page Classification
Conference_Title :
Communications (NCC), 2014 Twentieth National Conference on
Conference_Location :
Kanpur
DOI :
10.1109/NCC.2014.6811274