DocumentCode
3399059
Title
Link farm detection using SVMLight tool
Author
Saraswathi, D. ; Kathiravan, A. Vijaya ; Kavitha, R.
Author_Institution
K.S. Rangasamy Coll. of Arts & Sci., Namakkal, India
fYear
2012
fDate
10-12 Jan. 2012
Firstpage
1
Lastpage
5
Abstract
Search engine spam is a web page, or a portion of a web page, that has been created with the intention of increasing its ranking in search engines. Web spamming refers to actions intended to mislead search engines and give some pages a higher ranking than they deserve. Anyone who uses a search engine frequently has most likely encountered a high-ranking page that consists of nothing more than a collection of query keywords. These pages detract both from the user experience and from the quality of the search engine. Search engine spam is a web page that has been designed to artificially inflate its search engine ranking. Recently, search engine spam has increased dramatically, creating problems for both search engines and web surfers. It degrades the search engine's results, occupies more memory and consumes more time when indexes are created, and frustrates the user by returning irrelevant results. Search engines have tried many techniques to filter out these spam pages before they can appear on the query results page. This paper presents various ways of creating spam pages, a collection of current methods used to detect spam, and a new approach to building a link spam detection tool that uses machine learning as the means of detecting spam. The new approach uses the SVMLight tool to detect link spam and considers only the link structure of the Web, regardless of page contents. These statistical features are used to build a classifier that is tested over a large collection of Web link spam. A link farm can be identified based on the hub and authority degrees of its links. The spam classifier makes use of the WordNet word database and the SVMLight tool to classify web links as either spam or not spam. These features are related not only to quantitative data extracted from the Web pages, but also to qualitative properties, mainly of the page links.
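The abstract does not list the exact features or the training setup, so the following is only a minimal sketch of how link-structure features might be prepared for SVMLight. It assumes a hypothetical feature set (in-degree, out-degree, hub score, authority score), computes hub and authority scores with a plain HITS power iteration, and writes each page as one line in SVMLight's sparse input format (`<target> <index>:<value> ...`). The function names `hits` and `svmlight_line` and the toy graph are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict


def hits(edges, iterations=50):
    """Compute hub and authority scores by HITS power iteration.

    edges: iterable of (source, target) page pairs.
    Returns (hub, auth, in_links, out_links).
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of pages linking in.
        auth = {n: sum(hub[m] for m in in_links[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of pages linked to.
        hub = {n: sum(auth[m] for m in out_links[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth, in_links, out_links


def svmlight_line(label, features):
    """Format one example as an SVMLight line; zero features are omitted."""
    parts = [f"{i}:{v:.6g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:+d}"] + parts)


if __name__ == "__main__":
    # Toy graph (assumed): a, b, c form a densely interlinked group; x links to d.
    edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"),
             ("b", "c"), ("c", "b"), ("x", "d")]
    labels = {"a": 1, "b": 1, "c": 1, "d": -1, "x": -1}  # hypothetical labels
    hub, auth, in_links, out_links = hits(edges)
    for page in sorted(labels):
        feats = [len(in_links[page]), len(out_links[page]), hub[page], auth[page]]
        print(svmlight_line(labels[page], feats), "#", page)
```

In this toy graph the tightly interlinked pages end up with much higher hub and authority scores than the isolated pair, which is the kind of signal a link-only classifier could exploit. The resulting lines could be saved to a file and passed to SVMLight's `svm_learn` program for training.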
Keywords
Web sites; learning (artificial intelligence); pattern classification; search engines; security of data; support vector machines; SVMLight tool; Web link spam detection; Web link structure; Web page; Web spamming; Web surfer; Wordnet word database; hub degree; link authority; link farm detection; machine learning; search engine ranking; search engine spam; spam classifier; statistical feature; Conferences; Crawlers; Informatics; Search engines; Unsolicited electronic mail; Web pages; Classification; Click Spam; Cloaking; Link Farm; PageRank; Search engine; Spamdexing;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Communication and Informatics (ICCCI), 2012 International Conference on
Conference_Location
Coimbatore
Print_ISBN
978-1-4577-1580-8
Type
conf
DOI
10.1109/ICCCI.2012.6158833
Filename
6158833