• DocumentCode
    2957426
  • Title

    Generating New Features Using Genetic Programming to Detect Link Spam

  • Author

    Shengen, Li ; Xiaofei, Niu ; Peiqi, Li ; Lin, Wang

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Shandong Jianzhu Univ., Jinan, China
  • Volume
    1
  • fYear
    2011
  • fDate
    28-29 March 2011
  • Firstpage
    135
  • Lastpage
    138
  • Abstract
    Link spam techniques enable some pages to achieve higher-than-deserved rankings in search engine results, which degrades the quality of those results. Classification methods can detect link spam, and in classification problems the choice of features plays an important role. This paper proposes deriving new features from existing link-based features using genetic programming and using the new features as inputs to SVM and GP classifiers for identifying link spam. Experiments on WEBSPAM-UK2006 show that classifiers using 10 newly generated features perform much better than classifiers using the original 41 link-based features and on par with classifiers using 138 transformed link-based features. The newly generated features can thus improve link spam classification performance.
  • Keywords
    Internet; feature extraction; genetic algorithms; information retrieval; pattern classification; search engines; support vector machines; GP classifier; SVM; WEBSPAM-UK2006; classification method; genetic programming; link spam detection; link-based feature generation; search engine; search result quality; Accuracy; Binary trees; Feature extraction; Genetic programming; Support vector machines; Unsolicited electronic mail; Web pages; Feature Generation; Genetic Programming; Link Spam;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computation Technology and Automation (ICICTA), 2011 International Conference on
  • Conference_Location
    Shenzhen, Guangdong
  • Print_ISBN
    978-1-61284-289-9
  • Type

    conf

  • DOI
    10.1109/ICICTA.2011.41
  • Filename
    5750574