Title :
TSearch: A Self-learning Vertical Search Spider for Travel
Author :
Li, Suke ; Chen, Zhong ; Tang, Liyong ; Wang, Zhao
Author_Institution :
Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing
Abstract :
A self-learning vertical search spider for travel is presented. This paper focuses on two machine learning methods, SNBC (self-learning naive Bayes classifier) and LQNBC (log quotient naive Bayes classifier), for improving search quality and topic relevance. A framework for designing and implementing the vertical spider TSearch, built on the architecture and functions of a basic general search spider, is also shown. TSearch uses SNBC to filter HTML pages and relies on LQNBC to detect unknown travel-related Web sites with high precision. The recall and precision of the classification of texts crawled by TSearch were measured experimentally. These experiments indicate that, using LQNBC and SNBC, TSearch can produce promising travel-related information for search.
Keywords :
Bayes methods; Web sites; pattern classification; search engines; travel industry; unsupervised learning; HTML pages; LQNBC; SNBC; TSearch; log quotient naive Bayes classifier; machine learning methods; self-learning naive Bayes classifier; self-learning vertical search spider; travel related Web sites; Data mining; Databases; Detectors; HTML; Information filtering; Information filters; Learning systems; Search engines; Uniform resource locators; Web pages;
Conference_Title :
Young Computer Scientists, 2008. ICYCS 2008. The 9th International Conference for
Conference_Location :
Hunan
Print_ISBN :
978-0-7695-3398-8
Electronic_ISBN :
978-0-7695-3398-8
DOI :
10.1109/ICYCS.2008.338