DocumentCode :
1838423
Title :
TSearch: A Self-learning Vertical Search Spider for Travel
Author :
Li, Suke ; Chen, Zhong ; Tang, Liyong ; Wang, Zhao
Author_Institution :
Sch. of Electron. Eng. & Comput. Sci., Peking Univ., Beijing
fYear :
2008
fDate :
18-21 Nov. 2008
Firstpage :
348
Lastpage :
353
Abstract :
A self-learning vertical search spider for travel is presented. This paper focuses on two machine learning methods, SNBC (self-learning naive Bayes classifier) and LQNBC (log quotient naive Bayes classifier), for improving search quality and topic relevance. A framework for designing and implementing a vertical spider, TSearch, with the basic architecture and functions of a general search spider is also shown. TSearch uses SNBC to filter HTML pages and relies on LQNBC to detect unknown travel-related Web sites with high precision. The recall and precision of the classification of texts crawled by TSearch were measured experimentally. These experiments indicate that, using LQNBC and SNBC, TSearch can produce promising travel-related information for search.
Keywords :
Bayes methods; Web sites; pattern classification; search engines; travel industry; unsupervised learning; HTML pages; LQNBC; SNBC; TSearch; log quotient naive Bayes classifier; machine learning methods; self-learning naive Bayes classifier; self-learning vertical search spider; travel related Web sites; Data mining; Databases; Detectors; HTML; Information filtering; Information filters; Learning systems; Search engines; Uniform resource locators; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Young Computer Scientists, 2008. ICYCS 2008. The 9th International Conference for
Conference_Location :
Hunan
Print_ISBN :
978-0-7695-3398-8
Electronic_ISBN :
978-0-7695-3398-8
Type :
conf
DOI :
10.1109/ICYCS.2008.338
Filename :
4708998