Title :
Finding Thai Web Pages in Foreign Web Spaces
Author :
Somboonviwat, Kulwadee ; Tamura, Takayuki ; Kitsuregawa, Masaru
Author_Institution :
The University of Tokyo, Japan
Abstract :
This paper proposes language-specific web crawling (LSWC) as a method of creating large-scale, language-specific Web archives for countries with distinct linguistic identities, such as Thailand. The LSWC strategy for selectively gathering Thai web pages from virtually anywhere on the Web is derived from static analyses of the Thai Web graph. We evaluated the performance of the LSWC strategy using a web crawling simulator.
Keywords :
Buildings; Crawlers; Information technology; Large-scale systems; Libraries; Research and development; Space technology; Uniform resource locators; Web pages; Web server
Conference_Title :
Data Engineering Workshops, 2006. Proceedings. 22nd International Conference on
Conference_Location :
Atlanta, GA, USA
Print_ISBN :
0-7695-2571-7
DOI :
10.1109/ICDEW.2006.60