DocumentCode :
2897982
Title :
Collaborative Web crawling: information gathering/processing over Internet
Author :
Shang-Hua Teng ; Qi Lu ; Eichstaedt, M. ; Ford, D. ; Lehman, T.
Author_Institution :
Dept. of Comput. Sci., Illinois Univ., Urbana, IL, USA
Volume :
Track5
fYear :
1999
fDate :
5-8 Jan. 1999
Abstract :
The main objective of the IBM Grand Central Station (GCS) project is to gather all types of information in any format (text, data, image, graphics, audio, video) from cyberspace, to process/index/summarize the information, and to push the right information to the right people. Because of the very large scale of cyberspace, parallel processing in both crawling/gathering and information processing is indispensable. We present a scalable method for collaborative Web crawling and information processing. The method includes an automatic cyberspace partitioner which is designed to balance and re-balance the load dynamically among processors. It can be used when all Web crawlers are located on a tightly coupled high-performance system as well as when they are scattered in a distributed environment. We implemented these algorithms in Java.
Keywords :
Internet; Java; information resources; information retrieval; resource allocation; IBM Grand Central Station project; collaborative Web crawling; cyberspace; distributed environment; high-performance system; indexing; information gathering; information processing; load balancing; parallel processing; Collaboration
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems Sciences, 1999. HICSS-32. Proceedings of the 32nd Annual Hawaii International Conference on
Conference_Location :
Maui, HI, USA
Print_ISBN :
0-7695-0001-3
Type :
conf
DOI :
10.1109/HICSS.1999.772945
Filename :
772945