Title :
A survey in semantic web technologies-inspired focused crawlers
Author :
Dong, Hai ; Hussain, Farookh Khadeer ; Chang, Elizabeth
Author_Institution :
Digital Ecosyst. & Bus. Intell. Inst., Curtin Univ. of Technol., Bentley, WA
Abstract :
Crawlers are software programs that traverse the Internet and retrieve Webpages by following hyperlinks. Traditional Web crawlers cope poorly with the flood of spam Websites. Semantic focused crawlers utilize semantic web technologies to analyze the semantics of hyperlinks and Web documents. This paper briefly reviews recent studies on one category of semantic focused crawlers, ontology-based focused crawlers, which utilize ontologies to link the fetched Web documents with ontological concepts (topics). The purpose of this linking is to organize and categorize Web documents, or to filter out Webpages that are irrelevant to the topics. A brief comparison is made among these crawlers from six perspectives: domain, working environment, special functions, technologies utilized, evaluation metrics and evaluation results. Conclusions drawn from this comparison are presented in the final section.
Keywords :
Web sites; information retrieval; ontologies (artificial intelligence); search engines; semantic Web; Internet; Web document categorization; Web page retrieval; hyperlink; ontology-based focused crawler; semantic Web technology; semantic focused Web crawler; spam Web site; Clustering algorithms; Crawlers; Ecosystems; Information filtering; Information filters; Joining processes; Ontologies; Search engines; Semantic Web; Uniform resource locators
Conference_Title :
Digital Information Management, 2008. ICDIM 2008. Third International Conference on
Conference_Location :
London
Print_ISBN :
978-1-4244-2916-5
Electronic_ISBN :
978-1-4244-2917-2
DOI :
10.1109/ICDIM.2008.4746736