DocumentCode :
1777012
Title :
A focused linked data crawler based on HTML link analysis
Author :
Emamdadi, Reihaneh ; Kahani, Mohsen ; Zarrinkalam, Fattane
Author_Institution :
Dept. of Comput. Eng., Ferdowsi Univ. of Mashhad, Mashhad, Iran
fYear :
2014
fDate :
29-30 Oct. 2014
Firstpage :
74
Lastpage :
79
Abstract :
Linked Data can be published as RDF documents or embedded in HTML documents. A linked data crawler is a program that discovers published linked data on the web by following RDF links. However, some RDF documents are surrounded by HTML documents. Linked data crawlers therefore need to follow HTML links in addition to RDF links, both to discover such RDF documents and to harvest the linked data embedded in HTML documents. Many HTML documents, however, neither embed any linked data nor point to any RDF documents. Crawling such HTML documents decreases the discovery rate of RDF documents per unit of network bandwidth and wastes computation resources on non-RDF documents. In this paper, a focused linked data crawler is proposed to address this problem. The proposed crawler analyzes and prioritizes HTML links by calculating the likelihood that a link will lead to an RDF document. The experimental evaluation shows that the proposed approach increases the discovery rate of RDF documents in comparison with a non-focused linked data crawler.
Keywords :
hypermedia markup languages; search engines; HTML document crawling; HTML link analysis; HTML link prioritization; RDF document discovery rate; RDF links; computation resource wastage; embedded linked data; focused linked data crawler; network bandwidth; non-RDF documents; published linked data discovery; Bandwidth; Crawlers; Data mining; HTML; Measurement; Resource description framework; Search engines; HTML link; RDF link; discovery rate; focused crawler; linked data crawler
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Knowledge Engineering (ICCKE), 2014 4th International eConference on
Conference_Location :
Mashhad
Print_ISBN :
978-1-4799-5486-5
Type :
conf
DOI :
10.1109/ICCKE.2014.6993406
Filename :
6993406