DocumentCode
2036239
Title
Language specific crawling based on web pages features
Author
Azimzadeh, Masomeh ; Yari, Alireza ; Kargar, Mohammad Javad
Author_Institution
Iran Telecommun. Res. Center, Tehran, Iran
fYear
2010
fDate
2-4 March 2010
Firstpage
17
Lastpage
20
Abstract
Since the World Wide Web contains a large set of data in different languages, retrieving language-specific information creates a new challenge in information retrieval called language-specific crawling. In this paper, a new approach to language-specific crawling is proposed, in which a combination of selected content and context features of web documents is applied. The approach has been implemented for the Persian language and evaluated in the Iranian web domain. The evaluation results show that this approach can improve crawling performance in terms of speed and coverage.
Keywords
Internet; document handling; information retrieval; Iranian Web domain; Persian language; Web documents; Web pages features; World Wide Web; language specific crawling; Bandwidth; Crawlers; Data mining; Information resources; Information retrieval; Java; Ontologies; Testing; Thesauri; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia Computing and Information Technology (MCIT), 2010 International Conference on
Conference_Location
Sharjah
Print_ISBN
978-1-4244-7001-3
Type
conf
DOI
10.1109/MCIT.2010.5444865
Filename
5444865