Language specific crawling based on web pages features

Author

Azimzadeh, Masomeh ; Yari, Alireza ; Kargar, Mohammad Javad

Author_Institution

Iran Telecommun. Res. Center, Tehran, Iran

fYear

2010

fDate

2-4 March 2010

Firstpage

Lastpage

Abstract

Since the World Wide Web contains a large volume of data in different languages, retrieving language-specific information creates a new challenge in information retrieval called language-specific crawling. In this paper, a new approach to language-specific crawling is proposed that applies a combination of selected content and context features of web documents. The approach has been implemented for the Persian language and evaluated on the Iranian web domain. The evaluation results show that the approach can improve crawling performance in terms of speed and coverage.

Keywords

Internet; document handling; information retrieval; Iranian Web domain; Persian language; Web documents; Web pages features; World Wide Web; language specific crawling; Bandwidth; Crawlers; Data mining; Information resources; Java; Ontologies; Testing; Thesauri; Web pages

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia Computing and Information Technology (MCIT), 2010 International Conference on

Conference_Location

Sharjah

Print_ISBN

978-1-4244-7001-3

Type

conf

DOI

10.1109/MCIT.2010.5444865

Filename

5444865

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2036239