• DocumentCode
    2036239
  • Title
    Language specific crawling based on web pages features
  • Author
    Azimzadeh, Masomeh ; Yari, Alireza ; Kargar, Mohammad Javad
  • Author_Institution
    Iran Telecommun. Res. Center, Tehran, Iran
  • fYear
    2010
  • fDate
    2-4 March 2010
  • Firstpage
    17
  • Lastpage
    20
  • Abstract
    Since the World Wide Web contains a large set of data in different languages, retrieving language-specific information creates a new challenge in information retrieval called language-specific crawling. In this paper, a new approach to language-specific crawling is proposed, in which a combination of selected content and context features of web documents is applied. The approach has been implemented for the Persian language and evaluated in the Iranian web domain. The evaluation results show how this approach can improve crawling performance in terms of speed and coverage.
  • Keywords
    Internet; document handling; information retrieval; Iranian Web domain; Persian language; Web documents; Web pages features; World Wide Web; language specific crawling; Bandwidth; Crawlers; Data mining; Information resources; Information retrieval; Java; Ontologies; Testing; Thesauri; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multimedia Computing and Information Technology (MCIT), 2010 International Conference on
  • Conference_Location
    Sharjah
  • Print_ISBN
    978-1-4244-7001-3
  • Type
    conf
  • DOI
    10.1109/MCIT.2010.5444865
  • Filename
    5444865