• DocumentCode
    264879
  • Title
    Design of improved focused web crawler by analyzing semantic nature of URL and anchor text
  • Author
    Dahiwale, Prashant ; Raghuwanshi, M.M. ; Malik, Latesh
  • Author_Institution
    Dept. of CSE, GHRCE, Nagpur, India
  • fYear
    2014
  • fDate
    15-17 Dec. 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The world now runs largely on digital data, and the web is the largest collection of that data. The web grows around the clock, so the principal problem is searching this huge database for specific information. Deciding whether a web page is relevant to a search topic is a dilemma [1]. Many techniques exist for judging relevancy, but when the users' perspective is the key issue guiding the search, semantic-based web crawlers are unsurpassed. Semantic-based web crawlers map relevancy with the help of a lexical database: the crawler uses the senses provided by the lexical database to discover relatedness between the search query and the web page being searched. A focused web crawler helps to find the similarity of a web page to the search query without downloading that page, thereby saving the bandwidth required to download it. This paper proposes and discusses one such approach to implementing a semantic-based focused web crawler.
  • Keywords
    Web sites; database management systems; query processing; text analysis; URL; Web page; anchor text; database searching; focused Web crawler design; lexical database; search query; semantic based Web crawler; semantic nature analysis; Books; Crawlers; Databases; Engines; Semantics; Uniform resource locators; Web pages; Lexical Database; Metadata; Relevance; Searching; Semantic; Web Crawling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial and Information Systems (ICIIS), 2014 9th International Conference on
  • Conference_Location
    Gwalior
  • Print_ISBN
    978-1-4799-6499-4
  • Type
    conf
  • DOI
    10.1109/ICIINFS.2014.7036556
  • Filename
    7036556