• DocumentCode
    3305968
  • Title

    Domain-Specific Deep Web Sources Discovery

  • Author

    Wang, Ying ; Zuo, Wanli ; Peng, Tao ; He, Fengling

  • Author_Institution
    Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun
  • Volume
    5
  • fYear
    2008
  • fDate
    18-20 Oct. 2008
  • Firstpage
    202
  • Lastpage
    206
  • Abstract
    The Web has rapidly deepened as myriad searchable databases have come online, with their data hidden behind query interfaces. However, users often have difficulty finding the right sources among these databases and then querying them. To address this problem, this paper presents a new method that applies focused crawling to discover deep Web sources automatically. First, Web sites for domain-specific data sources are located by focused crawling. Second, each Web site is checked for a deep Web query interface within its first three levels of depth. Last, each deep Web query interface found is judged for relevance to a given topic. Applying focused crawling confines the identification of deep Web query interfaces to a specific domain and captures pages relevant to a given topic instead of pursuing a high coverage ratio. This method dramatically reduces the number of pages the crawler must examine to identify deep Web query interfaces.
  • Keywords
    Internet; data mining; query processing; Web sites; deep Web query interface; deep Web sources discovery; domain-specific data sources; focused crawling technology; Automatic control; Computer interfaces; Computer science; Crawlers; Databases; Educational institutions; Helium; Search engines; Uniform resource locators; Web pages; Classification; deep web; focused crawling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation, 2008. ICNC '08. Fourth International Conference on
  • Conference_Location
    Jinan
  • Print_ISBN
    978-0-7695-3304-9
  • Type

    conf

  • DOI
    10.1109/ICNC.2008.350
  • Filename
    4667426
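
The second and third steps described in the abstract (checking a site for a query interface within its first three levels of depth, then judging topical relevance) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the focused-crawling step has already produced a root URL, that the root page counts as depth 1, that a "query interface" is any form containing a text or search input, and that relevance is a simple count of topic terms appearing in the form text. The names `FormFinder`, `find_relevant_interfaces`, `fetch`, `topic_terms`, and `min_hits` are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collects a page's links and the text of forms that have a text input."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []  # lower-cased text of each searchable form
        self._in_form = False
        self._has_text_input = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self._in_form, self._has_text_input, self._text = True, False, []
        elif tag == "input" and self._in_form:
            # Assumption: a text/search box marks a query interface;
            # password-only forms (logins) are ignored.
            if (attrs.get("type") or "text").lower() in ("text", "search"):
                self._has_text_input = True
                self._text.append(attrs.get("name") or "")

    def handle_data(self, data):
        if self._in_form:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            if self._has_text_input:
                self.forms.append(" ".join(self._text).lower())
            self._in_form = False


def find_relevant_interfaces(fetch, root, topic_terms, max_depth=3, min_hits=1):
    """Breadth-first walk from root, limited to max_depth levels.

    fetch(url) returns the page HTML or None (hypothetical callback).
    Returns URLs of pages whose query form mentions enough topic terms.
    """
    seen, queue, found = {root}, deque([(root, 1)]), []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = FormFinder()
        parser.feed(html)
        for form_text in parser.forms:
            hits = sum(term in form_text for term in topic_terms)
            if hits >= min_hits:
                found.append(url)
                break
        if depth < max_depth:  # do not follow links past the depth limit
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return found
```

Passing a `fetch` callback keeps the sketch free of network code; in practice it could wrap an HTTP client, and the term-count test could be replaced by whatever classifier the crawler uses.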