• DocumentCode
    2766022
  • Title

    VEDD- a visual wrapper for extraction of data using DOM tree

  • Author

    Tripathy, A.K. ; Joshi, Nilakshi ; Thomas, Steffy ; Shetty, Shweta ; Thomas, Namitha

  • Author_Institution
    Dept. of Comput. Eng., Don Bosco Inst. of Technol., Mumbai, India
  • fYear
    2012
  • fDate
    19-20 Oct. 2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The World Wide Web plays an important role in searching for information on the data network. Users are constantly exposed to an ever-growing flood of information. A wrapper is an application that helps in retrieving Search Results Records (SRRs) from multiple search engines, making the search more efficient and reliable. The VEDD wrapper extracts the relevant SRRs from three search engines by filtering out noisy and redundant records. Finally, the unique set of records is displayed in a common VEDD search result page. The extraction is performed using the concepts of the Document Object Model (DOM) tree. The paper presents a series of data filters to detect and remove irrelevant data from the web page. The data filters are also used to further improve the similarity check of data records. In addition, visual cues from the underlying browser rendering engine are used to locate and extract the relevant data region from the deep web using a keyword matching technique.
  • Keywords
    Internet; Web sites; information filtering; information filters; search engines; string matching; DOM tree; VEDD wrapper; Web page; World Wide Web; browser rendering engine; data extraction; data filters; data network; data record similarity check; deep Web; document object model; irrelevant data detection; irrelevant data removal; keyword matching technique; search engines; search results records; visual wrapper; Data mining; Filtering; Flowcharts; HTML; Search engines; Visualization; Web pages; Content Keyword; DOM tree; Information extraction; Search engine results page;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Communication, Information & Computing Technology (ICCICT), 2012 International Conference on
  • Conference_Location
    Mumbai
  • Print_ISBN
    978-1-4577-2077-2
  • Type

    conf

  • DOI
    10.1109/ICCICT.2012.6398114
  • Filename
    6398114