• DocumentCode
    1879015
  • Title

    Web content extraction based on subject detection and node density

  • Author

    Petprasit, Warid ; Jaiyen, Saichon

  • Author_Institution
    Dept. of Comput. Sci., King Mongkut's Inst. of Technol. Ladkrabang, Bangkok, Thailand
  • fYear
    2015
  • fDate
    28-31 Jan. 2015
  • Firstpage
    121
  • Lastpage
    125
  • Abstract
    Very large volumes of data are now transferred across the World Wide Web. Consequently, information extraction systems have emerged, and much research has focused on making use of these data. Such systems are very useful for data pre-processing and cleaning in real-time applications. Moreover, they enable other analysis systems to process the data in real time, for tasks such as social network mining, web mining, and data mining, or for more specialized tasks such as false advertisement detection, demand forecasting, and comment extraction from product and service reviews. In this paper, we focus on extracting the content data of web pages on e-commerce web sites based on subject detection and node density. The experimental results indicate that the proposed method is suitable for automatically extracting the data-rich region of data-intensive pages.
  • Keywords
    Big Data; Internet; Web sites; electronic commerce; information retrieval; Web content extraction; Web pages; World Wide Web; content data extraction; data cleaning; data pre-processing; data rich region; data-intensive pages; e-commerce Web sites; information extraction systems; node density; real-time applications; subject detection; very large data; Cascading style sheets; Data mining; Uniform resource locators; XML; data intensive; e-commerce; node density (SDND); web information extraction; web mining; wrapper induction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge and Smart Technology (KST), 2015 7th International Conference on
  • Conference_Location
    Chonburi
  • Print_ISBN
    978-1-4799-6048-4
  • Type

    conf

  • DOI
    10.1109/KST.2015.7051455
  • Filename
    7051455