• DocumentCode
    124142
  • Title
    Multi-feature and DAG-Based Multi-tree Matching Algorithm for Automatic Web Data Mining
  • Author
    Shengsheng Shi ; Chengfei Liu ; Chunfeng Yuan ; Yihua Huang
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Nanjing Univ., Nanjing, China
  • Volume
    1
  • fYear
    2014
  • fDate
    11-14 Aug. 2014
  • Firstpage
    118
  • Lastpage
    125
  • Abstract
    Web data extraction has received considerable attention in recent decades. To improve efficiency, many automatic Web data record mining approaches have been proposed. A complete approach involves both data record identification and data item alignment. In this paper, we propose a new multi-feature and DAG (Directed Acyclic Graph) based multi-tree matching algorithm for automatic data item alignment. Our algorithm improves alignment accuracy in two respects. First, it combines multiple features to cope with the limitations of existing algorithms. Second, it employs a DAG-based method to deduce the global alignment of data items with high accuracy. Experimental results show that our algorithm outperforms state-of-the-art data item alignment algorithms.
  • Keywords
    Internet; data mining; directed graphs; pattern matching; trees (mathematics); DAG-based multitree matching algorithm; Web data extraction; automatic Web data record mining approaches; automatic data item alignment; data record identification; directed acyclic graph; Accuracy; Data mining; Educational institutions; HTML; Manuals; Visualization; Web pages; Web data mining; data item alignment; directed acyclic graph; multi-tree matching; multiple features;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Warsaw
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2014.24
  • Filename
    6927533