• DocumentCode
    3226082
  • Title
    Three level method using machine learning and rule based approach for extracting Web-table information
  • Author
    Jung, Sung-Wong ; Lim, Sung-Shin ; Kwon, Hyuk-Chul

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pusan Nat. Univ., South Korea
  • Volume
    3
  • fYear
    2004
  • fDate
    2-6 Nov. 2004
  • Firstpage
    3131
  • Abstract
    Generally, authors of HTML documents use various methods to convey their intentions clearly. The table is preeminent among these methods because it presents meaningful data in a structure of rows and columns. On the Internet, however, tables are used for document design as well as for knowledge structuring. Distinguishing these two kinds of table is not an easy task because HTML does not separate presentation from structure, which makes extracting information from tables more difficult. In this paper, we therefore first classify tables into two types, meaningful tables and decorative tables, and then extract information from the meaningful tables.
  • Keywords
    Internet; hypermedia markup languages; information retrieval; knowledge based systems; learning (artificial intelligence); HTML documents; Web-table information extracting; decorative tables; documents design; knowledge structuring; machine learning; meaningful tables; preeminent method; rule based approach; Animation; Computer science; Data mining; HTML; Pressing; Protocols; Shape; Stochastic processes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE
  • Print_ISBN
    0-7803-8730-9
  • Type
    conf
  • DOI
    10.1109/IECON.2004.1432313
  • Filename
    1432313