DocumentCode
3226082
Title
Three level method using machine learning and rule based approach for extracting Web-table information
Author
Jung, Sung-Wong ; Lim, Sung-Shin ; Kwon, Hyuk-Chul
Author_Institution
Dept. of Comput. Sci. & Eng., Pusan Nat. Univ., South Korea
Volume
3
fYear
2004
fDate
2-6 Nov. 2004
Firstpage
3131
Abstract
Authors of HTML documents use various methods to convey their intentions clearly. The table is preeminent among these because it presents meaningful data in a structure of rows and columns. On the Internet, however, tables are used for document design as well as for structuring knowledge. Distinguishing these two kinds of table is not easy because HTML does not separate presentation from structure, which makes extracting information from tables more difficult. In this paper, we therefore first classify tables into two types, meaningful tables and decorative tables, and then extract information from the meaningful tables.
Keywords
Internet; hypermedia markup languages; information retrieval; knowledge based systems; learning (artificial intelligence); HTML documents; Web-table information extracting; decorative tables; documents design; knowledge structuring; machine learning; meaningful tables; preeminent method; rule based approach; Animation; Computer science; Data mining; HTML; Machine learning; Pressing; Protocols; Shape; Stochastic processes
fLanguage
English
Publisher
IEEE
Conference_Titel
Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE
Print_ISBN
0-7803-8730-9
Type
conf
DOI
10.1109/IECON.2004.1432313
Filename
1432313