Title :
Information Extraction Based on Table Area Locating for E-Commerce Websites
Author :
Ouyang, Liubo ; Dong, Rui ; Zou, Beiji
Author_Institution :
Software Sch., Hunan Univ., Changsha, China
Abstract :
Efficiently extracting merchandise information is a key technology for e-commerce search engines. By analyzing the characteristics of Web tables in the HTML pages of e-commerce Websites, this article proposes the notion of table area locating and decomposes merchandise information extraction into three key processes: searching for preparative core areas (PCA), locating the core area (CA), and extracting attribute values from the core area. It then designs an algorithm for locating the core area and an algorithm for extracting attribute names and values. We tested the approach on HTML pages from various e-commerce Websites. The results indicate that it locates the merchandise information area and extracts attribute names and values efficiently, with better precision and recall.
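The abstract names the three stages (PCA search, CA locating, attribute extraction) but not their internals. The Python sketch below illustrates one possible reading of that pipeline using only the standard library's `html.parser`. It is not the authors' algorithm. The names `TableCollector`, `find_pca`, `locate_ca` and `extract_attributes` are hypothetical, and so are the scoring rules: a PCA is taken to be any table with at least `min_pairs` two-cell rows, and the CA is the candidate with the highest share of such rows. The sketch also assumes well-formed markup with explicit `</td>`/`</th>` tags.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collects every <table> as a list of rows, each row a list of cell texts.

    Open tables are kept on a stack, so a nested table gets its own entry.
    Assumes explicit closing tags for cells (a simplification).
    """

    def __init__(self):
        super().__init__()
        self.tables = []  # finished tables: list[list[list[str]]]
        self._stack = []  # open tables, innermost last

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append({"rows": [], "cell": None})
        elif not self._stack:
            return
        elif tag == "tr":
            self._stack[-1]["rows"].append([])
        elif tag in ("td", "th") and self._stack[-1]["rows"]:
            self._stack[-1]["cell"] = []  # start collecting this cell's text

    def handle_endtag(self, tag):
        if not self._stack:
            return
        top = self._stack[-1]
        if tag in ("td", "th") and top["cell"] is not None:
            # Collapse whitespace and store the finished cell in the current row.
            top["rows"][-1].append(" ".join("".join(top["cell"]).split()))
            top["cell"] = None
        elif tag == "table":
            self.tables.append(self._stack.pop()["rows"])

    def handle_data(self, data):
        if self._stack and self._stack[-1]["cell"] is not None:
            self._stack[-1]["cell"].append(data)


def find_pca(tables, min_pairs=3):
    """Hypothetical PCA step: keep tables with at least min_pairs two-cell rows."""
    return [t for t in tables if sum(len(r) == 2 for r in t) >= min_pairs]


def locate_ca(pcas):
    """Hypothetical CA step: pick the candidate with the largest share of
    name/value rows, breaking ties by the absolute number of such rows."""
    def score(table):
        pairs = sum(len(r) == 2 for r in table)
        return (pairs / len(table), pairs)
    return max(pcas, key=score, default=None)


def extract_attributes(ca):
    """Read each two-cell row as (attribute name, attribute value)."""
    return {r[0].rstrip(":"): r[1] for r in ca if len(r) == 2 and r[0]}


page = """
<html><body>
<table><tr><td>Home</td><td>Cart</td><td>Help</td></tr></table>
<table>
  <tr><th>Brand:</th><td>Acme</td></tr>
  <tr><th>Model:</th><td>X-100</td></tr>
  <tr><th>Price:</th><td>$199</td></tr>
  <tr><td colspan="2">Ships in 2 days</td></tr>
</table>
</body></html>
"""

collector = TableCollector()
collector.feed(page)
collector.close()
ca = locate_ca(find_pca(collector.tables))
print(extract_attributes(ca))
# prints {'Brand': 'Acme', 'Model': 'X-100', 'Price': '$199'}
```

In this toy page the navigation table has no two-cell rows, so it never becomes a PCA. The product-specification table survives both filters, and its single-cell footer row is skipped during extraction. A real implementation would likely need richer cues than row shape, such as layout, keyword lists or DOM depth.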
Keywords :
Web sites; electronic commerce; hypermedia markup languages; information retrieval; search engines; CA; HTML page; PCA; e-commerce Website; locating core area; merchandise information extraction; preparative core area; search engine; table area locating; Algorithm design and analysis; Character recognition; Classification tree analysis; Data mining; Electronic commerce; HTML; Information analysis; Merchandise; Pattern recognition; Search engines; Area location; DOM tree; Information extraction; Web Tables;
Conference_Title :
Intelligent Systems, 2009. GCIS '09. WRI Global Congress on
Conference_Location :
Xiamen
Print_ISBN :
978-0-7695-3571-5
DOI :
10.1109/GCIS.2009.310