Title :
Analysis and Improvement of Data Extraction Technology on the Web
Author_Institution :
Ningxia Univ., Yinchuan, China
Abstract :
This paper introduces an improved technology and infrastructure that supports the effective flow of information among sources and services on the Web, and connects them with legacy systems designed to operate on traditional relational databases. The technology works as a relational front end to semi-structured data sources: it extracts data from Web pages using declarative specification files whose extraction rules are expressed as regular expressions.
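The abstract does not give the format of the specification files, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a hypothetical specification (`SPEC`) that names a target table, one regular expression that delimits a record, and one regular expression per column. The sample HTML, the table name `products`, and the helper names `extract_rows` and `load_into_sqlite` are all illustrative assumptions.

```python
import re
import sqlite3

# Hypothetical declarative specification (format assumed for illustration):
# one regex delimits a record, and one regex per column captures its value.
SPEC = {
    "table": "products",
    "record": r"<tr>(.*?)</tr>",
    "columns": {
        "name": r'<td class="name">(.*?)</td>',
        "price": r'<td class="price">\$([0-9.]+)</td>',
    },
}

# Made-up page fragment standing in for a semi-structured Web source.
SAMPLE_HTML = """
<table>
<tr><td class="name">Widget</td><td class="price">$9.99</td></tr>
<tr><td class="name">Gadget</td><td class="price">$24.50</td></tr>
</table>
"""


def extract_rows(html, spec):
    """Apply the spec's regexes to the page and return one dict per record."""
    rows = []
    for record in re.findall(spec["record"], html, flags=re.DOTALL):
        row = {}
        for column, pattern in spec["columns"].items():
            match = re.search(pattern, record, flags=re.DOTALL)
            # A column whose rule does not match is stored as None (SQL NULL).
            row[column] = match.group(1).strip() if match else None
        rows.append(row)
    return rows


def load_into_sqlite(rows, spec):
    """Expose the extracted rows as a relational table in an in-memory database."""
    columns = list(spec["columns"])
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {spec['table']} ({column_list})")
    conn.executemany(
        f"INSERT INTO {spec['table']} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    return conn


ROWS = extract_rows(SAMPLE_HTML, SPEC)
CONN = load_into_sqlite(ROWS, SPEC)
# Legacy-style SQL can now be run against data that originated in a Web page.
EXPENSIVE = CONN.execute(
    "SELECT name FROM products WHERE CAST(price AS REAL) > 10"
).fetchall()
```

Keeping the rules in a data structure rather than in code means that a change in the page layout only requires editing the specification, which is presumably the maintenance benefit that a declarative approach is after.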
Keywords :
Web services; knowledge acquisition; online front-ends; relational databases; software maintenance; data extraction technology; declarative specification files; legacy systems; relational front end; semistructured data sources; Bismuth; Data mining; HTML; Humans; Information analysis; Information retrieval; Machine learning algorithms; Paper technology; Search engines; Web pages
Conference_Title :
e-Business and Information System Security (EBISS), 2010 2nd International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5893-6
Electronic_ISBN :
978-1-4244-5895-0
DOI :
10.1109/EBISS.2010.5473712