Title :
A Web data extraction description language and its implementation
Author :
Wu, I-Chen ; Su, Jui-Yuan ; Chen, Loon-Been
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., National Chiao Tung Univ., Hsinchu, Taiwan
Abstract :
A data extraction model, named the browser-oriented data extraction (BODE) model, was proposed by I-Chen Wu et al. (2005) to extract Web contents with script functions. In this model, a system built on top of browsers accesses pages by simulating users' operations on those browsers. Based on this model, this paper defines a scripting language, named the BODED (browser-oriented data extraction description) language, which instructs the system how to perform data extraction. This paper also proposes a technique, called indirect browser replication, to implement a BODE system, and optimizes the performance of this technique.
Keywords :
Internet; formal specification; knowledge acquisition; online front-ends; specification languages; user interfaces; BODE model; BODE system; BODED; Web content extraction; Web data extraction description language; browser-oriented data extraction description language; indirect browser replication; script functions; scripting language; Bibliographies; Computer science; Data engineering; Data mining; Databases; HTML; Java; Uniform resource locators; Web pages; XML;
Conference_Title :
Computer Software and Applications Conference, 2005. COMPSAC 2005. 29th Annual International
Print_ISBN :
0-7695-2413-3
DOI :
10.1109/COMPSAC.2005.38