DocumentCode :
2758138
Title :
Wraplet: Wrapping Your Web Contents with a Lightweight Language
Author :
Sawa, Natsumi ; Morishima, Atsuyuki ; Sugimoto, Shigeo ; Kitagawa, Hiroyuki
Author_Institution :
Univ. of Tsukuba, Tsukuba
fYear :
2007
fDate :
16-18 Dec. 2007
Firstpage :
387
Lastpage :
394
Abstract :
Wrapping of Web sources is known to be one of the key tasks in information integration problems. This paper proposes Wraplet, a wrapping language for extracting structured data from Web contents written in HTML. Unlike existing solutions, Wraplet is designed as a lightweight language in which users can easily write wrapping scripts with text editors. Its simple syntax and library of useful patterns help the user write wrapping descriptions by hand. We explain the motivation for its development and the language design, and then show the results of a preliminary experiment on the applicability of the language to real Web sources. We conducted a statistical analysis and found that the applicability of Wraplet is more than 90% at the 95% confidence level in the experimental setting.
Keywords :
Web sites; content management; hypermedia markup languages; HTML; HyperText Markup Language; Web contents wrapping; Web sources wrapping; Wraplet; information integration; language design; statistical analysis; structured data extraction; text editors; user write wrapping descriptions; wrapping language; Content management; Data mining; HTML; Internet; Libraries; Monitoring; Statistical analysis; Temperature; Web pages; Wrapping; Information integration; Languages; Wrappers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Signal-Image Technologies and Internet-Based System, 2007. SITIS '07. Third International IEEE Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3122-9
Type :
conf
DOI :
10.1109/SITIS.2007.135
Filename :
4618800