DocumentCode :
3231640
Title :
Protection Techniques from Information Extraction
Author :
Greco, Gianluigi ; Ianni, Giovambattista ; Lio, Vincenzino ; Palopoli, Luigi
Author_Institution :
Calabria Univ.
fYear :
2006
fDate :
Dec. 2006
Firstpage :
1029
Lastpage :
1033
Abstract :
Information extraction technologies meet the market need for automatic tools that extract semi-structured information from Web pages. However, pages may change over time for various reasons, ranging from restyling to deliberate modifications intended to confuse Web wrappers. In this paper we deal with the latter scenario, studying the issue of on-purpose wrapper spoiling and its relationship to wrapping. We present an architecture and a tool implementing a wrapper spoiling system, and discuss some practical spoiling techniques, which are also experimentally tested.
Keywords :
Internet; information retrieval; Web pages; on-purpose wrapper spoiling system; protection techniques; semistructured information extraction; Advertising; Application software; Data mining; Electronic mail; HTML; Humans; Protection; System testing; Wrapping
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7
Type :
conf
DOI :
10.1109/WI.2006.138
Filename :
4061515