DocumentCode :
3363858
Title :
Towards a benchmark for Web site extractors: a call for community participation
Author :
Kienle, Holger M. ; Sim, Susan Elliott
Author_Institution :
Victoria Univ., BC, Canada
fYear :
2003
fDate :
26-28 March 2003
Firstpage :
82
Lastpage :
87
Abstract :
The purpose of this paper is to propose a benchmark for comparing fact extractors for Web sites and to invite interested researchers and practitioners to participate in its development. Fact extraction is a fundamental and difficult problem in both traditional software reverse engineering and Web site reverse engineering. In both domains, there are often irregularities in the input that violate an extractor's unstated assumptions. Consequently, it is difficult to predict how an extractor will perform on a given input. To remedy this problem, we created a benchmark for comparing fact extractors for the C++ programming language. We found that this benchmark improved our understanding of fact extraction, the tools produced, and the maturity of the community. The same approach, we believe, will be beneficial for Web site extractors, and we propose WebETS (Web site Extractor Test Suite). In this paper we give some starting points for the design of WebETS and ask others to join in the effort.
Keywords :
C++ language; Web sites; hypermedia markup languages; reverse engineering; software performance evaluation; C++ programming language; HTML; Web site Extractor Test Suite; Web site extractors; Web site reverse engineering; WebETS; benchmark; fact extractors; software reverse engineering; Acoustical engineering; Benchmark testing; Computer languages; Data mining; Maintenance engineering; Reverse engineering; Software systems; Web page design; Web pages; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Maintenance and Reengineering, 2003. Proceedings. Seventh European Conference on
ISSN :
1534-5351
Print_ISBN :
0-7695-1902-4
Type :
conf
DOI :
10.1109/CSMR.2003.1192414
Filename :
1192414