DocumentCode :
2079625
Title :
Automatic wrapper generation for semi-structured biological data based on table structure identification
Author :
Chen, Liangyou ; Jamil, Hasan M. ; Wang, Nan
Author_Institution :
Mississippi State Univ., USA
fYear :
2003
fDate :
1-5 Sept. 2003
Firstpage :
55
Lastpage :
59
Abstract :
Biological data analyses usually require complex manipulations involving tool applications, navigation across multiple Web sites, result selection and filtering, and iteration over the Internet. Most biological data are generated from structured databases and by applications, and are presented to users embedded within repeated structures, or tables, in HTML documents. In this paper we outline a novel technique for identifying table structures in HTML documents. This identification technique is then used to automatically generate composite wrappers for applications requiring distributed resources. We demonstrate that our method is robust enough to discover standard as well as non-standard table structures in HTML documents. Thus, our technique outperforms contemporary techniques used in systems such as XWrap and AutoWrapper. We discuss our technique in the context of our PickUp system, which exploits the theoretical developments presented in this paper and emerges as an elegant automatic wrapper generation system.
Keywords :
biology computing; data analysis; data structures; distributed processing; hypermedia markup languages; query processing; AutoWrapper; HTML documents; Internet; PickUp system; Web site navigation; XWrap; automatic wrapper generation; biological data based; composite wrappers; distributed resources; repeated structures; result selection; structured databases; table structure identification; tool applications; Application software; Automation; Bioinformatics; Cancer; Costs; Data analysis; Databases; Genomics; HTML; Induction generators;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on
ISSN :
1529-4188
Print_ISBN :
0-7695-1993-8
Type :
conf
DOI :
10.1109/DEXA.2003.1231998
Filename :
1231998