Title :
Automatic wrapper generation for semi-structured biological data based on table structure identification
Author :
Chen, Liangyou ; Jamil, Hasan M. ; Wang, Nan
Author_Institution :
Mississippi State Univ., USA
Abstract :
Biological data analyses usually require complex manipulations involving tool applications, navigation across multiple Web sites, result selection and filtering, and iteration over the Internet. Most biological data are generated from structured databases and by applications, and are presented to users embedded within repeated structures, or tables, in HTML documents. In this paper, we outline a novel technique for the identification of table structures in HTML documents. This identification technique is then used to automatically generate composite wrappers for applications requiring distributed resources. We demonstrate that our method is robust enough to discover standard as well as non-standard table structures in HTML documents. Thus, our technique outperforms contemporary techniques used in systems such as XWrap and AutoWrapper. We discuss our technique in the context of our PickUp system, which exploits the theoretical developments presented in this paper and emerges as an elegant automatic wrapper generation system.
Keywords :
biology computing; data analysis; data structures; distributed processing; hypermedia markup languages; query processing; AutoWrapper; HTML documents; Internet; PickUp system; Web site navigation; XWrap; automatic wrapper generation; biological data based; composite wrappers; distributed resources; repeated structures; result selection; structured databases; table structure identification; tool applications; Application software; Automation; Bioinformatics; Cancer; Costs; Data analysis; Databases; Genomics; HTML; Induction generators;
Conference_Title :
Database and Expert Systems Applications, 2003. Proceedings. 14th International Workshop on
Print_ISBN :
0-7695-1993-8
DOI :
10.1109/DEXA.2003.1231998