Title :
Biological data integration: wrapping data and tools
Author_Institution :
Arizona State Univ., Tempe, AZ, USA
fDate :
6/1/2002 12:00:00 AM
Abstract :
Scientific data is inevitably digital and stored in a wide variety of formats in heterogeneous systems. Scientists need to access an integrated view of remote or local heterogeneous data sources with advanced data access, analysis, and visualization tools. Building a digital library for scientific data requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, and data generated by software. We present an approach to wrapping Web data sources, databases, flat files, or data generated by tools through a database view mechanism. Generally, a wrapper has two tasks: first, it sends a query to the source to retrieve data; second, it builds the expected output with respect to the virtual structure. To perform these two tasks, our wrappers are composed of, respectively, a retrieval component based on an intermediate object view mechanism called search views, which maps the source capabilities to attributes, and an Extensible Markup Language (XML) engine. The originality of the approach consists of: 1) a generic view mechanism to seamlessly access data sources with limited capabilities and 2) the ability to wrap data sources as well as the useful specific tools they may provide. Our approach has been developed and demonstrated as part of a multidatabase system supporting queries via uniform object protocol model (OPM) interfaces.
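The two wrapper tasks described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `SearchView` and `Wrapper`, the `fetch` callable standing in for a data source, the attribute names, and the toy records are all hypothetical, chosen only to show a search view translating attributes into source parameters and an XML step building the output.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical illustrations, not taken from the paper.
Record = Dict[str, str]


@dataclass
class SearchView:
    """Maps view attributes to the query parameters a source supports."""
    attribute_to_param: Dict[str, str]

    def translate(self, conditions: Dict[str, str]) -> Dict[str, str]:
        # A source with limited capabilities can only filter on mapped attributes.
        unsupported = set(conditions) - set(self.attribute_to_param)
        if unsupported:
            raise ValueError(f"source cannot filter on: {sorted(unsupported)}")
        return {self.attribute_to_param[a]: v for a, v in conditions.items()}


@dataclass
class Wrapper:
    view: SearchView
    fetch: Callable[[Dict[str, str]], List[Record]]  # stands in for the source
    root_tag: str = "results"
    row_tag: str = "entry"

    def query(self, conditions: Dict[str, str]) -> str:
        # Task 1: send the translated query to the source and retrieve data.
        rows = self.fetch(self.view.translate(conditions))
        # Task 2: build the expected output as XML.
        root = ET.Element(self.root_tag)
        for row in rows:
            entry = ET.SubElement(root, self.row_tag)
            for name, value in row.items():
                ET.SubElement(entry, name).text = value
        return ET.tostring(root, encoding="unicode")


# Made-up records standing in for a flat file.
FLAT_FILE: List[Record] = [
    {"accession": "X1", "organism": "E. coli"},
    {"accession": "X2", "organism": "S. cerevisiae"},
]


def toy_fetch(params: Dict[str, str]) -> List[Record]:
    """Return the records whose fields match every query parameter."""
    return [r for r in FLAT_FILE
            if all(r.get(k) == v for k, v in params.items())]


wrapper = Wrapper(SearchView({"id": "accession", "species": "organism"}), toy_fetch)
xml_out = wrapper.query({"species": "E. coli"})
print(xml_out)
```

In this sketch a query on an attribute the search view does not map (for example `{"length": "100"}`) raises `ValueError`, which is one simple way to model a source with limited query capabilities; a real system might instead filter such conditions after retrieval.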
Keywords :
biology computing; data analysis; digital libraries; distributed databases; hypermedia markup languages; information resources; scientific information systems; Web documents; XML engine; biological data integration; data access; data retrieval; data visualization; data wrapping; databases; digital library; flat files; generic view mechanism; heterogeneous data sources; multidatabase system; query; search views; uniform object protocol model interfaces; virtual structure; Access protocols; Data analysis; Data mining; Data visualization; Engines; Information retrieval; Software libraries; Visual databases; Wrapping; XML; Algorithms; Artificial Intelligence; Computational Biology; Computer Communication Networks; Database Management Systems; Databases, Bibliographic; Databases, Factual; Databases, Nucleic Acid; Decision Support Techniques; Feasibility Studies; Information Storage and Retrieval; Internet; MEDLINE; Programming Languages
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
DOI :
10.1109/TITB.2002.1006299