Title of article :
Reconfigurable Web Wrapper Agents for Biological
Information Integration
Author/Authors :
Chun-Nan Hsu, Chia-Hui Chang, Chang-Huain Hsieh, Jiann-Jyh Lu, Chien-Chi Chang
Issue Information :
Monthly, with consecutive issue numbering, 2005
Abstract :
A variety of biological data is transferred and exchanged
in overwhelming volumes on the World Wide
Web. How to rapidly capture, utilize, and integrate the
information on the Internet to discover valuable biological
knowledge is one of the most critical issues in bioinformatics.
Many information integration systems have
been proposed for integrating biological data. These
systems usually rely on an intermediate software layer
called wrappers to access connected information
sources. Wrapper construction for Web data sources is
often specially hand coded to accommodate the differences
between each Web site. However, programming a
Web wrapper requires substantial programming skill,
and is time-consuming and hard to maintain. In this article
we provide a solution for rapidly building software
agents that can serve as Web wrappers for biological
information integration. We define an XML-based language
called Web Navigation Description Language
(WNDL), to model a Web-browsing session. A WNDL
script describes how to locate the data, extract the data,
and combine the data. By executing different WNDL
scripts, we can automate virtually all types of Web-browsing
sessions. We also describe IEPAD (Information
Extraction Based on Pattern Discovery), a data extractor
based on pattern discovery techniques. IEPAD
allows our software agents to automatically discover
the extraction rules to extract the contents of a structurally
formatted Web page. With a programming-by-example
authoring tool, a user can generate a complete
Web wrapper agent by browsing the target Web sites.
We built a variety of biological applications to demonstrate
the feasibility of our approach.
Journal title :
Journal of the American Society for Information Science and Technology