DocumentCode :
3024226
Title :
Clio: a schema mapping tool for information integration
Author :
Hernandez, Mauricio ; Ho, Howard ; Naumann, Felix ; Popa, Lucian
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
fYear :
2005
fDate :
7-9 Dec. 2005
Abstract :
Summary form only given. Information integration typically requires the construction of complex artifacts such as federated databases, ETL scripts, data warehouses, applications that access multiple data sources, and applications that ingest or publish XML. For many companies, it is one of the most complicated IT tasks they face today. To reduce the overall cost, intelligent tools are needed to simplify this difficult task. Clio is a semi-automatic tool for schema mapping and data integration developed at IBM Almaden Research Center over the past few years. It takes source and target schemas as input, which may describe relational or XML data models. Through a graphical Schema Viewer, a user can then interactively specify attribute correspondences between the source and target schemas. An AttributeMatcher component helps suggest such correspondences, based on the similarity of both attribute names and attribute values. Once the user has specified correspondences, Clio generates SQL, SQL/XML, XQuery or XSLT on the fly to implement the specified transformation, which is guaranteed to produce output data that conforms to the target schema. In this paper, we first describe and demonstrate some basic features of Clio. In particular, we describe the abstracted problems and the algorithms behind the AttributeMatcher component. We then describe additional research problems abstracted from the area of schema mapping and information integration, with an emphasis on graph algorithms and on issues of scalability and parallelism.
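The abstract states only that AttributeMatcher suggests correspondences from the similarity of attribute names and attribute values; it does not give the scoring method. The following is a minimal Python sketch under our own assumptions, not Clio's algorithm or API: names are compared with `difflib.SequenceMatcher`, values by Jaccard overlap of sample value sets, the two scores are averaged with a hypothetical `name_weight`, and pairs at or above a hypothetical `threshold` are suggested. All function and parameter names are illustrative.

```python
from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    """Lower-case an attribute name and drop underscores (assumed normalization)."""
    return name.lower().replace("_", "")


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def value_similarity(xs: list, ys: list) -> float:
    """Jaccard overlap of two attributes' sample value sets, in [0, 1]."""
    sx = {str(v).lower() for v in xs}
    sy = {str(v).lower() for v in ys}
    if not sx and not sy:
        return 0.0
    return len(sx & sy) / len(sx | sy)


def suggest_correspondences(source: dict, target: dict,
                            name_weight: float = 0.5,
                            threshold: float = 0.5) -> list:
    """Score every source/target attribute pair (hypothetical helper).

    `source` and `target` map attribute names to lists of sample values.
    Returns (source_attr, target_attr, score) tuples with score >= threshold,
    best first, for a user to confirm or reject.
    """
    suggestions = []
    for s_attr, s_vals in source.items():
        for t_attr, t_vals in target.items():
            score = (name_weight * name_similarity(s_attr, t_attr)
                     + (1 - name_weight) * value_similarity(s_vals, t_vals))
            if score >= threshold:
                suggestions.append((s_attr, t_attr, round(score, 3)))
    return sorted(suggestions, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    src = {"cust_name": ["Alice", "Bob"], "city": ["San Jose", "Berlin"]}
    tgt = {"customerName": ["alice", "bob", "carol"],
           "town": ["berlin", "san jose"]}
    for s, t, score in suggest_correspondences(src, tgt):
        print(f"{s} -> {t}: {score}")
```

Combining both signals lets a pair with dissimilar names, such as `city` and `town` in the usage example, still be suggested because their sample values overlap.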
Keywords :
SQL; XML; data models; graphical user interfaces; AttributeMatcher component; Clio; IBM Almaden Research Center; SQL; XML data model; XQuery; XSLT; abstracted problem; graph algorithm; graphical Schema Viewer; information integration; parallelism; relational data model; scalability; schema mapping tool; Costs; Data models; Data warehouses; Databases; Parallel architectures; Scalability; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures, Algorithms and Networks, 2005. ISPAN 2005. Proceedings. 8th International Symposium on
ISSN :
1087-4089
Print_ISBN :
0-7695-2509-1
Type :
conf
DOI :
10.1109/ISPAN.2005.25
Filename :
1575798