Title :
An Enhanced Extract-Transform-Load System for Migrating Data in Telecom Billing
Author :
Agrawal, Himanshu ; Chafle, Girish ; Goyal, Sunil ; Mittal, Sumit ; Mukherjea, Sougata
Author_Institution :
Res. Lab., IBM India, New Delhi
Abstract :
Data migration has become a priority in many industries, spawned by a variety of business needs. Most existing tools for the Extract, Transform and Load (ETL) process of data migration are piecemeal and do not present a complete solution. Moreover, while research has focused on the problem of schema mapping, a key step in the ETL process, most current algorithms do not perform well on real-world data. Researchers have suggested the use of domain knowledge to enhance schema mapping. In this paper, we use domain knowledge in an innovative manner to improve schema mapping in an 'actual' industrial setting. Further, we take a comprehensive view of the data migration problem and present an end-to-end system for the ETL process, utilizing existing tools for each step and building connectors wherever required. We focus on data migration for telecom billing and utilize domain knowledge captured in an ontology, a thesaurus and a set of rules to improve schema mapping. Experiments conducted on real-life data demonstrate the effectiveness of our system and validate the utility of domain knowledge in data migration projects.
Keywords :
business data processing; data mining; ontologies (artificial intelligence); data migration; domain knowledge; extract-transform-load system; schema mapping; telecom billing; Communication industry; Computers; Connectors; Content management; Data mining; Laboratories; Ontologies; Real time systems; Telecommunication services; Thesauri;
Conference_Title :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
DOI :
10.1109/ICDE.2008.4497537