Title :
Study of localized data cleansing process for ETL performance improvement in independent datamart
Author :
Savitri, F.N. ; Laksmiwati, Hira
Author_Institution :
Data & Software Eng. Res. Group, Inst. Teknol. Bandung (ITB), Bandung, Indonesia
Abstract :
Datawarehouse practitioners consider that the greatest effort in building a datawarehouse nowadays lies in ETL processing. The complexity of the ETL workload depends on the variety and heterogeneity of the data source profiles to be collected. A study to decrease the ETL processing workload in the datawarehouse development stages has been carried out, and a new concept of localized data source cleansing has been proposed: inconsistent, non-formal, expected-existing, and duplicated data in each data source's profile should be identified locally. This is expected to lighten and shorten ETL processing, so that the workload performance of ETL processing improves. An investigation into the impact of localized and non-localized heterogeneous data cleansing has been done. Based on this investigation, an automatic localized data cleansing and integration system has been defined. It performs cleansing for each data source profile and is executed at the transactional data source site, which means this process is done before the datawarehouse development stages. It is found that if the Automatic Data Cleansing process and the Data Integrator process can be carried out sequentially, the ETL processing workload in the datawarehouse development stages decreases. It is proven that the reduction in the amount of raw data achieved through local cleansing is significant for data that lack integrity constraints and format-checking procedures.
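The abstract describes cleansing each source locally and then running a data integrator sequentially. The following is only a minimal illustrative sketch of that idea, not the authors' system. The row shape, the rules (whitespace normalization, a required key, an assumed ISO date format, and key-based deduplication), and the names `clean_source` and `integrate` are all hypothetical.

```python
import re
from typing import Iterable

# Hypothetical row shape: each source record is a dict of string fields.
Row = dict[str, str]

# Assumed target date format (YYYY-MM-DD); a real source profile would define its own.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def clean_source(rows: Iterable[Row], key: str, date_field: str) -> list[Row]:
    """Cleanse one data source at its own site, before ETL (illustrative rules only)."""
    seen: set[str] = set()
    cleaned: list[Row] = []
    for row in rows:
        # Normalize values: trim and collapse runs of whitespace.
        norm = {k: " ".join(v.split()) for k, v in row.items()}
        k = norm.get(key, "")
        # Expected-existing check: drop rows without the key.
        if not k:
            continue
        # Format check: drop rows whose date does not match the assumed format.
        if not DATE_RE.match(norm.get(date_field, "")):
            continue
        # Duplicate check: keep only the first row for each key.
        if k in seen:
            continue
        seen.add(k)
        cleaned.append(norm)
    return cleaned


def integrate(sources: list[list[Row]], key: str) -> list[Row]:
    """Hypothetical integrator: merge already-cleansed sources, first key wins."""
    merged: dict[str, Row] = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row[key], row)
    return list(merged.values())
```

In this sketch, each transactional site runs its own `clean_source` before the data leaves it, so the integrator only receives rows that are already normalized and deduplicated. This ordering reflects the abstract's point that less raw data then reaches the ETL stage.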
Keywords :
data warehouses; ETL performance improvement; ETL processing; automatic localised data cleansing; data integrator process; data source profile; datawarehouse practitioners; extract-transform-load; independent datamart; integration system; localized data cleansing process study; nonlocalized heterogeneous data cleansing; transactional data source site; Data mining; Data warehouses; Databases; Position measurement; Redundancy; Testing; Time measurement; Automatic Data Cleansing; Automatic Data Integrator; Datawarehouse; Extract-transform-load (ETL);
Conference_Title :
Electrical Engineering and Informatics (ICEEI), 2011 International Conference on
Conference_Location :
Bandung
Print_ISBN :
978-1-4577-0753-7
DOI :
10.1109/ICEEI.2011.6021806