  • DocumentCode
    2047489
  • Title
    Outlier detection from ETL execution trace
  • Author
    Ghosh, Samiran; Goswami, Saptarsi; Chakrabarti, Amlan
  • Author_Institution
    A.K. Choudhury Sch. of Inf. Technol., Univ. of Calcutta, Kolkata, India
  • Volume
    6
  • fYear
    2011
  • fDate
    8-10 April 2011
  • Firstpage
    343
  • Lastpage
    347
  • Abstract
    Extract, Transform, Load (ETL) is an integral part of a Data Warehousing (DW) implementation. The commercial tools used for this purpose capture a large amount of execution trace in the form of various log files containing a plethora of information. However, there have been hardly any initiatives to proactively analyze ETL logs in order to improve the efficiency of ETL processes. In this paper, we use an outlier detection technique to find the processes whose execution traces deviate most from the group. As our experiment was carried out on actual production processes, we treat any outlier as a signal rather than as noise. To identify the input parameters for the outlier detection algorithm, we conducted a survey of the developer community, covering a varied mix of experience and expertise. We use simple text parsing to extract the features shortlisted in the survey from the logs. Subsequently, we applied a clustering-based outlier detection technique to the logs. This process reduced the scope of detailed analysis from 500 logs to 44 logs (8.8%). Of the 5 outlier clusters, 2 are genuine concerns, while the other 3 appear only because of the huge number of rows involved.
  • Keywords
    data warehouses; pattern clustering; text analysis; clustering based technique; data warehousing; extract, transform, load execution trace; outlier detection; text parsing; Algorithm design and analysis; Clustering algorithms; Data mining; Feature extraction; Measurement; Unified modeling language; Warehousing; Clustering; Data Warehousing; ETL; Log files; Outlier detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electronics Computer Technology (ICECT), 2011 3rd International Conference on
  • Conference_Location
    Kanyakumari
  • Print_ISBN
    978-1-4244-8678-6
  • Electronic_ISBN
    978-1-4244-8679-3
  • Type
    conf
  • DOI
    10.1109/ICECTECH.2011.5942112
  • Filename
    5942112
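The abstract summarizes two steps: simple text parsing of ETL log files into numeric features, and clustering-based outlier detection over those features. The record does not specify the log format, the features shortlisted by the survey, the clustering algorithm, or how outlier clusters were defined. The following Python sketch therefore rests on stated assumptions. The feature names and regular expressions in `FEATURE_PATTERNS` are hypothetical. K-means with farthest-first seeding stands in for whatever clustering method the paper used. A cluster counts as an outlier cluster when it holds fewer than `min_fraction` of all logs.

```python
import math
import re
from typing import List, Sequence

# Hypothetical log layout: the record does not describe the ETL tools' log
# format or the survey-selected features, so these names and patterns are
# illustrative assumptions only.
FEATURE_PATTERNS = {
    "elapsed_sec": re.compile(r"Elapsed time:\s*(\d+(?:\.\d+)?)\s*s"),
    "rows_read": re.compile(r"Rows read:\s*(\d+)"),
    "rows_rejected": re.compile(r"Rows rejected:\s*(\d+)"),
}


def extract_features(log_text: str) -> List[float]:
    """Simple text parsing: one numeric value per feature, 0.0 if absent."""
    values = []
    for pattern in FEATURE_PATTERNS.values():
        match = pattern.search(log_text)
        values.append(float(match.group(1)) if match else 0.0)
    return values


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so large row counts do not dominate distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # avoid /0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def farthest_first_init(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Deterministic seeding: start at points[0], then repeatedly take the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster label for each point."""
    centroids = farthest_first_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def small_cluster_outliers(labels: Sequence[int], k: int, min_fraction: float) -> List[int]:
    """Flag every point whose cluster holds fewer than min_fraction of all points."""
    sizes = [list(labels).count(j) for j in range(k)]
    threshold = min_fraction * len(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] < threshold]


def detect_outliers(logs: Sequence[str], k: int, min_fraction: float = 0.2) -> List[int]:
    """Parse logs, cluster the standardized features, return outlier indices."""
    features = standardize([extract_features(text) for text in logs])
    return small_cluster_outliers(kmeans(features, k), k, min_fraction)
```

For example, with nine routine logs and one log reporting a 600 s run with 40 rejected rows, `detect_outliers(logs, k=2)` returns `[9]`, the index of the unusual log. Standardizing before clustering keeps a single large-valued feature, such as a row count, from dominating the distance; the values of `k` and `min_fraction` here are arbitrary illustrative choices, not the paper's settings.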