DocumentCode :
3136506
Title :
Data pre-processing support for data mining
Author :
Mikšovský, Petr ; Matousek, K. ; Kouba, Zdenek
Author_Institution :
Fac. of Electr. Eng., Czech Tech. Univ., Prague, Czech Republic
Volume :
5
fYear :
2002
fDate :
6-9 Oct. 2002
Abstract :
It is well known that the success of every data mining algorithm strongly depends on the quality of data processing. In this context, it is natural that data pre-processing can be a very complicated task. Sometimes data pre-processing takes more than half of the total time spent solving a data mining problem. The paper describes a tool called SumatraTT, whose goal is to make data pre-processing easier and faster. Basically, SumatraTT (Transformation Tool) is a metadata-driven, platform-independent, extensible, and universal data processing tool. These features were achieved by building the tool as an interpreter of a transformation-oriented scripting language called SumatraScript. SumatraScript is a fully interpreted, Java-like language that combines data access, metadata access, and common programming constructs. Furthermore, it supports RAD (Rapid Application Development) by providing a library of reusable transformation templates. The second part of the paper presents a practical application of SumatraTT: a task aimed at predicting water consumption in a regional distribution network.
Keywords :
authoring languages; data handling; data mining; meta data; software libraries; software reusability; very large databases; Java-like language; Rapid Application Development; SumatraScript; SumatraTT; Transformation Tool; data access; data mining; data pre-processing support; interpreter; metadata; regional distribution network; reusable transformation templates; transformation-oriented scripting language; universal data processing tool; water consumption prediction; Buildings; Data mining; Data preprocessing; Data processing; Databases; Decision making; Filtering; Java; Laboratories; Libraries;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man and Cybernetics, 2002 IEEE International Conference on
ISSN :
1062-922X
Print_ISBN :
0-7803-7437-1
Type :
conf
DOI :
10.1109/ICSMC.2002.1176327
Filename :
1176327