Title :
An Effective Multi-Layer Model for Controlling the Quality of Data
Author :
Leung, Carson Kai-Sang ; Mateo, Mark Anthony F ; Nadler, Andrew J.
Author_Institution :
Univ. of Manitoba, Winnipeg
Abstract :
Data mining aims to search for implicit, previously unknown, and potentially useful information that might be embedded in the data. It is well known that "garbage in, garbage out". Hence, to get meaningful mining results, a clean set of data is essential. In this paper, we propose an effective model for controlling the quality of data. Specifically, this three-layer model focuses on data validity and data consistency. To elaborate, the internal layer ensures that the observed data are valid and that their values fall within reasonable ranges. The temporal layer ensures that data are consistent with their temporal behaviour. The spatial layer ensures that data are consistent with their spatial neighbours. A case study applying our proposed model to real-life weather data for an agricultural application shows that our model is effective in controlling and improving data quality, thus leading to better mining results. It is important to note that the application of our proposed model is not confined to weather data for agricultural applications. We also discuss how the proposed three-layer model can be effectively applied to control the quality of data in some other real-life situations.
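The three layers described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual rules or thresholds, so everything below is an assumption made for illustration: the function names, the fixed value range, the maximum step between consecutive readings, and the use of the neighbours' median as the spatial reference are all hypothetical and are not taken from the paper.

```python
from statistics import median

# Hypothetical sketch of a three-layer quality check for one sensor reading.
# All names and thresholds are illustrative assumptions, not the paper's method.


def internal_check(value, low, high):
    """Internal layer: the value is valid if it lies within [low, high]."""
    return low <= value <= high


def temporal_check(previous, current, max_step):
    """Temporal layer: the change since the previous reading must not exceed max_step."""
    return abs(current - previous) <= max_step


def spatial_check(value, neighbour_values, max_deviation):
    """Spatial layer: the value must stay close to the median of its neighbours."""
    if not neighbour_values:
        return True  # nothing to compare against, so no inconsistency is flagged
    return abs(value - median(neighbour_values)) <= max_deviation


def quality_flags(previous, current, neighbour_values,
                  low=-50.0, high=50.0, max_step=10.0, max_deviation=8.0):
    """Run all three layers and report which ones the current reading passes."""
    return {
        "internal": internal_check(current, low, high),
        "temporal": temporal_check(previous, current, max_step),
        "spatial": spatial_check(current, neighbour_values, max_deviation),
    }
```

For example, under these assumed thresholds, `quality_flags(20.0, 45.0, [20.0, 22.0, 23.0])` would pass the internal layer but fail both the temporal and spatial layers, since the reading jumps by 25 and differs from the neighbours' median by 23.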
Keywords :
data integrity; data mining; data consistency; data quality; data validity; multilayer model; Agriculture; Computer science; Data analysis; Data mining; Databases; Engines; Information analysis; Risk analysis; Wireless networks; Wireless sensor networks;
Conference_Title :
Database Engineering and Applications Symposium, 2007. IDEAS 2007. 11th International
Conference_Location :
Banff, Alta.
Print_ISBN :
978-0-7695-2947-9
DOI :
10.1109/IDEAS.2007.4318086