Title :
A Linear-Clustering algorithm for controlling quality of large scale water-level data in Thailand
Author :
Pattanavijit, Nuttapon ; Vateekul, Peerapon ; Sarinnapakorn, Kanoksri
Author_Institution :
Dept. of Comput. Eng., Chulalongkorn Univ., Bangkok, Thailand
Abstract :
The Hydro and Agro Informatics Institute (HAII) has installed more than 800 telemetry stations across Thailand to collect water-level data for operational tasks and research, e.g., a flood prevention system. To obtain accurate results, it is crucial to control data quality by detecting and filtering out anomalies. In our previous work, we proposed a data quality management system that captures various types of errors. However, its algorithms for detecting outliers and missing patterns are based on DBSCAN, which is complicated to implement and computationally expensive. In this paper, we present a novel clustering algorithm designed specifically for water-level data, called “Linear Clustering.” Compared to DBSCAN, it is not only much easier to develop but also requires less computation time without any loss of detection accuracy. A runtime analysis showed that the proposed algorithm runs in linear time. Experiments were conducted on large-scale water-level data. For outlier detection, the new method took only 3 seconds on 30,000 records, while the previous method took 261 seconds. For missing pattern detection, although there is no difference in runtime, Linear Clustering's code is simpler and therefore requires less development time.
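The abstract does not spell out the steps of Linear Clustering, so the following is only an illustrative sketch of how a single-pass (linear-time) clustering over time-ordered water-level readings could flag outliers. It is not the authors' implementation. The function names, the `max_gap` and `min_size` thresholds, and the sample readings are all assumptions made for this example.

```python
from typing import List, Sequence


def linear_clusters(levels: Sequence[float], max_gap: float) -> List[List[int]]:
    """Group indices of time-ordered readings in one pass (O(n)).

    Assumed rule: a reading joins the current cluster when it differs from
    the previous reading by at most max_gap; otherwise it starts a new one.
    """
    clusters: List[List[int]] = []
    for i, value in enumerate(levels):
        if clusters and abs(value - levels[i - 1]) <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


def flag_outliers(levels: Sequence[float], max_gap: float, min_size: int) -> List[int]:
    """Return indices belonging to clusters smaller than min_size (assumed rule)."""
    outliers: List[int] = []
    for cluster in linear_clusters(levels, max_gap):
        if len(cluster) < min_size:
            outliers.extend(cluster)
    return outliers


# Hypothetical water levels in metres, with one spike at index 3.
readings = [2.10, 2.12, 2.11, 9.80, 2.13, 2.14, 2.15]
print(flag_outliers(readings, max_gap=0.5, min_size=2))  # prints [3]
```

Each reading is compared only with its predecessor, so the work grows linearly with the number of records, in contrast to the neighborhood queries that DBSCAN performs.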
Keywords :
data handling; database management systems; emergency management; floods; pattern clustering; quality control; telemetry; DBSCAN; Hydro and Agro Informatics Institute; Thailand; data quality management system; flooding prevention system; large scale water-level data; linear clustering code; linear-clustering algorithm; missing pattern detection; quality control; telemetry stations; Algorithm design and analysis; Clustering algorithms; Detection algorithms; Informatics; Noise; Runtime; Telemetry; clustering; data improvement; data quality control; missing pattern detection; outlier detection;
Conference_Title :
Computer Science and Software Engineering (JCSSE), 2015 12th International Joint Conference on
Conference_Location :
Songkhla
DOI :
10.1109/JCSSE.2015.7219808