DocumentCode :
2130796
Title :
Efficient Distance Computation Using SQL Queries and UDFs
Author :
Pitchaimalai, Sasi K. ; Ordonez, Carlos ; Garcia-Alvarado, Carlos
Author_Institution :
Dept. of Comput. Sci., Univ. of Houston, Houston, TX
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
533
Lastpage :
542
Abstract :
Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and user-defined functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known K-means clustering algorithm. We present SQL query optimizations and a scalar UDF to compute Euclidean distance. We experimentally evaluate performance and scalability of our proposed SQL queries and UDF with large data sets on a modern DBMS. We benchmark distance computation on two important data mining techniques: clustering and classification. In general, UDFs are faster than SQL queries because they are executed in main memory. Data set size is the main factor impacting performance, followed by data set dimensionality.
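The contrast the abstract draws between a plain SQL expression and a scalar UDF can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the DBMS, two-dimensional points, and hypothetical table names `X` (data points) and `C` (centroids); the function names `euclidean_udf` and `distance_tables` are likewise invented for illustration.

```python
import math
import sqlite3


def euclidean_udf(*args):
    """Scalar UDF: the first half of the arguments is the point,
    the second half is the centroid (an assumed calling convention)."""
    d = len(args) // 2
    return math.sqrt(sum((args[i] - args[d + i]) ** 2 for i in range(d)))


def distance_tables(points, centroids):
    """Return point-to-centroid distances computed two ways:
    squared distance via a pure SQL expression, and distance via the UDF."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per point / centroid, one column per dimension.
    conn.execute("CREATE TABLE X (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
    conn.execute("CREATE TABLE C (j INTEGER PRIMARY KEY, c1 REAL, c2 REAL)")
    conn.executemany("INSERT INTO X VALUES (?, ?, ?)", points)
    conn.executemany("INSERT INTO C VALUES (?, ?, ?)", centroids)

    # Register the Python function as a SQL scalar function;
    # narg=-1 lets it accept any number of arguments.
    conn.create_function("euclid", -1, euclidean_udf)

    # Pure SQL: cross join every point with every centroid and
    # compute the squared Euclidean distance arithmetically.
    sql_rows = conn.execute(
        "SELECT X.i, C.j, (x1 - c1) * (x1 - c1) + (x2 - c2) * (x2 - c2) AS d2 "
        "FROM X CROSS JOIN C ORDER BY X.i, C.j"
    ).fetchall()

    # UDF: the same cross join, with the distance delegated to the scalar function.
    udf_rows = conn.execute(
        "SELECT X.i, C.j, euclid(x1, x2, c1, c2) AS d "
        "FROM X CROSS JOIN C ORDER BY X.i, C.j"
    ).fetchall()

    conn.close()
    return sql_rows, udf_rows
```

In a K-means step, each point would then be assigned to the centroid with the smallest distance; for that purpose the squared distance from the SQL variant is sufficient, since the square root does not change the ordering.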
Keywords :
SQL; data mining; matrix algebra; pattern classification; pattern clustering; query processing; DBMS; Euclidean distance computation; K-means clustering algorithm; SQL query optimizations; classification technique; clustering technique; data mining algorithms; matrix computations; performance evaluation; user-defined functions; Clustering algorithms; Computer science; Conferences; Data mining; Euclidean distance; High level languages; Machine learning algorithms; Query processing; Scalability; USA Councils; SQL; UDF; distance
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.135
Filename :
4733977