Title :
Preprocessing opportunities in optimal numerical range partitioning
Author :
Elomaa, Tapio ; Rousu, Juho
Author_Institution :
Dept. of Comput. Sci., Helsinki Univ., Finland
Abstract :
We show that only segment borders have to be taken into account as cut point candidates when searching for the optimal multisplit of a numerical value range with respect to convex attribute evaluation functions. Segment borders can be found efficiently in a linear-time preprocessing step. With training set error, which is not strictly convex, the data can be preprocessed into an even smaller number of cut point candidates, called alternations, when striving for the optimal partition. We show that no segment borders (resp. alternations) can be overlooked with strictly convex functions (resp. training set error) without risking the loss of optimality. Our experiments show that while in real-world domains a significant reduction in the number of cut point candidates can be obtained for training set error, the number of segment borders is usually not much lower than that of boundary points.
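The preprocessing step mentioned in the abstract can be illustrated with a small sketch. The abstract does not define its terms, so the code assumes the usual definitions: examples sharing an attribute value form a bin, a boundary point is any cut between adjacent bins except where both bins are pure and of the same class, and a segment border is a cut between adjacent bins whose relative class distributions differ. The function name `cut_point_candidates` and the midpoint thresholds are hypothetical choices for illustration, not taken from the paper, and alternations are not covered here.

```python
from collections import Counter
from fractions import Fraction


def cut_point_candidates(values, labels):
    """Return (boundary_points, segment_borders) as lists of thresholds.

    Assumed definitions (not spelled out in the abstract):
    - bin: all examples sharing one attribute value
    - boundary point: cut between adjacent bins, unless both bins are
      pure and belong to the same class
    - segment border: cut between adjacent bins whose relative class
      distributions differ
    """
    # Group examples into bins and count classes per bin.
    bins = {}
    for v, y in zip(values, labels):
        bins.setdefault(v, Counter())[y] += 1
    ordered = sorted(bins.items())  # sorting aside, the scan below is linear

    def distribution(counts):
        # Exact relative class frequencies, so equality tests are reliable.
        total = sum(counts.values())
        return {c: Fraction(n, total) for c, n in counts.items()}

    boundary_points, segment_borders = [], []
    for (v1, c1), (v2, c2) in zip(ordered, ordered[1:]):
        cut = (v1 + v2) / 2  # midpoint threshold (illustrative choice)
        same_pure_class = len(c1) == 1 and len(c2) == 1 and set(c1) == set(c2)
        if not same_pure_class:
            boundary_points.append(cut)
        if distribution(c1) != distribution(c2):
            segment_borders.append(cut)
    return boundary_points, segment_borders


if __name__ == "__main__":
    values = [1, 1, 2, 2, 3, 4, 5, 5]
    labels = ["a", "b", "a", "b", "a", "a", "b", "b"]
    print(cut_point_candidates(values, labels))
```

Under these assumptions every segment border is also a boundary point, so the second list is never longer than the first. In the toy data the cut between values 1 and 2 is a boundary point but not a segment border, because both bins have the same mixed class distribution.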
Keywords :
data mining; learning (artificial intelligence); alternations; convex attribute evaluation functions; cutpoint candidates; linear-time preprocessing step; optimal multisplit; optimal numerical range partitioning; segment borders; training set error; Computer errors; Computer science; Dynamic programming; Heuristic algorithms; Partitioning algorithms; Upper bound
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989508