DocumentCode :
2334310
Title :
Preprocessing opportunities in optimal numerical range partitioning
Author :
Elomaa, Tapio ; Rousu, Juho
Author_Institution :
Dept. of Comput. Sci., Helsinki Univ., Finland
fYear :
2001
fDate :
2001
Firstpage :
115
Lastpage :
122
Abstract :
We show that only segment borders have to be taken into account as cut point candidates when searching for the optimal multisplit of a numerical value range with respect to convex attribute evaluation functions. Segment borders can be found efficiently in a linear-time preprocessing step. With training set error, which is not strictly convex, the data can be preprocessed into an even smaller number of cut point candidates, called alternations, when striving for the optimal partition. We show that no segment borders (resp. alternations) can be overlooked with strictly convex functions (resp. training set error) without risking the loss of optimality. Our experiments show that while in real-world domains a significant reduction in the number of cut point candidates can be obtained for training set error, the number of segment borders is usually not much lower than that of boundary points.
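The preprocessing step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not define segment borders, so the sketch assumes a common reading: examples with the same attribute value form a bin, and a segment border is a cut point between two adjacent bins whose relative class distributions differ. The function name `segment_borders`, the `(value, label)` input format, and the choice of the midpoint as the cut value are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from itertools import groupby


def segment_borders(examples):
    """Return candidate cut points for one numerical attribute.

    `examples` is an iterable of (value, class_label) pairs.
    Assumption: a segment border lies between two adjacent value bins
    whose relative class distributions differ.
    """
    # Sorting costs O(n log n); the scan below is linear in the sorted data.
    ordered = sorted(examples, key=lambda e: e[0])

    # One bin per distinct attribute value, holding its class counts.
    bins = [
        (value, Counter(label for _, label in group))
        for value, group in groupby(ordered, key=lambda e: e[0])
    ]

    borders = []
    for (v1, c1), (v2, c2) in zip(bins, bins[1:]):
        n1, n2 = sum(c1.values()), sum(c2.values())
        # Compare relative frequencies by cross-multiplication, which
        # keeps the test exact (no floating-point division).
        if any(c1[k] * n2 != c2[k] * n1 for k in set(c1) | set(c2)):
            borders.append((v1 + v2) / 2)  # midpoint is an illustrative choice
    return borders


data = [(1, "a"), (2, "a"), (3, "a"), (3, "b"), (4, "a"), (4, "b"), (5, "b")]
print(segment_borders(data))  # [2.5, 4.5]
```

In this toy input the bins for values 3 and 4 are both mixed with the same 1:1 class ratio, so no candidate is placed between them under the assumed definition, while the cuts at 2.5 and 4.5 remain.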
Keywords :
data mining; learning (artificial intelligence); alternations; convex attribute evaluation functions; cutpoint candidates; linear-time preprocessing step; optimal multisplit; optimal numerical range partitioning; segment borders; training set error; Computer errors; Computer science; Dynamic programming; Heuristic algorithms; Partitioning algorithms; Upper bound
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
Type :
conf
DOI :
10.1109/ICDM.2001.989508
Filename :
989508