DocumentCode :
1859950
Title :
Assessment of the Trade-off Curve Accuracy in the Bump Hunting Using the Tree-GA
Author :
Hirose, Hideo
Author_Institution :
Dept. of Syst. Design & Inf., Kyushu Inst. of Technol., Fukuoka, Japan
fYear :
2010
fDate :
9-10 Jan. 2010
Firstpage :
597
Lastpage :
600
Abstract :
Suppose that we are interested in classifying n points in a z-dimensional space into two groups, with response 1 or response 0 as the target variable. In some real customer-classification data, it is difficult to discriminate the favorable customers (response 1) from the others because many response-1 and response-0 points lie close together. In such cases, an alternative is to find the regions in which favorable customers are denser. Such regions are called bumps, and finding them is called bump hunting. By pre-specifying a pureness rate p, a maximum capture rate c can be obtained; varying p then traces out a trade-off curve between p and c. Thus, finding the bump regions is equivalent to constructing the trade-off curve. When simple boundary shapes are adopted for the bumps, such as a union of z-dimensional boxes whose edges are parallel to the explanatory-variable axes, a binary decision tree is a convenient representation. Since the conventional binary decision tree does not provide the maximum capture rates, we use a genetic algorithm (GA) specialized to the tree structure, called the tree-GA. Because the tree-GA tends to yield many local maxima of the capture rate, the upper-bound curve for the trade-off curve can be estimated using extreme-value statistics. Since the bump regions obtained by the tree-GA are conservative compared with the optimal regions, we need to investigate how accurate the tree-GA trade-off curve is. We assessed this accuracy in typical fundamental cases that may be observed in real customer data, and found that the proposed tree-GA constructs an effective trade-off curve that is close to the optimal one.
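The two quantities behind the trade-off curve can be made concrete with a short sketch. It assumes the usual bump-hunting definitions: pureness p is the fraction of points inside the region that have response 1, and capture rate c is the fraction of all response-1 points that fall inside the region. A region is a union of axis-parallel boxes. The helper names (`in_box`, `pureness_and_capture`, `trade_off_curve`) and the toy data are hypothetical. The candidate regions are supplied by hand here; this sketch does not implement the tree-GA that generates them in the paper, nor the extreme-value upper-bound estimate.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a box is one (low, high) interval per explanatory
# variable, so its edges are parallel to the axes; a region is a list of boxes.
Box = List[Tuple[float, float]]
Region = List[Box]


def in_box(x: Sequence[float], box: Box) -> bool:
    """True if point x lies inside the closed axis-parallel box."""
    return all(lo <= xi <= hi for xi, (lo, hi) in zip(x, box))


def pureness_and_capture(points, labels, region: Region) -> Tuple[float, float]:
    """Return (p, c) for the union of boxes in `region`.

    p = (# response-1 points inside) / (# points inside)
    c = (# response-1 points inside) / (# response-1 points overall)
    """
    inside = [any(in_box(x, b) for b in region) for x in points]
    n_in = sum(inside)
    n_pos_in = sum(1 for flag, y in zip(inside, labels) if flag and y == 1)
    n_pos = sum(1 for y in labels if y == 1)
    p = n_pos_in / n_in if n_in else 0.0
    c = n_pos_in / n_pos if n_pos else 0.0
    return p, c


def trade_off_curve(candidates: List[Region], points, labels, thresholds):
    """For each pre-specified pureness p0, the largest capture rate among
    candidate regions whose pureness is at least p0 (0.0 if none qualifies)."""
    scored = [pureness_and_capture(points, labels, r) for r in candidates]
    curve = []
    for p0 in thresholds:
        feasible = [c for p, c in scored if p >= p0]
        curve.append((p0, max(feasible, default=0.0)))
    return curve


# Toy 2-D data (hypothetical): four response-1 points, two response-0 points.
points = [(0.1, 0.1), (0.2, 0.3), (0.8, 0.9), (0.9, 0.8), (0.5, 0.5), (0.6, 0.4)]
labels = [1, 1, 1, 0, 0, 1]

region_a = [[(0.0, 0.35), (0.0, 0.35)]]                 # small, pure box
region_b = [[(0.0, 1.0), (0.0, 1.0)]]                   # whole space
region_c = region_a + [[(0.55, 0.65), (0.35, 0.45)]]    # union of two boxes

curve = trade_off_curve([region_a, region_b, region_c], points, labels,
                        thresholds=[0.6, 0.9, 1.0])
print(curve)
```

Raising the required pureness from 0.6 to 0.9 excludes the whole-space region, so the best achievable capture rate drops from 1.0 to 0.75. This is the kind of p-versus-c trade-off the abstract describes; a search method such as the tree-GA would replace the hand-written candidate list.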
Keywords :
data mining; genetic algorithms; pattern classification; statistics; trees (mathematics); bump hunting; customer classification; extreme-value statistics; genetic algorithm; trade-off curve accuracy; tree-GA structure; Classification tree analysis; Data mining; Decision trees; Genetic algorithms; Informatics; Shape; Space technology; Statistics; Tree data structures; Upper bound; bump hunting; data mining; evaluation; extreme-value statistics; genetic algorithm; trade-off curve;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Knowledge Discovery and Data Mining, 2010. WKDD '10. Third International Conference on
Conference_Location :
Phuket
Print_ISBN :
978-1-4244-5397-9
Electronic_ISBN :
978-1-4244-5398-6
Type :
conf
DOI :
10.1109/WKDD.2010.154
Filename :
5432473