DocumentCode :
3532234
Title :
Improving Association Rules by Optimizing Discretization Based on a Hybrid GA: A Case Study of Data from Forest Ecology Stations in China
Author :
Jianxin Wang ; Fan Yang ; Xiaoli Dong ; Ben Xu ; Baojiang Cui
Author_Institution :
Sch. of Inf., Beijing Forestry Univ., Beijing, China
fYear :
2013
fDate :
9-11 Sept. 2013
Firstpage :
627
Lastpage :
632
Abstract :
Association rule mining is one of the key techniques in data mining and knowledge discovery in databases. Before association rules can be mined from numerical data, however, the variable domains must first be partitioned into intervals (i.e., the data must be discretized), and this partition directly affects the quality of the generated rules. Finding the best combination of dividing points is an NP-complete problem, so no polynomial-time algorithm for it is known. We search for the optimal combination of dividing points over the continuous domains using a genetic algorithm (GA), whose fitness function is based on the properties of the strong association rules that each candidate partition yields and which guides the iterations of the algorithm. The GA operators, together with a sampling technique and a hill-climbing algorithm, are discussed in detail. Experimental results show that the generated association rules have good properties in terms of quantity, support, and confidence. The proposed approach has been successfully applied to mining the massive data accumulated at forest ecology stations widely distributed across China. In addition, the methods and algorithms are general and can be adapted to produce association rules with good properties in other fields where the variable domains have not yet been precisely or completely partitioned.
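The abstract does not specify the chromosome encoding, the exact fitness formula, or the GA settings, so the following is only a minimal sketch of a hybrid GA plus hill-climbing discretizer under stated assumptions, not the authors' implementation. Assumptions (all hypothetical): each chromosome holds k sorted cut points per variable; fitness counts strong single-item rules A -> B (support >= MIN_SUP, confidence >= MIN_CONF, lift >= MIN_LIFT) and adds each rule's confidence; selection is truncation, crossover is uniform per variable, and mutation is a clamped Gaussian nudge. The sampling idea is illustrated by ranking each generation on a random subset of rows, and the best GA individual is then refined by hill climbing on all rows. Function names, thresholds, and the synthetic data are illustrative only.

```python
import bisect
import random
from collections import Counter
from itertools import combinations

# Illustrative thresholds (assumptions, not values from the paper).
MIN_SUP, MIN_CONF, MIN_LIFT = 0.1, 0.6, 1.2

def domain_bounds(rows):
    """(min, max) of each numeric variable; cut points are kept inside this range."""
    return [(min(col), max(col)) for col in zip(*rows)]

def to_transactions(rows, chromosome):
    """Discretize each row into items of the form (variable index, interval index)."""
    return [frozenset((j, bisect.bisect_left(cuts, v))
                      for j, (v, cuts) in enumerate(zip(row, chromosome)))
            for row in rows]

def fitness(rows, chromosome):
    """Hypothetical fitness: score the strong single-item rules A -> B a partition yields."""
    txs = to_transactions(rows, chromosome)
    n = len(txs)
    item_count = Counter(item for t in txs for item in t)
    pair_count = Counter(p for t in txs for p in combinations(sorted(t), 2))
    score = 0.0
    for (a, b), c in pair_count.items():
        if c / n < MIN_SUP:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = c / item_count[lhs]
            lift = conf / (item_count[rhs] / n)
            # The lift filter stops one all-covering interval from scoring well.
            if conf >= MIN_CONF and lift >= MIN_LIFT:
                score += 1.0 + conf  # one point per strong rule, plus its confidence
    return score

def random_chromosome(bounds, k, rng):
    """Assumed encoding: k sorted cut points per variable."""
    return [sorted(rng.uniform(lo, hi) for _ in range(k)) for lo, hi in bounds]

def crossover(p1, p2, rng):
    """Uniform crossover: each variable inherits its whole cut set from one parent."""
    return [list(rng.choice((a, b))) for a, b in zip(p1, p2)]

def mutate(ch, bounds, rate, rng):
    """Gaussian-perturb each cut point with probability `rate`, clamped to the domain."""
    out = []
    for cuts, (lo, hi) in zip(ch, bounds):
        new = [min(hi, max(lo, c + rng.gauss(0, 0.1 * (hi - lo))))
               if rng.random() < rate else c for c in cuts]
        out.append(sorted(new))
    return out

def hill_climb(rows, ch, bounds, steps, rng):
    """Local search: nudge one cut point at a time and keep only improvements."""
    best, best_fit = ch, fitness(rows, ch)
    for _ in range(steps):
        j = rng.randrange(len(best))
        i = rng.randrange(len(best[j]))
        lo, hi = bounds[j]
        cuts = list(best[j])
        cuts[i] = min(hi, max(lo, cuts[i] + rng.gauss(0, 0.05 * (hi - lo))))
        cand = best[:j] + [sorted(cuts)] + best[j + 1:]
        f = fitness(rows, cand)
        if f > best_fit:
            best, best_fit = cand, f
    return best, best_fit

def hybrid_ga(rows, k=2, pop_size=20, generations=30, sample_size=200, seed=0):
    rng = random.Random(seed)
    bounds = domain_bounds(rows)
    pop = [random_chromosome(bounds, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Sampling: rank the population on a random subset of rows to save time.
        sample = rng.sample(rows, min(sample_size, len(rows)))
        ranked = sorted(pop, key=lambda ch: fitness(sample, ch), reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng),
                           bounds, 0.2, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    best = max(pop, key=lambda ch: fitness(rows, ch))
    # Hybrid step: refine the GA's best individual by hill climbing on all rows.
    return hill_climb(rows, best, bounds, steps=50, rng=rng)

if __name__ == "__main__":
    # Synthetic data (not from the paper): x and y are negatively correlated, z is noise.
    data_rng = random.Random(1)
    rows = []
    for _ in range(300):
        x = data_rng.uniform(0, 30)
        rows.append((x, 100 - 2 * x + data_rng.gauss(0, 5), data_rng.uniform(0, 1)))
    cuts, score = hybrid_ga(rows)
    print("cut points:", [[round(c, 2) for c in cs] for cs in cuts])
    print("fitness:", round(score, 2))
```

The lift threshold is one simple way to keep the search from collapsing every variable into a single interval, which would otherwise produce many trivially "strong" rules with confidence 1; the paper's own fitness function may handle this differently.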
Keywords :
computational complexity; data mining; ecology; genetic algorithms; iterative methods; China; GA; NP-complete problem; association rule quality; fitness function; forest ecology stations; hybrid GA-based discretization; knowledge discovery; massive data accumulation; numerical data; polynomial time; sampling technique; Algorithm design and analysis; Association rules; Equations; Genetic algorithms; Mathematical model; Optimization; association rule; genetic algorithm; hill climbing; variable domain partition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Emerging Intelligent Data and Web Technologies (EIDWT), 2013 Fourth International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-1-4799-2140-9
Type :
conf
DOI :
10.1109/EIDWT.2013.113
Filename :
6631691