Title :
An efficient algorithm for induction with random sampling
Author :
Mahmood, Ali Mirza ; Kuppa, Mrithyumjaya Rao ; Chandu, V. Sai Phani
Author_Institution :
Acharya Nagarjuna Univ., Guntur, India
Abstract :
Decision tree induction is among the most powerful and commonly used techniques for extracting classification knowledge from datasets of labeled instances. However, learning decision trees from large datasets containing irrelevant attributes is quite different from learning from small and moderate-sized datasets. In this paper, we propose a simple yet effective composite splitting criterion that combines a random sampling approach with gain ratio. Our random sampling method relies on a small random subset of attributes, which is computationally cheap to evaluate in reasonable time. The advantage of the composite splitting criterion is retained on high-dimensional datasets with irrelevant attributes. The empirical and theoretical perspectives are validated on 40 UCI datasets. The experimental results indicate that the proposed heuristic function yields much simpler trees with almost unchanged or improved accuracy.
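The abstract does not state the exact form of the composite criterion. The Python sketch below shows one plausible reading, offered only as an illustration: at each node it draws a small random subset of k attributes and picks the one with the highest C4.5-style gain ratio. The function names (entropy, gain_ratio, choose_split) and the parameter k are hypothetical and not taken from the paper; only nominal attributes are handled.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio for splitting on the nominal attribute at index `attr`."""
    n = len(rows)
    # Group class labels by the value the attribute takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Information gain: parent entropy minus size-weighted child entropy.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def choose_split(rows, labels, k, rng=random):
    """Hypothetical composite criterion (an assumption, not the paper's formula):
    sample k attribute indices at random, then return the one with the highest gain ratio."""
    n_attrs = len(rows[0])
    candidates = rng.sample(range(n_attrs), min(k, n_attrs))
    return max(candidates, key=lambda a: gain_ratio(rows, labels, a))


if __name__ == "__main__":
    # Toy data: attribute 0 separates the classes perfectly, attribute 1 is irrelevant.
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "hot"), ("rain", "mild")]
    labels = ["no", "no", "yes", "yes"]
    print(choose_split(rows, labels, k=2, rng=random.Random(0)))  # prints 0
```

In this sketch the cost of choosing a split scales with k rather than with the total number of attributes, which is consistent with the abstract's claim that acting on a small random subset is computationally cheap; how the paper actually weights or combines the sampled subset with gain ratio may differ.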
Keywords :
decision trees; learning (artificial intelligence); pattern classification; sampling methods; composite splitting criterion; decision tree induction; decision tree learning; gain ratio; knowledge classification; random sampling method; Accuracy; Classification algorithms; Decision trees; Entropy; Impurities; Indexes; Machine learning algorithms; Composite Splitting Criterion; Consistency subset evaluation; Decision trees; Feature Subset Evaluation; Random Sampling; Splitting criteria;
Conference_Title :
Emerging Trends in Electrical and Computer Technology (ICETECT), 2011 International Conference on
Conference_Location :
Tamil Nadu
Print_ISBN :
978-1-4244-7923-8
DOI :
10.1109/ICETECT.2011.5760206