Title :
An online software for decision tree classification and visualization using C4.5 algorithm (ODTC)
Author :
Das, S. ; Dahiya, Susheela ; Bharadwaj, Anshu
Author_Institution :
I.A.R.I., New Delhi, India
Abstract :
Classification is an important and widely performed data mining task. It is a predictive modelling task, defined as building a model of the target variable as a function of the explanatory variables. Many well-established classification techniques exist, among which the decision tree is an important and popular technique from the machine learning domain. A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs and utility. C4.5 is a well-known decision tree algorithm used for classifying datasets. The C4.5 algorithm is Quinlan's extension of his own ID3 algorithm for decision tree classification. It induces decision trees and generates rules from datasets that may contain categorical and/or numerical attributes. The rules can then be used to predict categorical attribute values for new records. C4.5 performs well both in classifying the dataset and in generating useful rules. This paper presents web-based software for rule generation and decision tree induction using the C4.5 algorithm. Visualizing the result as a tree structure makes the generated rules easier to understand. The software can impute missing values in the data. The input data can be both categorical and numerical in nature. The software can import the TXT, XLS and CSV data file formats. An enhanced waterfall model was used for the software development process. This software will be useful for academicians, researchers and students working in data mining, agriculture and other fields where large amounts of data are generated.
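As an illustration of the tree-induction step mentioned in the abstract, the following is a minimal sketch of the gain ratio criterion that C4.5 is generally described as using to choose a split on a categorical attribute. It is not the ODTC implementation: the function names (`entropy`, `gain_ratio`), the column names and the toy records are all hypothetical, and handling of numerical attributes, missing values and pruning is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Gain ratio of splitting `rows` (a list of dicts) on categorical `attr`."""
    total = len(rows)
    # Group the target labels by each distinct value of the attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attr] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, not taken from the paper.
rows = [
    {"soil": "clay", "irrigated": "yes", "yield": "high"},
    {"soil": "clay", "irrigated": "no", "yield": "low"},
    {"soil": "sand", "irrigated": "yes", "yield": "high"},
    {"soil": "sand", "irrigated": "no", "yield": "low"},
]

# Pick the attribute with the highest gain ratio as the root split.
best = max(["soil", "irrigated"], key=lambda a: gain_ratio(rows, a, "yield"))
```

In this toy dataset `irrigated` separates the two classes perfectly while `soil` carries no information about `yield`, so `irrigated` would be selected as the first split.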
Keywords :
data mining; data visualisation; decision trees; learning (artificial intelligence); pattern classification; software engineering; C4.5 algorithm; CSV data file format; ID3 algorithm; TXT data file format; Web based software; XLS data file format; categorical attributes; chance event outcomes; data visualization; dataset classification; decision support tool; decision tree classification; decision tree induction; enhanced waterfall model; explanatory variables; machine learning domain; numerical attributes; online software; predictive modelling task; rule generation; software development process; tree-like graph; Decision support systems; Erbium; Handheld computers; C4.5 Algorithm; Classification; Data mining; Decision Tree; waterfall model
Conference_Title :
Computing for Sustainable Global Development (INDIACom), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-93-80544-10-6
DOI :
10.1109/IndiaCom.2014.6828107