DocumentCode :
2576021
Title :
A high speed decision tree classifier algorithm for huge dataset
Author :
Thangaparvathi, B. ; Anandhavalli, D. ; Mercy Shalinie, S.
Author_Institution :
Dept. of Comput. Sci., Thiagarajar Coll. of Eng.-Madurai, Madurai, India
fYear :
2011
fDate :
3-5 June 2011
Firstpage :
695
Lastpage :
700
Abstract :
Knowledge discovery is an important tool that lets an intelligent business transform data into useful information that can increase its revenue. Data mining techniques support automatic exploration of data, attempt to classify the patterns and trends in the data, and infer decision rules from those patterns. Classification of a dataset, a supervised machine learning procedure, is an important function of data mining. The scalability and efficiency of the classifier algorithm become a major concern when the dataset is large and must be parsed many times. In this paper, we present a scalable decision tree algorithm for classifying large datasets at high processing speed that requires only one scan over the dataset. It overcomes a drawback of the RainForest algorithm, which addresses the scalability issue but requires a pass over the dataset at each level of decision tree construction. The proposed algorithm significantly reduces the I/O cost and sorts numerical attributes only once, which leads to better time performance. According to the experimental results, our algorithm takes less execution time than the RainForest algorithm and is adaptable to any attribute selection method, through which the accuracy of the decision tree can be improved.
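The abstract does not give the algorithm's details, so the following is only a rough Python sketch of the general idea it describes: gather per-attribute class counts in a single scan over the data, then sort the distinct values of a numerical attribute once and sweep candidate thresholds using running counts. The "AVC-set" naming follows RainForest terminology; the function names, the row layout, and the use of the Gini index as the selection measure are assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict


def build_avc_sets(rows, n_attrs):
    """Single scan: count class labels per (attribute, value) pair.

    Each row is assumed to be (attr_0, ..., attr_{n-1}, label).
    """
    avc = [defaultdict(Counter) for _ in range(n_attrs)]
    for row in rows:
        label = row[-1]
        for i in range(n_attrs):
            avc[i][row[i]][label] += 1
    return avc


def gini(counts):
    """Gini impurity of a class-count mapping."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(avc_attr):
    """Sort distinct values once, then sweep thresholds with running counts.

    Returns (weighted Gini, threshold) for the split "value <= threshold",
    or (inf, None) if the attribute has fewer than two distinct values.
    """
    values = sorted(avc_attr)
    total = Counter()
    for v in values:
        total.update(avc_attr[v])
    n = sum(total.values())

    left = Counter()
    best = (float("inf"), None)
    for v in values[:-1]:  # the largest value cannot be a split point
        left.update(avc_attr[v])
        right = total - left
        n_left = sum(left.values())
        score = (n_left * gini(left) + (n - n_left) * gini(right)) / n
        if score < best[0]:
            best = (score, v)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: one numerical attribute and a class label.
    rows = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "b")]
    avc = build_avc_sets(rows, n_attrs=1)
    print(best_numeric_split(avc[0]))
```

Because the split search reads only the aggregated counts, any other selection measure (for example information gain) could replace gini here without another pass over the raw rows, which is in the spirit of the abstract's claim about attribute selection methods.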
Keywords :
data mining; decision trees; learning (artificial intelligence); pattern classification; IO cost reduction; RainForest algorithm; acyclic graph; attribute selection method; business revenue; data mining techniques; dataset classification; dataset parsing; decision tree classifier algorithm; decision tree construction; intelligent business; knowledge discovery; numerical attributes; supervised machine learning procedure; Automatic voltage control; Classification algorithms; Data structures; Databases; Decision trees; Partitioning algorithms; Prediction algorithms; Classification; Data mining; Decision tree; Performance; RainForest algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Recent Trends in Information Technology (ICRTIT), 2011 International Conference on
Conference_Location :
Chennai, Tamil Nadu
Print_ISBN :
978-1-4577-0588-5
Type :
conf
DOI :
10.1109/ICRTIT.2011.5972267
Filename :
5972267