Title :
A Study of Data Reduction Using Multiset Decision Tables
Author :
Seelam, Uday ; Chan, Chien-Chung
Author_Institution :
Univ. of Akron, Akron
Abstract :
In rough set theory, observations of objects in a domain of interest are stored in a decision table in which each row denotes one object. Objects with the same description therefore appear as duplicate rows. These duplications may be reduced by using information multisystems, which can be further transformed into multiset decision tables (MDT). In this paper, we demonstrate the efficacy of MDT when dealing with very large data sets. Experimental results based on the well-known intrusion detection system (IDS) data set show that the size of the MDT is only 1/3 of the original decision table when all features are used. It can be further reduced to 1/7 when a set of 7 features is used. We also show that generating an MDT takes less running time than generating a C4.5-like decision tree based on MS SQL Server 2000.
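The duplicate-collapsing idea described in the abstract can be illustrated with a minimal Python sketch. It assumes an MDT can be modeled as a mapping from each distinct tuple of condition-attribute values to a count of objects per decision value. The function name `build_mdt`, the attribute names, and the toy rows are hypothetical and are not taken from the paper, which reports an implementation based on MS SQL Server 2000.

```python
from collections import Counter, defaultdict


def build_mdt(rows, condition_attrs, decision_attr):
    """Collapse a decision table into a multiset-style table.

    Hypothetical helper: each distinct tuple of condition values maps to
    a Counter recording how many objects carry each decision value.
    """
    mdt = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        mdt[key][row[decision_attr]] += 1
    return dict(mdt)


# Toy decision table (illustrative values only, not the IDS data set).
rows = [
    {"protocol": "tcp", "service": "http", "label": "normal"},
    {"protocol": "tcp", "service": "http", "label": "normal"},
    {"protocol": "tcp", "service": "http", "label": "attack"},
    {"protocol": "udp", "service": "dns", "label": "normal"},
    {"protocol": "udp", "service": "dns", "label": "normal"},
    {"protocol": "icmp", "service": "echo", "label": "attack"},
]

# Six original rows collapse to one entry per distinct description.
mdt = build_mdt(rows, ["protocol", "service"], "label")
for description, decision_counts in mdt.items():
    print(description, dict(decision_counts), sum(decision_counts.values()))
```

Passing a shorter list of condition attributes to `build_mdt` merges more descriptions into the same entry, which is one plausible reading of why the table shrinks further when only a subset of features is used.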
Keywords :
data reduction; decision tables; rough set theory; security of data; C4.5-like decision tree; MS SQL server 2000; information multisystems; intrusion detection system; multiset decision tables; very large data sets; Classification tree analysis; Computer science; Data analysis; Data mining; Decision trees; File servers; Information systems; Intrusion detection; Set theory; Time measurement;
Conference_Title :
Granular Computing, 2007. GRC 2007. IEEE International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3032-1
DOI :
10.1109/GrC.2007.90