DocumentCode
2313550
Title
Applications of Rough Sets in the Field of Data Mining
Author
Butalia, Ayesha ; Dhore, M.L. ; Tewani, Geetika
Author_Institution
Coll. of Eng., Maharashtra Inst. of Technol., Pune
fYear
2008
fDate
16-18 July 2008
Firstpage
498
Lastpage
503
Abstract
Real-world data pose several difficulties: very large data sets, mixed types of data (continuous-valued and symbolic), uncertainty (noisy data), incompleteness (missing or incomplete data), data change, and the use of background knowledge. The main goal of rough set analysis is the induction of approximations of concepts [4]. Rough sets constitute a sound basis for KDD: they offer mathematical tools to discover patterns hidden in data [4] and are therefore used in the field of data mining. Unlike fuzzy sets, which require membership values, or statistics, which requires probabilities, rough sets do not require any preliminary information; this is their distinguishing feature. Two novel algorithms to find optimal reducts of condition attributes based on relative attribute dependency are implemented in Java 1.5. The first algorithm gives a simple reduct, whereas the second gives the reduct with the minimum number of attributes. The presented implementation serves as a prototype system organized into five modules. The first module extracts decision rules. The second module computes positive regions for dependencies. The third module computes reducts, that is, the minimum set of attributes needed to determine the decision, using two techniques: brute-force backward elimination, which simply takes the attributes in the given order and checks whether each should be eliminated, and an information entropy-based algorithm, which calculates the information entropy conveyed by each attribute and selects the one with the maximum information gain for elimination. The fourth module describes the equivalence classes for classification, including the lower and upper approximations for implementing hard computing and soft computing respectively. The last module builds the discernibility matrix and functions; the matrix stores the differences between attribute values for each pair of data tuples, so that redundant attributes are detected by searching the matrix rather than the entire training set. The implemented system is first tested on a small application to verify the mathematical calculations involved, which is not practically feasible with a large database. It is also tested on a medium-sized application example to illustrate the usefulness of the system and the incorporated language.
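The equivalence-class, approximation, positive-region and backward-elimination steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Java 1.5 implementation. The function names, the toy decision table and its attribute names (`cough`, `headache`, `temp`, `flu`) are all hypothetical, and the elimination test used here (an attribute is dropped when the positive region is unchanged) is an assumed stand-in for the paper's relative attribute dependency criterion.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes: sets of row indices that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximations(rows, attrs, target):
    """Return the (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper

def positive_region(rows, cond, dec):
    """Union of the lower approximations of every decision class."""
    pos = set()
    for dec_class in partition(rows, [dec]):
        pos |= approximations(rows, cond, dec_class)[0]
    return pos

def backward_elimination(rows, cond, dec):
    """Visit attributes in the given order; drop one if the positive region is unchanged.

    Assumption: positive-region preservation is used as the elimination test.
    """
    full = positive_region(rows, cond, dec)
    reduct = list(cond)
    for a in cond:
        trial = [x for x in reduct if x != a]
        if positive_region(rows, trial, dec) == full:
            reduct = trial
    return reduct

# Hypothetical toy decision table; rows 0 and 4 conflict on the decision `flu`.
ROWS = [
    {"cough": "yes", "headache": "yes", "temp": "high",   "flu": "yes"},
    {"cough": "yes", "headache": "yes", "temp": "normal", "flu": "no"},
    {"cough": "yes", "headache": "no",  "temp": "high",   "flu": "yes"},
    {"cough": "yes", "headache": "no",  "temp": "normal", "flu": "no"},
    {"cough": "yes", "headache": "yes", "temp": "high",   "flu": "no"},
]
COND = ["cough", "headache", "temp"]
FLU_YES = {i for i, r in enumerate(ROWS) if r["flu"] == "yes"}
```

On this table, `approximations(ROWS, COND, FLU_YES)` gives the lower approximation `{2}` and the upper approximation `{0, 2, 4}`, since rows 0 and 4 are indiscernible but carry different decisions. `backward_elimination(ROWS, COND, "flu")` drops the constant attribute `cough` and keeps `headache` and `temp`.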
Keywords
data mining; equivalence classes; matrix algebra; pattern classification; rough set theory; very large databases; brute force backward elimination; classification equivalence classes; decision rule extraction; discernibility matrix; information entropy-based algorithm; pattern discovery; rough set analysis; very large data sets; Acoustic noise; Background noise; Fuzzy sets; Java; Probability; Rough sets; Statistics; System testing; Uncertainty; positive region; reducts
fLanguage
English
Publisher
ieee
Conference_Titel
Emerging Trends in Engineering and Technology, 2008. ICETET '08. First International Conference on
Conference_Location
Nagpur, Maharashtra
Print_ISBN
978-0-7695-3267-7
Electronic_ISBN
978-0-7695-3267-7
Type
conf
DOI
10.1109/ICETET.2008.143
Filename
4579951