Applications of Rough Sets in the Field of Data Mining

Author

Butalia, Ayesha ; Dhore, M.L. ; Tewani, Geetika

Author_Institution

Coll. of Eng., Maharashtra Inst. of Technol., Pune

fYear

2008

fDate

16-18 July 2008

Firstpage

498

Lastpage

503

Abstract

Real-world data pose several challenges: very large data sets, mixed types of data (continuous-valued and symbolic), uncertainty (noisy data), incompleteness (missing or incomplete data), changing data, and the use of background knowledge. The main goal of rough set analysis is the induction of approximations of concepts [4]. Rough sets constitute a sound basis for KDD: they offer mathematical tools to discover patterns hidden in data [4] and are therefore used in the field of data mining. Unlike fuzzy sets, which require membership values, or statistics, which requires probabilities, rough sets do not require any preliminary information about the data; this is their distinguishing strength. Two novel algorithms for finding optimal reducts of condition attributes, based on relative attribute dependency, are implemented in Java 1.5; the first gives a simple reduct, whereas the second gives the reduct with the minimum number of attributes. The presented implementation serves as a prototype system made up of five modules. The first module extracts decision rules. The second module gives the positive regions for dependencies. The third module computes reducts, that is, the minimum set of attributes needed to determine the decision, using two techniques: brute-force backward elimination, which simply takes the attributes in the given order and checks whether each should be eliminated, and an information entropy-based algorithm, which calculates the information entropy conveyed by each attribute and selects the one with the maximum information gain for elimination. The fourth module describes the equivalence classes for classification, including the lower and upper approximations, for implementing hard computing and soft computing respectively. The last module builds the discernibility matrix and functions; the matrix stores the differences between attribute values for each pair of data tuples, so redundant attributes are detected by searching the matrix rather than the entire training set. The implemented system is first tested on a small application to verify the mathematical calculations involved, which is not practically feasible with a large database. It is also tested on a medium-sized application example to illustrate the usefulness of the system and the incorporated language.
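
The equivalence-class, approximation, positive-region, and backward-elimination steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name RoughSetSketch, the method names, and the toy decision table (headache, temperature, cough as condition attributes, flu as the decision) are all hypothetical, and the code assumes Java 8 or later rather than Java 1.5. Backward elimination is interpreted here as dropping an attribute whenever the positive region stays unchanged, which is one common reading of the technique.

```java
import java.util.*;

public class RoughSetSketch {

    // Hypothetical toy table: columns 0-2 are conditions, column 3 is the decision.
    static String[][] sampleTable() {
        return new String[][] {
            {"yes", "high",   "yes", "yes"},
            {"yes", "high",   "no",  "yes"},
            {"no",  "high",   "yes", "yes"},
            {"no",  "normal", "no",  "no"},
            {"yes", "normal", "yes", "no"},
            {"no",  "high",   "yes", "no"},  // conflicts with row 2
        };
    }

    // Equivalence classes: rows grouped by identical values on the chosen attributes.
    static Collection<Set<Integer>> equivalenceClasses(String[][] table, List<Integer> attrs) {
        Map<List<String>, Set<Integer>> classes = new LinkedHashMap<>();
        for (int row = 0; row < table.length; row++) {
            List<String> key = new ArrayList<>();
            for (int a : attrs) key.add(table[row][a]);
            classes.computeIfAbsent(key, k -> new TreeSet<>()).add(row);
        }
        return classes.values();
    }

    // Lower approximation: union of classes entirely inside the target set.
    static Set<Integer> lowerApproximation(String[][] table, List<Integer> attrs, Set<Integer> target) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> c : equivalenceClasses(table, attrs))
            if (target.containsAll(c)) result.addAll(c);
        return result;
    }

    // Upper approximation: union of classes that share at least one row with the target set.
    static Set<Integer> upperApproximation(String[][] table, List<Integer> attrs, Set<Integer> target) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> c : equivalenceClasses(table, attrs))
            if (!Collections.disjoint(c, target)) result.addAll(c);
        return result;
    }

    // Positive region: rows whose equivalence class carries a single decision value.
    static Set<Integer> positiveRegion(String[][] table, List<Integer> attrs, int decision) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> c : equivalenceClasses(table, attrs)) {
            Set<String> decisions = new HashSet<>();
            for (int row : c) decisions.add(table[row][decision]);
            if (decisions.size() == 1) result.addAll(c);
        }
        return result;
    }

    // Backward elimination: visit attributes in the given order and drop each one
    // whose removal leaves the positive region unchanged.
    static List<Integer> backwardEliminationReduct(String[][] table, List<Integer> attrs, int decision) {
        Set<Integer> full = positiveRegion(table, attrs, decision);
        List<Integer> reduct = new ArrayList<>(attrs);
        for (Integer a : attrs) {
            List<Integer> trial = new ArrayList<>(reduct);
            trial.remove(a); // remove(Object): removes the attribute, not an index
            if (positiveRegion(table, trial, decision).equals(full)) reduct = trial;
        }
        return reduct;
    }

    public static void main(String[] args) {
        String[][] table = sampleTable();
        List<Integer> conditions = Arrays.asList(0, 1, 2);
        Set<Integer> fluYes = new TreeSet<>(Arrays.asList(0, 1, 2)); // rows with flu = yes
        System.out.println("lower:   " + lowerApproximation(table, conditions, fluYes));
        System.out.println("upper:   " + upperApproximation(table, conditions, fluYes));
        System.out.println("posReg:  " + positiveRegion(table, conditions, 3));
        System.out.println("reduct:  " + backwardEliminationReduct(table, conditions, 3));
    }
}
```

On this toy table, rows 2 and 5 agree on every condition attribute but disagree on the decision, so they fall in the upper approximation of "flu = yes" but outside its lower approximation and outside the positive region. The cough attribute (column 2) is dropped by backward elimination because removing it does not change the positive region.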

Keywords

data mining; equivalence classes; matrix algebra; pattern classification; rough set theory; very large databases; brute force backward elimination; classification equivalence classes; decision rule extraction; discernibility matrix; information entropy-based algorithm; pattern discovery; rough set analysis; very large data sets; Acoustic noise; Background noise; Fuzzy sets; Java; Probability; Rough sets; Statistics; System testing; Uncertainty; positive region; reducts

fLanguage

English

Publisher

ieee

Conference_Titel

Emerging Trends in Engineering and Technology, 2008. ICETET '08. First International Conference on

Conference_Location

Nagpur, Maharashtra

Print_ISBN

978-0-7695-3267-7

Electronic_ISBN

978-0-7695-3267-7

Type

conf

DOI

10.1109/ICETET.2008.143

Filename

4579951

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2313550