DocumentCode
3036374
Title
Knowledge discovery in databases: applications in the electrical power engineering domain
Author
Steele, J.A. ; McDonald, J.R. ; Arcy, C.D.
Author_Institution
Strathclyde Univ., Glasgow, UK
fYear
1997
fDate
3 Dec 1997
Firstpage
8/1
Lastpage
8/4
Abstract
Knowledge discovery in databases (KDD) is defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (W.J. Frawley et al., 1991). KDD is an iterative process involving five steps which lead to the final goal of useful information. The five steps are: selection of data (determining which fields and records are to be analysed); preprocessing (cleaning the data by removing noise and outliers, if appropriate, and deciding on strategies for missing attribute values); transformation (representing the data by new features and reducing its dimensionality); data mining (deciding which algorithms to apply to the data, e.g., classification, regression, rule induction, neural networks); and interpretation/evaluation (feasibility analysis of the results from the data mining step). There are two general 'goals' in KDD: verification of a hypothesis; and discovery, where the 'system' autonomously discovers patterns. Within the KDD process a data warehouse is typically employed as the 'source' of the KDD exercise. The power industry has evolved to become dependent upon computerised environments, with more online data being stored for later extraction and investigation. Two key areas where KDD has been shown to be applicable are the analysis of energy pooling and settlement data, and condition monitoring of power system plant.
Keywords
knowledge acquisition; KDD; computerised environments; condition monitoring; data mining; data warehouse; electrical power engineering domain; energy pooling; feasibility analysis; hypothesis verification; iterative process; knowledge discovery in databases; missing attribute values; online data storage; power industry; power system plant; preprocessing; rule induction; settlement data; understandable patterns;
fLanguage
English
Publisher
IET
Conference_Titel
IT Strategies for Information Overload (Digest No: 1997/340), IEE Colloquium on
Conference_Location
London
Type
conf
DOI
10.1049/ic:19971153
Filename
659910