DocumentCode
1953891
Title
Data mining: where do we start?
Author
De Veaux, Richard D.
Author_Institution
Dept. of Math. & Stat., Williams Coll., Williamstown, MA, USA
fYear
2003
fDate
16-19 June 2003
Firstpage
19
Abstract
Summary form only given. Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (D. Hand, 2001). Much of exploratory data analysis (EDA) and inferential statistics concerns the same problems. Part of the challenge of data mining is the sheer size of the data sets and/or the number of possible predictor variables. With 500 potential predictor variables, just summarizing and graphing them to start the process is impossible. Instead, in data mining, we may start the process by creating a preliminary model just to narrow down the set of potential predictors. This exploratory data modeling (EDM) seems to be at odds with standard statistical practice, but, in fact, it simply uses models as a new exploratory tool. We take a brief tour of the current state of data mining algorithms and, using several case studies, explain how EDM can easily be used to narrow the search for a useful predictive model and to increase the chances of producing useful, meaningful results.
Keywords
data analysis; data mining; data models; very large databases; EDA; EDM; data mining; exploratory data analysis; exploratory data modeling; inferential statistics; large data sets; model selection; observational data sets; predictor variables; Data analysis; Data mining; Educational institutions; Electronic design automation and methodology; Information technology; Mathematics; Predictive models; Statistical analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology Interfaces, 2003. ITI 2003. Proceedings of the 25th International Conference on
ISSN
1330-1012
Print_ISBN
953-96769-6-7
Type
conf
DOI
10.1109/ITI.2003.1225315
Filename
1225315