Data mining: where do we start?

Author

De Veaux, Richard D.

Author_Institution

Dept. of Math. & Stat., Williams Coll., Williamstown, MA, USA

fYear

2003

fDate

16-19 June 2003

Firstpage

19

Abstract

Summary form only given. Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (D. Hand, 2001). Much of exploratory data analysis (EDA) and inferential statistics concerns the same problems. Part of the challenge of data mining is the sheer size of the data sets and/or the number of possible predictor variables. With 500 potential predictor variables, just summarizing and graphing them all to start the process is impossible. Instead, in data mining, we may start the process by creating a preliminary model just to narrow down the set of potential predictors. This exploratory data modeling (EDM) seems to be at odds with standard statistical practice but, in fact, it simply uses models as a new exploratory tool. We take a brief tour of the current state of data mining algorithms and, using several case studies, explain how EDM can easily be used to narrow the search for a useful predictive model and to increase the chances of producing useful, meaningful results.
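
The idea of fitting a preliminary model only to shorten the list of candidate predictors can be illustrated with a minimal sketch. This is not the author's procedure: the abstract does not say which model is used, so the sketch swaps in the simplest possible stand-in, a one-predictor least-squares fit per column, ranked by R^2. The function names (`r_squared`, `screen_predictors`, `make_synthetic`), the synthetic data, and the choice of keeping 20 of 500 columns are all assumptions made for illustration.

```python
import random


def r_squared(x, y):
    """R^2 of a one-predictor least-squares line (the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column (or response) explains nothing
    return sxy * sxy / (sxx * syy)


def screen_predictors(columns, y, keep):
    """Return the indices of the `keep` columns with the highest single-variable R^2."""
    scores = [(r_squared(col, y), j) for j, col in enumerate(columns)]
    scores.sort(reverse=True)
    return [j for _, j in scores[:keep]]


def make_synthetic(n_rows=400, n_cols=500, seed=0):
    """Hypothetical data: 500 noise columns, of which only 3, 42 and 250 drive y."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_cols)]
    y = [
        3.0 * columns[3][i] - 2.0 * columns[42][i] + 1.5 * columns[250][i]
        + rng.gauss(0, 1)
        for i in range(n_rows)
    ]
    return columns, y


columns, y = make_synthetic()
shortlist = screen_predictors(columns, y, keep=20)
print(sorted(shortlist))
```

The shortlist is small enough to summarize and graph variable by variable, which is where conventional EDA can take over. A marginal screen like this one can miss predictors that matter only in combination with others, so a tree-based or other multivariable preliminary model may be a better choice in practice.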

Keywords

data analysis; data mining; data models; very large databases; EDA; EDM; exploratory data analysis; exploratory data modeling; inferential statistics; large data sets; model selection; observational data sets; predictor variables; Educational institutions; Electronic design automation and methodology; Information technology; Mathematics; Predictive models; Statistical analysis

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Technology Interfaces, 2003. ITI 2003. Proceedings of the 25th International Conference on

ISSN

1330-1012

Print_ISBN

953-96769-6-7

Type

conf

DOI

10.1109/ITI.2003.1225315

Filename

1225315