  • DocumentCode
    1953891
  • Title
    Data mining: where do we start?
  • Author
    De Veaux, Richard D.
  • Author_Institution
    Dept. of Math. & Stat., Williams Coll., Williamstown, MA, USA
  • fYear
    2003
  • fDate
    16-19 June 2003
  • Firstpage
    19
  • Abstract
    Summary form only given. Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (D. Hand, 2001). Much of exploratory data analysis (EDA) and inferential statistics is concerned with the same problems. Part of the challenge of data mining is the sheer size of the data sets and/or the number of possible predictor variables. With 500 potential predictor variables, simply summarizing and graphing each one to begin the process is impossible. Instead, in data mining, we may start the process by building a preliminary model solely to narrow down the set of potential predictors. This exploratory data modeling (EDM) seems to be at odds with standard statistical practice, but in fact it simply uses models as a new exploratory tool. We take a brief tour of the current state of data mining algorithms and use several case studies to explain how EDM can easily be used to narrow the search for a useful predictive model and to increase the chances of producing useful, meaningful results.
  • Keywords
    data analysis; data mining; data models; very large databases; EDA; EDM; exploratory data analysis; exploratory data modeling; inferential statistics; large data sets; model selection; observational data sets; predictor variables; Data analysis; Data mining; Educational institutions; Electronic design automation and methodology; Information technology; Mathematics; Predictive models; Statistical analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces, 2003. ITI 2003. Proceedings of the 25th International Conference on
  • ISSN
    1330-1012
  • Print_ISBN
    953-96769-6-7
  • Type
    conf
  • DOI
    10.1109/ITI.2003.1225315
  • Filename
    1225315