• DocumentCode
    3703577
  • Title

    Deep feature synthesis: Towards automating data science endeavors

  • Author

    James Max Kanter;Kalyan Veeramachaneni

  • Author_Institution
    CSAIL, MIT, Cambridge, MA - 02139
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    In this paper, we develop the Data Science Machine, which is able to derive predictive models from raw data automatically. To achieve this automation, we first propose and develop the Deep Feature Synthesis algorithm for automatically generating features for relational datasets. The algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature. Second, we implement a generalizable machine learning pipeline and tune it using a novel Gaussian Copula process-based approach. We entered the Data Science Machine in 3 data science competitions that featured 906 other data science teams. Our approach beat 615 of these teams. In 2 of the 3 competitions we beat a majority of competitors, and in the third, we achieved 94% of the best competitor's score. In the best case, with an ongoing competition, we beat 85.6% of the teams and achieved 95.7% of the top submission's score.
  • Keywords
    "Feature extraction","Predictive models","Machine learning algorithms","Prediction algorithms","Data models","Algorithm design and analysis","Data mining"
  • Publisher
    IEEE
  • Conference_Titel
    Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on
  • Print_ISBN
    978-1-4673-8272-4
  • Type

    conf

  • DOI
    10.1109/DSAA.2015.7344858
  • Filename
    7344858
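
The abstract describes Deep Feature Synthesis as following a relationship in the data to a base field and then applying mathematical functions along that path. The sketch below illustrates that idea for a single one-to-many relationship only; it is not the authors' implementation. The tables (customers, orders), the field names, the function synthesize_features, and the chosen set of aggregations are all hypothetical and were picked for illustration. The algorithm in the paper also stacks functions across several relationships, which this sketch does not attempt.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy relational data: each order points to one customer.
customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 5.0},
]

# Assumed set of aggregation functions applied along the relationship.
AGGREGATIONS = {"sum": sum, "mean": mean, "max": max, "count": len}


def synthesize_features(parents, children, key, child_name, base_field):
    """Build one feature per aggregation for every parent row.

    Follows the parent -> child relationship through `key`, collects the
    child's `base_field` values, and applies each aggregation to them.
    """
    # Group the base-field values of the child rows by the parent key.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[base_field])

    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        row = {key: parent[key]}
        for name, fn in AGGREGATIONS.items():
            feature_name = f"{name}({child_name}.{base_field})"
            # Only count is defined for a parent with no child rows.
            row[feature_name] = fn(values) if values or name == "count" else None
        features.append(row)
    return features


features = synthesize_features(customers, orders, "customer_id", "orders", "amount")
for row in features:
    print(row)
```

Each customer row ends up with features such as sum(orders.amount) and mean(orders.amount), which could then be fed to a downstream model.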