  • DocumentCode
    54157
  • Title
    A Partition-Based Framework for Building and Validating Regression Models
  • Author
    Muhlbacher, Thomas; Piringer, Harald
  • Volume
    19
  • Issue
    12
  • fYear
    2013
  • fDate
    Dec. 2013
  • Firstpage
    1962
  • Lastpage
    1971
  • Abstract
    Regression models play a key role in many application domains for analyzing or predicting a quantitative dependent variable based on one or more independent variables. Automated approaches for building regression models are typically limited with respect to incorporating domain knowledge in the process of selecting input variables (also known as feature subset selection). Other limitations include the identification of local structures, transformations, and interactions between variables. The contribution of this paper is a framework for building regression models addressing these limitations. The framework combines a qualitative analysis of relationship structures by visualization and a quantification of relevance for ranking any number of features and pairs of features which may be categorical or continuous. A central aspect is the local approximation of the conditional target distribution by partitioning 1D and 2D feature domains into disjoint regions. This enables a visual investigation of local patterns and largely avoids structural assumptions for the quantitative ranking. We describe how the framework supports different tasks in model building (e.g., validation and comparison), and we present an interactive workflow for feature subset selection. A real-world case study illustrates the step-wise identification of a five-dimensional model for natural gas consumption. We also report feedback from domain experts after two months of deployment in the energy sector, indicating a significant effort reduction for building and improving regression models.
  • Keywords
    data visualisation; mathematics computing; regression analysis; solid modelling; application domains; domain knowledge; energy sector deployment; feature subset selection; five-dimensional model; input variable selection process; natural gas consumption; partition-based framework; regression model building; relevance quantification; relevance visualization; variable interaction; variable structure; variable transformation; Complexity theory; Computational modeling; Feature extraction; Frequency-domain analysis; Modeling; Regression; Regression analysis; data partitioning; feature selection; guided visualization; model building; visual knowledge discovery; Algorithms; Computer Graphics; Computer Simulation; Models, Statistical; Regression Analysis; Reproducibility of Results; Sensitivity and Specificity; User-Computer Interface
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Visualization and Computer Graphics
  • Publisher
    IEEE
  • ISSN
    1077-2626
  • Type
    jour
  • DOI
    10.1109/TVCG.2013.125
  • Filename
    6634169
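The abstract's central mechanism, approximating the target locally by partitioning 1D and 2D feature domains into disjoint regions and then ranking features and feature pairs by relevance, can be illustrated with a minimal Python sketch. This is not the authors' method: the equal-width binning for continuous features, the `n_bins` parameter, the variance-explained score (R² of a piecewise-constant fit over the regions), and the function names `partition_labels`, `partition_relevance`, and `rank_features` are all hypothetical choices made only for illustration.

```python
from collections import defaultdict
from itertools import combinations
from numbers import Real
from statistics import fmean


def partition_labels(values, n_bins=4):
    """Map each sample to a region id (hypothetical scheme): equal-width bins
    for numeric features, the category value itself for categorical ones."""
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
        return [min(int((v - lo) / width), n_bins - 1) for v in values]
    return list(values)


def partition_relevance(labels, target):
    """Share of target variance explained by the per-region target means,
    i.e. the R^2 of a piecewise-constant model over the partition."""
    regions = defaultdict(list)
    for label, y in zip(labels, target):
        regions[label].append(y)
    mean_all = fmean(target)
    total = sum((y - mean_all) ** 2 for y in target)
    if total == 0:
        return 0.0  # constant target: nothing to explain
    within = sum((y - fmean(ys)) ** 2 for ys in regions.values() for y in ys)
    return 1.0 - within / total


def rank_features(features, target, n_bins=4):
    """Score single features (1D partitions) and feature pairs (2D partitions
    formed by intersecting the 1D regions), best first."""
    labels = {name: partition_labels(vals, n_bins) for name, vals in features.items()}
    scores = {(name,): partition_relevance(lab, target) for name, lab in labels.items()}
    for a, b in combinations(labels, 2):
        pair_labels = list(zip(labels[a], labels[b]))
        scores[(a, b)] = partition_relevance(pair_labels, target)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data (invented for illustration): target depends only on "temp".
    features = {"temp": [0, 1, 2, 3, 4, 5, 6, 7],
                "weekday": ["mon", "tue"] * 4}
    target = [10 - t for t in features["temp"]]
    for key, score in rank_features(features, target):
        print(" x ".join(key), round(score, 3))
    # temp x weekday 1.0
    # temp 0.952
    # weekday 0.048
```

In this toy run the pair scores 1.0 only because its 2D partition refines each 1D partition until every region holds a single sample; a plain variance-explained score can never rank a pair below either of its members. A practical ranking would therefore need to account for the number of regions, for example via cross-validation or an adjusted score, which this sketch deliberately leaves out.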