• DocumentCode
    507717
  • Title
    VisGBT: Visually analyzing evolving datasets for adaptive learning
  • Author
    Chen, Keke ; Tian, Fengguang
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
  • fYear
    2009
  • fDate
    11-14 Nov. 2009
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    Many machine learning problems involve changes in both feature distribution and label distribution, such as domain adaptation and learning drifting concepts from data streams. Correctly detecting, identifying, and understanding the changes of data distributions can help us properly select data samples or algorithms for learning models. However, since the training datasets are often in high dimensionality and large size, it has been difficult to effectively analyze them. Furthermore, the joint distribution between features and labels makes the problem more difficult to handle. In this paper, we propose a visual analysis method (VisGBT) that combines the gradient-boosting-trees (GBT) modeling method, regression analysis, and multidimensional visualization to capture the mismatches between datasets and models. The GBT model consists of a series of trees with a predefined number of terminal (leaf) nodes per tree. These terminal nodes partition the high dimensional space with a few most informative features to minimize the label prediction error. VisGBT maps various kinds of detailed model information to the terminal node matrix (TNM) and visualizes it with an appropriate design. With this visual analysis method, we can easily find out the detailed differences between datasets with the help of a learned model. We will illustrate the use of various visual patterns and in particular show how this method can help us analyze domain similarity for domain adaptation.
  • Keywords
    data visualisation; gradient methods; learning (artificial intelligence); matrix algebra; regression analysis; trees (mathematics); VisGBT; adaptive machine learning; data streams; domain adaptation; evolving training datasets; feature distribution; gradient-boosting-trees modeling method; label distribution; label prediction error; learning drifting concepts; multidimensional visualization; regression analysis; terminal node matrix; tree nodes; visual analysis method; visual patterns; Computer science; Costs; Data analysis; Data engineering; Data visualization; Machine learning; Machine learning algorithms; Multidimensional systems; Regression analysis; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Collaborative Computing: Networking, Applications and Worksharing, 2009. CollaborateCom 2009. 5th International Conference on
  • Conference_Location
    Washington, DC
  • Print_ISBN
    978-963-9799-76-9
  • Electronic_ISBN
    978-963-9799-76-9
  • Type
    conf
  • DOI
    10.4108/ICST.COLLABORATECOM2009.8281
  • Filename
    5362573
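
The abstract describes a GBT model as a series of trees with a fixed number of terminal nodes, and a terminal node matrix (TNM) onto which model information is mapped. The record does not define the TNM precisely, so the following is only a minimal sketch under stated assumptions: each tree is a two-leaf regression stump, the TNM has one row per tree and one column per terminal node, and each cell holds the number of samples routed to that node. The helper names (`fit_stump`, `fit_gbt`, `terminal_node_matrix`) and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean

def fit_stump(X, residuals):
    """Pick the single-feature threshold split with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct feature values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    _, j, t, lm, rm = best
    return {"feature": j, "threshold": t, "leaf_values": (lm, rm)}

def leaf_index(stump, row):
    """Return 0 for the left terminal node, 1 for the right one."""
    return 0 if row[stump["feature"]] <= stump["threshold"] else 1

def fit_gbt(X, y, n_trees=3, learning_rate=0.5):
    """Gradient boosting for squared loss: each stump fits current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        trees.append(stump)
        pred = [p + learning_rate * stump["leaf_values"][leaf_index(stump, row)]
                for p, row in zip(pred, X)]
    return base, trees

def terminal_node_matrix(trees, X):
    """Assumed TNM: rows are trees, columns are leaves, cells are sample counts."""
    tnm = [[0, 0] for _ in trees]
    for row in X:
        for i, stump in enumerate(trees):
            tnm[i][leaf_index(stump, row)] += 1
    return tnm

# Hypothetical toy data: the "new" dataset has drifted toward large feature-0 values.
X_old = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.7, 1.0], [0.8, 0.8], [0.9, 1.2]]
y_old = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
X_new = [[0.6, 1.0], [0.7, 0.9], [0.8, 1.1], [0.9, 1.0], [0.95, 0.8], [0.85, 1.2]]

base, trees = fit_gbt(X_old, y_old)
tnm_old = terminal_node_matrix(trees, X_old)
tnm_new = terminal_node_matrix(trees, X_new)
print(tnm_old)
print(tnm_new)
```

Comparing the two matrices cell by cell gives a rough view of where a new dataset occupies the model's partitions differently from the training data. The visual design and the regression analysis mentioned in the abstract are not reproduced here.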