• DocumentCode
    2222534
  • Title
    Data acquisition with active and impact-sensitive instance selection
  • Author
    Zhu, Xingquan ; Wu, Xindong
  • Author_Institution
    Dept. of Comput. Sci., Vermont Univ., Burlington, VT, USA
  • fYear
    2004
  • fDate
    15-17 Nov. 2004
  • Firstpage
    721
  • Lastpage
    726
  • Abstract
    Real-world data is never perfect and often suffers from corruption or missing values that may affect models built from the data. To build accurate predictive models, data acquisition is usually adopted to complete missing values in incomplete instances. Because of the significant cost of doing so and the inherent correlations in the dataset, acquiring complete information for all instances is likely to be prohibitive and unnecessary. An interesting and important problem that arises here is selecting which instances to complete so that the model built from the data improves significantly. We propose two solutions to this problem; the essential idea is to complete the attributes with higher impact on system performance. The first solution is based on an impact-sensitive instance ranking mechanism [X. Zhu et al. (2004)]. We explore the correlation between the attributes and the class and use the correlation as the attribute weights; the larger the weight, the higher the impact of the attribute. For each incomplete instance, we sum the weights of the attributes with missing values, and instances with larger sums appear to be more important for users to complete. In the second solution, active learning, impact-sensitive instance ranking, and missing value prediction are combined for data acquisition. Experimental results from real-world datasets demonstrate the effectiveness of our strategies.
  • Keywords
    correlation methods; data acquisition; data mining; data models; learning (artificial intelligence); pattern classification; statistical analysis; active learning; data acquisition; instance ranking mechanism; missing attribute value prediction; real-world datasets; Bayesian methods; Computer science; Costs; Data acquisition; Data mining; Filling; Predictive models; Statistics; System performance; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on
  • ISSN
    1082-3409
  • Print_ISBN
    0-7695-2236-X
  • Type
    conf
  • DOI
    10.1109/ICTAI.2004.46
  • Filename
    1374260
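
The first solution described in the abstract (weight each attribute by its correlation with the class, then score each incomplete instance by the summed weights of its missing attributes) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which correlation measure is used, so the absolute Pearson correlation on numerically encoded values stands in for it here, missing values are represented as None, and the names pearson, attribute_weights, and rank_incomplete are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def attribute_weights(rows, labels):
    """Weight each attribute by |correlation with the class|.

    Assumption: only instances where the attribute is observed contribute,
    and absolute Pearson correlation stands in for the paper's measure.
    """
    weights = []
    for j in range(len(rows[0])):
        observed = [(r[j], y) for r, y in zip(rows, labels) if r[j] is not None]
        xs = [value for value, _ in observed]
        ys = [label for _, label in observed]
        weights.append(abs(pearson(xs, ys)))
    return weights


def rank_incomplete(rows, weights):
    """Return (row_index, score) for incomplete rows, highest score first.

    The score is the sum of the weights of the attributes that are missing.
    """
    scored = []
    for i, row in enumerate(rows):
        if any(value is None for value in row):
            score = sum(w for value, w in zip(row, weights) if value is None)
            scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy data (made up for illustration): two numeric attributes, binary class.
rows = [
    [1.0, 5.0],
    [2.0, None],
    [None, 4.0],
    [4.0, 5.0],
    [None, None],
]
labels = [0, 0, 1, 1, 1]

weights = attribute_weights(rows, labels)
ranking = rank_incomplete(rows, weights)
```

Under this sketch, instances at the front of ranking would be the first candidates for acquisition. The second solution in the abstract additionally combines active learning and missing value prediction, which this sketch does not attempt to cover.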