• DocumentCode
    1797733
  • Title

    Imputation of missing data supported by Complete p-Partite attribute-based Decision Graphs

  • Author

    Bertini, J.R. ; do Carmo Nicoletti, Maria ; Liang Zhao

  • Author_Institution
    Comput. Sci. Dept., Univ. of Sao Paulo, Sao Paulo, Brazil
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    1100
  • Lastpage
    1106
  • Abstract
    Missing attribute values are a recurrent problem in data mining and machine learning. Although there are plenty of techniques to handle this problem, most of them are too simplistic to provide a good estimate of absent attribute values. A very active research area focuses on solving the missing attribute value problem via imputation methods, which replace missing data with substituted values. This paper proposes a new imputation method which uses a special graph, the Complete p-Partite Attribute-based Decision Graph (CpP-AbDG), to estimate the missing values in a consistent and plausible way. The graph is built by dividing the range of each attribute that describes the data into sub-intervals; the sub-intervals are treated as the vertices of a graph. Edges are then established between pairs of different vertices, provided they do not relate to the same attribute. Finally, the edges and vertices are each assigned a weight based on the distributions of the classes. The resulting CpP-AbDG has been shown to be a suitable and informative data structure for finding the proper interval in which a missing attribute value should lie, taking into account all the attributes that describe the data. Results comparing the proposed approach to classical ones, in a computational environment that uses classification problems as the evaluation criterion, show the potential of the method.
  • Keywords
    data mining; graph theory; learning (artificial intelligence); CpP-AbDG; complete p-partite attribute-based decision graphs; data structure; machine learning; missing attribute values; Algorithm design and analysis; Data models; Educational institutions; Electronic mail; Machine learning algorithms; Training; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type

    conf

  • DOI
    10.1109/IJCNN.2014.6889593
  • Filename
    6889593