• DocumentCode
    3006862
  • Title
    Learning Classifiers from Distributional Data
  • Author
    Lin, H.T. ; Sanghack Lee ; Bui, Nicola ; Honavar, V.

  • Author_Institution
    Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA
  • fYear
    2013
  • fDate
    June 27 2013-July 2 2013
  • Firstpage
    302
  • Lastpage
    309
  • Abstract
    Many big data applications give rise to distributional data, wherein objects or individuals are naturally represented as K-tuples of bags of feature values, where the feature values in each bag are sampled from a feature- and object-specific distribution. We formulate and solve the problem of learning classifiers from distributional data. We consider three classes of methods for learning distributional classifiers: (i) those that rely on aggregation to encode distributional data into tuples of attribute values, i.e., instances that can be handled by traditional supervised machine learning algorithms; (ii) those that are based on generative models of distributional data; and (iii) the discriminative counterparts of the generative models considered in (ii). We compare the performance of the different algorithms on real-world as well as synthetic distributional data sets. The results of our experiments demonstrate that classifiers that take advantage of the information available in the distributional instance representation match or outperform those that fail to fully exploit such information.
  • Keywords
    data handling; learning (artificial intelligence); pattern classification; K-tuples representation; distributional data; feature values; generative models; learning distributional classifiers; object specific distribution; supervised machine learning algorithms; Accuracy; Data models; Electronics packaging; Machine learning algorithms; Mathematical model; Standards; Vectors; classifier; distributional data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2013 IEEE International Congress on
  • Conference_Location
    Santa Clara, CA
  • Print_ISBN
    978-0-7695-5006-0
  • Type
    conf
  • DOI
    10.1109/BigData.Congress.2013.47
  • Filename
    6597151