  • DocumentCode
    2714155
  • Title
    Predictive learning with sparse heterogeneous data
  • Author
    Cherkassky, Vladimir ; Cai, Feng ; Liang, Lichen
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Minnesota, Minneapolis, MN, USA
  • fYear
    2009
  • fDate
    14-19 June 2009
  • Firstpage
    544
  • Lastpage
    551
  • Abstract
    Many applications of machine learning involve sparse and heterogeneous data. For example, estimation of predictive (diagnostic) models using patients' data from clinical studies requires effective integration of genetic, clinical and demographic data. Typically, all heterogeneous inputs are properly encoded and mapped onto a single feature vector, which is used for estimating (training) a predictive model. This approach, known as standard inductive learning, is used in most application studies. More recently, several new learning methodologies have emerged. In particular, when training data can be naturally separated into several groups (or structured), we can view learning (estimation) for each group as a separate task, leading to a multi-task learning framework. Similarly, a setting where training data is structured, but the objective is to estimate a single predictive model (for all groups), leads to learning with structured data and the SVM+ methodology recently proposed by Vapnik. This paper demonstrates the advantages and limitations of these new approaches for modeling heterogeneous data (relative to standard inductive SVM) via empirical comparisons using several publicly available medical data sets.
  • Keywords
    encoding; learning by example; medical diagnostic computing; statistical analysis; support vector machines; SVM; computer aided medical diagnostics; feature vector; machine learning; multitask learning framework; predictive learning; sparse heterogeneous data modeling; standard inductive learning; Application software; Demography; Genetics; Medical diagnostic imaging; Neural networks; Predictive models; Probability; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2009. IJCNN 2009. International Joint Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-3548-7
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2009.5179036
  • Filename
    5179036