• DocumentCode
    249479
  • Title
    An Effective Integrated Method for Learning Big Imbalanced Data
  • Author
    Ghanavati, Mojgan ; Wong, Raymond K. ; Chen, Fang ; Wang, Yang ; Perng, Chang-Shing
  • Author_Institution
    Sch. of Comput. Sci. & Eng., Univ. of New South Wales, Sydney, NSW, Australia
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    691
  • Lastpage
    698
  • Abstract
    Data imbalance strongly affects the performance of learning algorithms because of under-represented data and severe class distribution skews. This is one of the new challenges of machine learning and data mining. Choosing a suitable metric that addresses the properties and domain characteristics of real-world data is critical for achieving good results in most machine learning and data mining algorithms. When the dataset is big and imbalanced, it is extremely difficult to achieve good learning performance even with an accurate metric. This paper proposes an integrated method for learning large imbalanced datasets. In particular, combinations of metric learning algorithms and balancing techniques are evaluated experimentally. Their performance is compared using a set of evaluation metrics on bootstrap datasets of different sizes. The best combination is then selected for learning the full imbalanced datasets. Experiments using water pipeline datasets collected from various Australian regions over the past two decades show that the proposed method is both practical and effective.
  • Keywords
    data mining; learning (artificial intelligence); balancing techniques; bootstrap datasets; evaluation metrics; large imbalanced dataset learning; machine learning data mining; metric learning algorithms; Clustering algorithms; Equations; Error analysis; Learning systems; Mathematical model; Measurement; Training; Classification; Imbalanced data; Large Margin Nearest Neighbour; Metric Learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type
    conf
  • DOI
    10.1109/BigData.Congress.2014.102
  • Filename
    6906846