Title :
MOGT: Oversampling with a parsimonious mixture of Gaussian trees model for imbalanced time-series classification
Author :
Pang, John Z. F. ; Hong Cao ; Tan, Vincent Y. F.
Author_Institution :
Sch. of Phys. & Math. Sci., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
We propose a novel framework that uses a parsimonious statistical model, known as a mixture of Gaussian trees, to model the possibly multi-modal minority class and thereby address the problem of imbalanced binary time-series classification. By exploiting the fact that close-by time points are highly correlated, our model significantly reduces the number of covariance parameters to be estimated from O(d^2) to O(Ld), where L denotes the number of mixture components and d is the dimension. Thus our model is particularly effective for modelling high-dimensional time-series with a limited number of instances in the minority positive class. We conduct extensive classification experiments on several well-known time-series datasets (both single- and multi-modal), first randomly generating synthetic instances from our learned mixture model to correct the imbalance. We then compare our results with several state-of-the-art oversampling techniques; the results demonstrate that, when our proposed model is used, the same support vector machine classifier achieves much better classification accuracy across the range of datasets. In fact, the proposed method achieves the best average performance on 27 of the 30 multi-modal datasets according to the F-value metric.
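The parameter reduction and the sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it fits a single tree-structured Gaussian (the L = 1 case) using a Chow-Liu style maximum spanning tree over absolute correlations, and omits the mixture fitting entirely. All function names (`covariance_param_counts`, `fit_gaussian_tree`, `sample_gaussian_tree`) are hypothetical, and the sketch assumes no feature column is constant.

```python
import numpy as np

def covariance_param_counts(d, L):
    """Free covariance parameters: one full Gaussian vs. L tree-structured Gaussians."""
    full = d * (d + 1) // 2          # symmetric d x d matrix, O(d^2)
    tree_mixture = L * (2 * d - 1)   # per tree: d variances + (d - 1) edge covariances, O(Ld)
    return full, tree_mixture

def fit_gaussian_tree(X):
    """Fit one tree-structured Gaussian to the rows of X (assumed sketch, L = 1)."""
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    d = X.shape[1]
    # Gaussian mutual information grows with |rho|, so |rho| serves as edge weight.
    w = np.abs(np.corrcoef(X, rowvar=False))
    # Prim's algorithm for a maximum-weight spanning tree rooted at node 0.
    parent = np.full(d, -1)
    in_tree = np.zeros(d, dtype=bool)
    in_tree[0] = True
    best_w = w[0].copy()               # best known link from the tree to each node
    best_p = np.zeros(d, dtype=int)    # tree node providing that link
    order = [0]                        # order in which nodes join the tree
    for _ in range(d - 1):
        cand = np.where(in_tree, -np.inf, best_w)
        j = int(np.argmax(cand))
        parent[j] = best_p[j]
        in_tree[j] = True
        order.append(j)
        better = w[j] > best_w
        best_w = np.where(better, w[j], best_w)
        best_p = np.where(better, j, best_p)
    return mu, S, parent, order

def sample_gaussian_tree(mu, S, parent, order, n, rng):
    """Draw n synthetic instances by sampling each node given its parent."""
    Z = np.empty((n, len(mu)))
    root = order[0]
    Z[:, root] = rng.normal(mu[root], np.sqrt(S[root, root]), size=n)
    for j in order[1:]:
        p = parent[j]
        beta = S[j, p] / S[p, p]                  # regression coefficient on the parent
        cond_var = S[j, j] - beta * S[j, p]       # conditional variance given the parent
        cond_mean = mu[j] + beta * (Z[:, p] - mu[p])
        Z[:, j] = cond_mean + np.sqrt(max(cond_var, 0.0)) * rng.standard_normal(n)
    return Z

# Usage: oversample a small, strongly autocorrelated minority class.
rng = np.random.default_rng(0)
steps = rng.standard_normal((20, 50))
minority = np.cumsum(steps, axis=1)               # 20 random-walk series of length 50
mu, S, parent, order = fit_gaussian_tree(minority)
synthetic = sample_gaussian_tree(mu, S, parent, order, n=100, rng=rng)
```

For d = 100 and L = 3, `covariance_param_counts` gives 5050 parameters for a full covariance matrix against 597 for the tree mixture, which is the gap that matters when the minority class has few instances.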
Keywords :
Gaussian processes; covariance analysis; pattern classification; sampling methods; support vector machines; time series; trees (mathematics); F-value metric; MOGT; classification accuracy; close-by time points; high-dimensional time-series; imbalanced time-series binary classification; learned mixture model; minority positive class; mixture-of-Gaussian trees model; multimodal minority class; oversampling techniques; parsimonious statistical model; randomly generating synthetic instances; support vector machines classifier; Computational modeling; Covariance matrices; Data models; Graphical models; Markov processes; Random variables; Vectors; Gaussian graphical models; Imbalanced dataset; Mixture models; Multi-modality; Oversampling; Time-series;
Conference_Title :
Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on
Conference_Location :
Southampton
DOI :
10.1109/MLSP.2013.6661937