DocumentCode :
589268
Title :
Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection
Author :
Duhaney, J. ; Khoshgoftaar, Taghi M. ; Napolitano, Antonio
Author_Institution :
Comput. & Electr. Eng. & Comput. Sci., Florida Atlantic Univ., Boca Raton, FL, USA
Volume :
1
fYear :
2012
fDate :
12-15 Dec. 2012
Firstpage :
268
Lastpage :
275
Abstract :
Class imbalance is prevalent in many real world datasets. It occurs when there are significantly fewer examples in one or more classes in a dataset compared to the number of instances in the remaining classes. When trained on highly imbalanced datasets, traditional machine learning techniques can often simply ignore the minority class(es) and label all instances as being of the majority class to maximize accuracy. This problem has been studied in many domains but there is little or no research related to the effect of class imbalance in fault data for condition monitoring of an ocean turbine. This study makes the first efforts in bridging that gap by providing insight into how class imbalance in vibration data can impact a learner's ability to reliably identify changes in the ocean turbine's operational state. To do so, we empirically evaluate the performances of three popular, but very different, machine learning algorithms when trained on four datasets with varying class distributions (one balanced and three imbalanced) to distinguish between a normal and an abnormal state. All data used in this study were collected from the testbed for an ocean turbine and were under sampled to simulate the different levels of imbalance. We find here, as in other domains, that the three learners seemed to suffer overall when trained on data with a highly skewed class distribution (with 0.1% examples in a faulty/abnormal state while the remaining 99.9% were captured in a normal operational state). It was noted, however, that the Logistic Regression and Decision Tree classifiers performed better when only 5% of the total number of examples were representative of an abnormal state (the remaining 95% therefore indicating normal operation) than they did when there was no imbalance present.
Keywords :
condition monitoring; decision trees; fault diagnosis; hydraulic turbines; learning (artificial intelligence); mechanical engineering computing; optimisation; pattern classification; regression analysis; class imbalance; decision tree classifiers; highly imbalanced datasets; highly skewed class distribution; learner ability; logistic regression; machine learning techniques; ocean turbine condition monitoring; ocean turbine fault data; ocean turbine operational state; performance evaluation; real world datasets; reliable state detection; vibration data; Machine learning; Oceans; Reliability; Sensors; Training; Turbines; Vibrations; class imbalance; condition monitoring; ocean turbine; state detection;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
Type :
conf
DOI :
10.1109/ICMLA.2012.53
Filename :
6406674