DocumentCode :
3746256
Title :
Tutorial IV: Computational intelligence for data analytics
Author :
Ivan Jordanov
Author_Institution :
School of Computing, University of Portsmouth, UK
fYear :
2015
Firstpage :
35
Lastpage :
35
Abstract :
Humankind has been collecting data since record-keeping began, but in the last decade, with considerable advances in computing and storage technologies, the rise of cloud computing, and the development of ubiquitous connectivity and the Internet of Things, there has been an explosion in the size and variety of collected data. Nevertheless, one can be data-rich and knowledge-poor, and this is where data analytics and the development and application of machine learning models become a necessity for gaining insight into complex processes, in order to prove scientific theories and discoveries, support decision making and enhance strategic planning in different areas of the economy, finance, industry, healthcare, etc. Recently, there has been an influx of polymorphic, unstructured and multimodal data (social media, images, audio, video, etc.), which further complicates data processing and knowledge extraction. But even traditional structured datasets present problems that need to be addressed and overcome in the early stages of data pre-processing, feature extraction and feature selection. This is because they usually contain a variety of data formats (e.g., categorical, continuous, ordinal) and frequently have missing data (usually the result of sensor faults, human errors, or collection, transportation, or storage problems). The most popular approaches to dealing with missing data generally fall into three groups: deletion methods; single imputation methods; and model-based methods. In this tutorial I will talk about the third group, which are considered to be the most popular, 'modern' model-based approaches. In particular, the multiple imputation (MI) method will be introduced and discussed, in addition to K-Nearest Neighbour Imputation (KNN-I) and Bagged Tree Imputation (BTI). Subsequently, MI, KNN-I and BTI will be applied in a case study on pre-processing a large real-world radar signal dataset (more than 30 000 samples).
The dataset comprises intercepted and collected pulse train characteristics, which typically include signal frequencies, type of modulation, scan period, pulse repetition intervals, etc., and usually consist of a mixture of continuous, discrete and categorical data, and also frequently include missing values. Missing values are an inherent part of real-world datasets, and radar datasets are no exception. I will then briefly talk about supervised and unsupervised learning and the use of three supervised approaches, Neural Networks (NN), Random Forests (RF) and Support Vector Machines (SVM), for solving the radar signal classification and source identification problem. Results from applying NN, RF and SVM (using R and Matlab) to the complete data subset (without missing data) and to the full dataset with missing data (up to 60%) substituted using MI, KNN-I and BTI will be critically analysed and discussed. Finally, I'll talk about the opportunities and challenges in applying computational intelligence and machine learning techniques to Big Data, and the available software for Big Data.
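To make the K-Nearest Neighbour Imputation (KNN-I) named in the abstract more concrete, the following is a minimal sketch in plain Python. It is not the tutorial's implementation (the abstract mentions R and Matlab); the function name `knn_impute`, the distance rescaling, and the sample values are all assumptions made for illustration, and only continuous features are handled.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest rows (hypothetical helper).

    Distance is the root mean squared difference over the features that
    both rows have observed, so rows sharing few features are not favoured.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed feature: not comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]  # copy, so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other comparable rows where feature j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] < math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Made-up values standing in for two continuous pulse descriptors.
sample = [
    [1.0, 10.0],
    [1.2, 12.0],
    [5.0, 50.0],
    [1.1, None],  # second feature is missing
]
filled = knn_impute(sample, k=2)
print(filled[3])
```

Categorical features such as modulation type would need a different distance and a vote in place of a mean, and in practice features are usually scaled before distances are computed.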
Keywords :
Transportation
Publisher :
IEEE
Conference_Titel :
Technologies and Applications of Artificial Intelligence (TAAI), 2015 Conference on
Electronic_ISBN :
2376-6824
Type :
conf
DOI :
10.1109/TAAI.2015.7407137
Filename :
7407137