DocumentCode :
3661343
Title :
Metalearning to choose the level of analysis in nested data: A case study on error detection in foreign trade statistics
Author :
Mohammad Nozari Zarmehri;Carlos Soares
Author_Institution :
INESC Porto, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 378, 4200-465, Portugal
fYear :
2015
fDate :
7/1/2015 12:00:00 AM
Firstpage :
1
Lastpage :
8
Abstract :
Traditionally, a single model is developed for a data mining task. As more data is collected at a more detailed level, organizations are becoming more interested in having specific models for distinct parts of the data (e.g. customer segments). From the business perspective, data can be divided naturally into different dimensions. Each of these dimensions is usually hierarchically organized (e.g. country, city, zip code). This means that, when developing a model for a given part of the problem (e.g. a zip code), the training data may be collected at different levels of this nested hierarchy (e.g. the same zip code, the city, or the country it is located in). Selecting different levels of granularity may change the performance of the whole process, so the question is which level to use for a given part. We propose a metalearning model that recommends the level of granularity of the training data expected to yield the best-performing model. We apply decision tree and random forest algorithms for metalearning. At the base level, our experiment uses results obtained by outlier detection methods on the problem of detecting errors in foreign trade transactions. The results show that metalearning helps to find the best level of granularity.
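The metalearning idea in the abstract can be illustrated with a small sketch. Everything here is a hypothetical assumption, not taken from the paper: the levels are named after the abstract's example (zip_code, city, country); each part is described by a single made-up meta-feature (its number of training records at the zip_code level); the base-level scores are invented toy numbers, not results; and the metalearner is a depth-1 decision tree (a decision stump) standing in for the decision tree and random forest models the authors use. The meta-target for each part is simply the level whose training data gave the best base-level score.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: NOT results from the paper.
# Each part maps to (meta-features, base-level score per training level).
# Meta-feature 0 (assumed): number of training records at the zip_code level.
PARTS: Dict[str, Tuple[List[float], Dict[str, float]]] = {
    "A": ([15.0], {"zip_code": 0.61, "city": 0.70, "country": 0.74}),
    "B": ([30.0], {"zip_code": 0.63, "city": 0.69, "country": 0.72}),
    "C": ([400.0], {"zip_code": 0.82, "city": 0.75, "country": 0.71}),
    "D": ([850.0], {"zip_code": 0.86, "city": 0.77, "country": 0.70}),
}

Stump = Tuple[int, float, str, str]


def best_level(scores: Dict[str, float]) -> str:
    """Meta-target: the level whose training data gave the best base-level score."""
    return max(scores, key=lambda level: scores[level])


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[str]) -> Stump:
    """Fit a depth-1 decision tree by minimising training misclassifications.

    Returns (feature index, threshold, label if x[f] <= t, label otherwise).
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:  # split puts every part on one side; skip it
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority level
        majority = Counter(y).most_common(1)[0][0]
        return 0, float("inf"), majority, majority
    _, f, t, l_lab, r_lab = best
    return f, t, l_lab, r_lab


def recommend(stump: Stump, x: Sequence[float]) -> str:
    """Recommend a training-data level for a new part from its meta-features."""
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab


# Build the meta-dataset: one row per part, labelled with its best level.
X = [feats for feats, _ in PARTS.values()]
y = [best_level(scores) for _, scores in PARTS.values()]
stump = fit_stump(X, y)
print(stump)                      # (0, 30.0, 'country', 'zip_code')
print(recommend(stump, [100.0]))  # zip_code
```

On this toy data the stump learns that parts with few records are better served by pooled country-level data, while larger parts favour their own zip_code data. A random forest, as mentioned in the abstract, would instead average many deeper trees trained on bootstrap samples of the meta-dataset.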
Keywords :
"Feature extraction","Databases"
Publisher :
ieee
Conference_Title :
Neural Networks (IJCNN), 2015 International Joint Conference on
Electronic_ISBN :
2161-4407
Type :
conf
DOI :
10.1109/IJCNN.2015.7280656
Filename :
7280656