DocumentCode :
3703532
Title :
Selecting representative instances from datasets
Author :
Seyed Hamid Mirisaee;Ahlame Douzal;Alexandre Termier
Author_Institution :
Univ. Grenoble Alps, Grenoble, France
fYear :
2015
Firstpage :
1
Lastpage :
10
Abstract :
In this paper, we propose a new, alternative approach to the problem of finding a set of representative objects in large datasets. We first formulate the general Instance Selection Problem (ISP) and then study three variants of it that select instances from very different regions of the data: the inner frontier, the central area and the outer frontier. We discuss solutions to these problems and study their complexity. To illustrate the effectiveness of the proposed techniques, we first use a small synthetic dataset for visualization purposes. We then apply them to the Reuters dataset and show that combining the instances selected by the ISP techniques provides a good representation of the data and can be considered a complementary approach to state-of-the-art methods. Finally, we examine the quality of the selected objects through a topic-based analysis that shows how well the selected documents cover the topics in the Reuters dataset.
Keywords :
"Approximation methods","Complexity theory","Matrix decomposition","Data mining","Principal component analysis","Data visualization","Data analysis"
Publisher :
IEEE
Conference_Titel :
Data Science and Advanced Analytics (DSAA), 2015 IEEE International Conference on
Print_ISBN :
978-1-4673-8272-4
Type :
conf
DOI :
10.1109/DSAA.2015.7344812
Filename :
7344812