DocumentCode
2600380
Title
Segregation-based subspace clustering for huge dimensional data
Author
Alsagabi, Majid I. ; Tewfik, Ahmed H.
Author_Institution
Dept. of Electr. & Comput. Engineering, Univ. of Minnesota, Minneapolis, MN, USA
fYear
2010
fDate
10-12 Nov. 2010
Firstpage
1
Lastpage
4
Abstract
Clustering algorithms break down when the data points lie in huge-dimensional spaces. To tackle this problem, many subspace clustering methods have been proposed to build a subspace in which the data points cluster efficiently. The bottom-up approach is widely used: it selects a set of candidate features and then uses a portion of this set to build the hidden subspace step by step. The complexity depends exponentially or cubically on the number of selected features. In this paper, we present SEGCLU, a SEGregation-based subspace CLUstering method that significantly reduces the size of the candidate feature set and has cubic complexity. The algorithm was applied to noise-free DNA copy number data from two groups, autistic and typically developing children, to extract a potential biomarker for autism. 85% of the individuals were classified correctly in a 13-dimensional subspace.
Keywords
DNA; bioinformatics; pattern clustering; 13-dimensional subspace; DNA copy numbers; SEGCLU; autism; autistic children; bottom-up approach; cubic complexity; hidden subspace; huge dimensional data; potential biomarker; segregation-based subspace clustering; typically developing children; Decision support systems
fLanguage
English
Publisher
IEEE
Conference_Titel
Genomic Signal Processing and Statistics (GENSIPS), 2010 IEEE International Workshop on
Conference_Location
Cold Spring Harbor, NY
ISSN
2150-3001
Print_ISBN
978-1-61284-791-7
Type
conf
DOI
10.1109/GENSIPS.2010.5719667
Filename
5719667