DocumentCode :
2037058
Title :
Intrinsic Dimensionality of Data and of their Representatives. A Case Study of Amino-Acid Distribution in ORFs
Author :
Bartkowiak, Anna ; Szustalewicz, Adam
Author_Institution :
Inst. of Comput. Sci., Univ. of Wroclaw, Wroclaw
fYear :
2008
fDate :
26-28 June 2008
Firstpage :
177
Lastpage :
182
Abstract :
By the intrinsic dimensionality of a data set we mean the smallest number of base vectors that suffice to reconstruct the set. Data sets today are often very large and therefore computationally demanding, so we look for representative data vectors (prototypes) that give insight into the data and can be used for a preliminary data analysis. Let D, of size n × d, denote the observed data set, and D1, of size M × d, the set of representatives of the data. Denote by r the number of base vectors spanning D, and by r1 the number of base vectors spanning D1. Our questions: (1) Are r and r1 equal? (2) Suppose we choose k and k1 base vectors that approximate D and D1, respectively, with a given accuracy. Are k and k1 equal? We answer these questions for the data set amino 569, which contains the frequency distribution of the twenty amino acids composing the ORFs in the 7th yeast chromosome. The answer to both questions is no.
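The two quantities compared in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the data are synthetic rather than the amino-acid set, the representatives D1 are a random subsample of rows instead of Neural Gas or SOM prototypes, the data are centered first, r is taken as the numerical rank from the singular values, and k is the smallest number of principal directions whose cumulative variance share reaches a chosen `accuracy`. The helper name `rank_and_k` is hypothetical.

```python
import numpy as np

def rank_and_k(X, accuracy=0.95, tol=1e-10):
    """Return (r, k) for a data matrix X of shape (rows, d).

    r: numerical rank of the centered data (assumed proxy for the
       number of base vectors spanning the set).
    k: smallest number of principal directions whose cumulative
       variance share is at least `accuracy`.
    """
    Xc = X - X.mean(axis=0)                  # centering is an assumption
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    if s.size == 0 or s[0] == 0:
        return 0, 0
    r = int(np.sum(s > tol * s[0]))
    share = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(share, accuracy)) + 1, s.size)
    return r, k

# Synthetic stand-in for D (n x d): 500 rows, 20 columns, true rank 5.
rng = np.random.default_rng(0)
D = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20))

# Stand-in for D1 (M x d): M = 3 rows drawn at random from D.
D1 = D[rng.choice(len(D), size=3, replace=False)]

r, k = rank_and_k(D)
r1, k1 = rank_and_k(D1)
print(f"D: r={r}, k={k}; D1: r1={r1}, k1={k1}")
```

With only M = 3 representatives, the centered set D1 can span at most two directions, so r1 falls below r here by construction. This shows only that the two pairs of numbers need not agree; it does not reproduce the paper's result.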
Keywords :
biology computing; data analysis; amino-acid distribution; data set; data vectors; frequency distribution; intrinsic dimensionality; Application software; Computer industry; Computer science; Frequency; Kernel; Linear discriminant analysis; Management information systems; Matrix decomposition; Principal component analysis; Prototypes; Amino-acid distribution in ORFs; Choice of representatives; Data reduction; Neural Gas; Self-organizing map
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Information Systems and Industrial Management Applications, 2008. CISIM '08. 7th
Conference_Location :
Ostrava
Print_ISBN :
978-0-7695-3184-7
Type :
conf
DOI :
10.1109/CISIM.2008.45
Filename :
4557857