Title :
Iterative PCA for population structure analysis
Author :
Limpiti, T. ; Intarapanich, A. ; Assawamakin, A. ; Wangkumhang, P. ; Tongsima, S.
Author_Institution :
Fac. of Eng., King Mongkut's Inst. of Technol. Ladkrabang, Bangkok, Thailand
Abstract :
An extension of principal component analysis (PCA) called ipPCA was proposed earlier for analyzing structure in genetic data. This non-parametric framework iteratively classifies individuals into subpopulations. However, it is prone to false positives when dealing with large datasets and mixed-type genetic markers. We address these shortcomings by introducing a unified encoding scheme and suggesting a new terminating criterion for ipPCA. To validate the improvements, simulated datasets as well as real bovine and large human genetic datasets are analyzed. The results show that both the estimate of the number of subpopulations and the individual assignment accuracy are improved. Furthermore, the structure resolved by this approach can be used to identify subsets of individuals for further parametric population structure analysis.
Keywords :
DNA; demography; encoding; genetics; principal component analysis; signal processing; PCA; genetic marker; iterative PCA; population structure analysis; unified encoding scheme; Bioinformatics; Clustering algorithms; Eigenvalues and eigenfunctions; Shape; SNP; Tracy-Widom; clustering; population structure
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2011.5946474