Title :
Analysis of Temporal High-Dimensional Gene Expression Data for Identifying Informative Biomarker Candidates
Author :
Lou, Qiang ; Obradovic, Z.
Author_Institution :
Dept. of Comput. & Inf. Sci., Temple Univ., Philadelphia, PA, USA
Abstract :
Identifying informative biomarkers from a large pool of candidates is the key step for accurate prediction of an individual's health status. In clinical applications, traditional static feature selection methods that flatten the temporal data cannot be directly applied, since a patient's observed clinical condition is a multivariate time series in which different variables can capture different stages of temporal change in the patient's health status. To identify informative genes in temporal microarray data, this study proposes a margin-based feature selection filter. The method builds on well-established machine learning techniques and makes no assumptions about the data distribution. The objective function of temporal margin-based feature selection is defined to maximize each subject's temporal margin in its own relevant subspace. The objective function accounts for the uncertainty in identifying nearest neighbors by considering the change in feature weights at each iteration. A fixed-point gradient descent method is proposed to optimize this objective function. Experimental results on both synthetic and real data provide evidence that the proposed method can identify more informative features than alternatives that flatten the temporal data in advance.
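The abstract names the ingredients (a per-subject margin built from nearest neighbors, uncertainty in which neighbors are nearest as the weights change, and a fixed-point iteration) but does not give the objective function itself. The sketch below is therefore NOT the authors' method; it is a minimal illustration in the spirit of Iterative RELIEF (I-RELIEF), under these assumptions: data are shaped (subjects, time points, genes); each gene's trajectory is compared between subjects with a Euclidean distance, so the time axis is kept per gene rather than flattened into one long vector; neighbor uncertainty is modeled with soft (kernel-weighted) nearest hits and misses with a hypothetical width `sigma`; and a single global weight vector is learned instead of per-subject relevant subspaces. The function names are hypothetical.

```python
import numpy as np


def per_feature_temporal_distances(X):
    """D[i, j, f] = Euclidean distance between the time series of feature f
    for subjects i and j. X has shape (n_subjects, n_timepoints, n_features)."""
    diff = X[:, None, :, :] - X[None, :, :, :]   # (n, n, T, F)
    return np.sqrt((diff ** 2).sum(axis=2))      # (n, n, F)


def _soft_neighbours(dist, mask, sigma):
    """Row-wise softmax of -dist/sigma restricted to mask (stable form).
    Assumes every row of mask has at least one True entry."""
    logits = np.where(mask, -dist / sigma, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)


def temporal_margin_weights(X, y, sigma=1.0, n_iter=50, tol=1e-6):
    """Hypothetical I-RELIEF-style fixed-point iteration for feature weights.

    Each subject's margin is its expected per-feature distance to the nearest
    miss minus that to the nearest hit; 'nearest' is soft and is recomputed
    from the current weights on every iteration."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    if len(classes) < 2 or counts.min() < 2:
        raise ValueError("need >= 2 classes with >= 2 subjects each")

    n, _, F = X.shape
    D = per_feature_temporal_distances(X)
    same = y[:, None] == y[None, :]
    hit_mask = same & ~np.eye(n, dtype=bool)     # same class, not self
    miss_mask = ~same                            # other class

    w = np.full(F, 1.0 / np.sqrt(F))             # unit-norm starting weights
    for _ in range(n_iter):
        dist = D @ w                              # (n, n) weighted distances
        P_hit = _soft_neighbours(dist, hit_mask, sigma)
        P_miss = _soft_neighbours(dist, miss_mask, sigma)
        # Expected per-feature distance to nearest miss and nearest hit.
        m_bar = np.einsum("ij,ijf->if", P_miss, D)
        h_bar = np.einsum("ij,ijf->if", P_hit, D)
        z = (m_bar - h_bar).sum(axis=0)
        # Fixed-point update: keep the non-negative part and renormalize.
        z_pos = np.maximum(z, 0.0)
        norm = np.linalg.norm(z_pos)
        if norm == 0.0:
            break
        w_new = z_pos / norm
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w
```

For example, on synthetic data where only gene 0 follows a class-specific trend over time, the largest weight should land on gene 0. Comparing whole per-gene trajectories is one simple way to avoid flattening all time points of all genes into a single static vector; the paper's own temporal margin and subspace formulation may differ.

```python
rng = np.random.default_rng(0)
X = rng.normal(scale=0.5, size=(40, 5, 10))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, :, 0] += np.linspace(0.0, 3.0, 5)   # class-1 trend in gene 0
w = temporal_margin_weights(X, y)
print(np.argmax(w))
```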
Keywords :
data analysis; genetics; gradient methods; medical computing; pattern recognition; clinical application; data distribution; fixed-point gradient descent method; individual health status prediction; informative biomarker candidate identification; iteration; machine learning; margin based feature selection filter; nearest neighbor; temporal high-dimensional gene expression data analysis; temporal margin-based feature selection; temporal microarray data; Gene expression; Linear programming; Optimization; Time measurement; Time series analysis; Uncertainty; Vectors; feature selection; high dimensional; margin; multivariate time series data; temporal data;
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.92