Title :
Solving the small sample size problem in protein subcellular localization prediction
Author :
Tong Wang ; Xiaoxia Cao ; Tian Xia ; Zhizhen Yang
Author_Institution :
Inst. of Comput. & Inf., Shanghai Second Polytech. Univ., Shanghai, China
Abstract :
In this paper, a new system is proposed to improve the performance of protein subcellular localization prediction. First, the protein sequences are encoded into a high-dimensional space using an effective sequence encoding scheme. However, this representation gives rise to the small sample size problem, in which the data dimension is much larger than the number of samples. To address this problem, a new dimension reduction algorithm is introduced. It extracts the essential features from the high-dimensional feature space and does not suffer from the small sample size problem. Then, an efficient classifier is employed to predict the subcellular localization of proteins from the features obtained after dimension reduction.
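The three stages named in the abstract (sequence encoding, dimension reduction, classification) can be illustrated with a minimal sketch. The abstract does not say which encoding, reduction algorithm, or classifier the authors use, so every choice here is an assumption made for illustration only: dipeptide composition as the encoding (400 features), a Gaussian random projection standing in for the paper's dimension reduction algorithm, and a 1-nearest-neighbour classifier. The sequences and the labels "membrane" and "nucleus" are made-up toy data, not results from the paper.

```python
import itertools
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 ordered residue pairs: a 400-dimensional feature space.
DIPEPTIDES = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def encode(sequence):
    """Assumed encoding: normalized frequency of each adjacent residue pair."""
    counts = [0.0] * len(DIPEPTIDES)
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in DIPEPTIDE_INDEX:  # ignore pairs with non-standard residues
            counts[DIPEPTIDE_INDEX[pair]] += 1.0
    total = sum(counts) or 1.0  # avoid division by zero for very short input
    return [c / total for c in counts]


def make_projection(in_dim, out_dim, seed=0):
    """Stand-in reduction: Gaussian random projection matrix (out_dim x in_dim).

    It needs no covariance or scatter matrix inversion, so it is usable
    even when the number of samples is far below the feature dimension.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def reduce_dim(vector, projection):
    """Project one feature vector into the low-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in projection]


def predict_1nn(query, train_vectors, train_labels):
    """Assumed classifier: label of the closest training vector (Euclidean)."""
    distances = [math.dist(query, v) for v in train_vectors]
    return train_labels[distances.index(min(distances))]


# Toy training set: 4 samples in 400 dimensions, i.e. dimension >> sample size.
train_sequences = ["MKKLLLAAAGLLKKA", "MDEEDDSEEDEESDD",
                   "MKRLLAALLGKKLAA", "MSDDEEEDSDEEDDE"]
train_labels = ["membrane", "nucleus", "membrane", "nucleus"]

projection = make_projection(len(DIPEPTIDES), out_dim=8)
train_reduced = [reduce_dim(encode(s), projection) for s in train_sequences]

query_reduced = reduce_dim(encode("MKKLLAAAGLLKKAL"), projection)
predicted = predict_1nn(query_reduced, train_reduced, train_labels)
print(predicted)
```

With only four training sequences and 400 features, a method that must invert a 400 x 400 scatter matrix would fail because that matrix is singular; this is the small sample size problem the abstract refers to. The random projection above merely sidesteps it and should not be read as the authors' algorithm, which the keywords suggest is related to manifold learning.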
Keywords :
feature extraction; molecular biophysics; proteins; high dimension feature space; protein sequences; protein subcellular localization prediction; reduction algorithm; sequence encoding scheme; small sample size problem; manifold learning; prediction system
Conference_Title :
Biomedical Engineering and Informatics (BMEI), 2012 5th International Conference on
Conference_Location :
Chongqing
Print_ISBN :
978-1-4673-1183-0
DOI :
10.1109/BMEI.2012.6513152