Title :
Pathogen host interaction prediction via matrix factorization
Author :
Li, Benjamin Y. S. ; Lam Fat Yeung ; Genke Yang
Author_Institution :
Dept. of Electron. Eng., City Univ. of Hong Kong, Hong Kong, China
Abstract :
One of the goals in the study of infectious disease is to construct a reliable predictive model of the pathogen-host interactome. Conventional methods for constructing such a model treat the task as a binary classification problem. However, most databases contain only detected interactions and lack negative results. Thus, compared with binary classification, this situation is closer in nature to a collaborative filtering problem. In this paper, a commonly used collaborative filtering technique, matrix factorization, is applied to the prediction of pathogen-host interactions. However, in matrix factorization, the estimation of latent variables is highly dependent on the completeness of the dataset. If the dataset is incomplete, estimation of some latent vectors may be infeasible due to the lack of information. To relieve this issue, an extension of probabilistic matrix factorization is proposed in this paper. In the extended model, similarities between objects are taken into account as a basis for estimation. Experimental results show that as sparsity increases, the similarity-based probabilistic matrix factorization model has the best goodness of fit and a high prediction accuracy compared with the conventional matrix factorization model and the probabilistic matrix factorization model.
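The idea of using similarities to estimate latent vectors that the observed interactions alone cannot determine can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give its formulation, so the code below swaps in a generic similarity-regularized matrix factorization trained by gradient descent. The function name `similarity_mf`, the penalty form (each pathogen's latent vector is pulled toward the similarity-weighted average of its neighbours), and all hyperparameters (`rank`, `lam`, `alpha`, `lr`, `epochs`) are assumptions made for illustration. The similarity matrix `S` is assumed to have a positive diagonal.

```python
import numpy as np

def similarity_mf(R, mask, S, rank=2, lam=0.1, alpha=0.5,
                  lr=0.05, epochs=2000, seed=0):
    """Illustrative similarity-regularized matrix factorization (hypothetical).

    R    : (n, m) interaction matrix (rows: pathogens, columns: host proteins)
    mask : (n, m) array, 1 where an entry of R is treated as known, else 0
    S    : (n, n) non-negative row similarity matrix with a positive diagonal
    Minimizes 0.5*||mask*(R - U V^T)||^2 + 0.5*lam*(||U||^2 + ||V||^2)
              + 0.5*alpha*||U - W U||^2, where W is S with rows scaled to sum to 1.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    S = np.asarray(S, dtype=float)
    # Row-normalize so that W @ U is a weighted average over similar rows.
    W = S / S.sum(axis=1, keepdims=True)
    for _ in range(epochs):
        E = mask * (R - U @ V.T)   # residual on known entries only
        D = U - W @ U              # deviation from the neighbour average
        grad_U = -E @ V + lam * U + alpha * (D - W.T @ D)
        grad_V = -E.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V
```

In this sketch, a row with no known entries receives no signal from the data term, so its latent vector is determined only by the similarity penalty and the shrinkage term. This mirrors, in simplified form, the motivation stated in the abstract for using similarities when the dataset is too sparse to estimate some latent vectors directly.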
Keywords :
diseases; matrix decomposition; medical computing; pattern classification; probability; binary classification problem; collaborative filtering technique; conventional matrix factorization model; conventional methods; databases; extended model; infectious disease; latent variable estimation; latent vectors; pathogen-host interaction prediction; pathogen-host interactome; prediction accuracy; reliable predictive model; similarity based probabilistic matrix factorization model; sparsity; Estimation; Pathogens; Predictive models; Probabilistic logic; Proteins; Sparse matrices; Vectors;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
DOI :
10.1109/BIBM.2014.6999185