• DocumentCode
    2767769
  • Title

    Literature based Bayesian analysis of gene expression data

  • Author

    Xu, Lijing ; Homayouni, Ramin ; George, E. Olusegun

  • Author_Institution
    Bioinf. Program, Univ. of Memphis, Memphis, TN, USA
  • fYear
    2011
  • fDate
    12-15 Nov. 2011
  • Firstpage
    1032
  • Lastpage
    1032
  • Abstract
    Recent research has focused on incorporating biological function and pathway information into the analysis of gene expression data, partly to compensate for insufficient experimental replication, low signal-to-noise ratios, lack of reproducibility, and/or the multiple-testing burden. A Bayesian approach appears well suited to incorporating functional information into gene expression data analysis. In this study, we tested the feasibility of using literature-derived gene relationships in a Bayesian model to analyze gene expression data. Prior distributions were constructed from gene associations derived from the biomedical literature using Latent Semantic Indexing (LSI). The LSI model was built from more than 1 million Medline abstracts corresponding to 22,000 human and mouse genes. A key advantage of LSI is that both explicit and implicit gene relationships can be derived from the literature. Gene neighborhoods were determined using latent Gaussian Markov random fields and a logistic transformation of the latent variables. We tested the procedure on a microarray dataset of interferon-stimulated genes in mouse embryonic fibroblasts. By integrating functional information from the literature, the Bayesian approach identified relevant genes that previously did not meet the 0.05 significance level. Compared with a standard mixture model, the spatial mixture model had more power for identifying direct and indirect interferon-regulated genes. The spatial model improved the ranks of some genes known to be affected by interferon treatment, such as Nmi (NMI N-myc and STAT interactor) and ifi35 (interferon-induced protein 35). It also identified some genes that were previously overlooked because of their marginal p-values, such as dpysl2, map2k1, msn, Psck5, and Il6st. Interestingly, these genes appear to be indirectly related to interferon treatment.
In summary, we show that our procedure increases statistical power and produces more biologically meaningful gene lists. These results suggest that Bayesian methods that incorporate functional information from the literature may improve the analysis of gene expression data.
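The literature-informed spatial mixture described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the model's exact form or inference procedure, so the following are assumptions. Each gene is represented by a column of pooled term counts; LSI similarity is the cosine between genes in a rank-k SVD space; neighbors are genes whose similarity exceeds a threshold; the latent Gaussian Markov random field uses a proper conditional autoregressive (CAR) precision; z-scores follow a two-component normal mixture; and the latent field is fixed at a local posterior mode found by gradient ascent rather than integrated out. The helper names (`lsi_gene_similarity`, `normal_pdf`, `spatial_mixture`), the toy term-gene matrix, and the parameters `threshold`, `mu1`, `m`, `tau`, and `rho` are hypothetical.

```python
import numpy as np


def lsi_gene_similarity(term_gene, k, threshold):
    """Hypothetical helper: gene-gene weights from a rank-k LSI space.

    Assumption: term_gene is a (terms x genes) count matrix in which each
    column pools the abstracts linked to one gene.
    """
    _, s, vt = np.linalg.svd(term_gene, full_matrices=False)
    coords = (s[:k, None] * vt[:k]).T                 # one k-dim vector per gene
    unit = coords / np.linalg.norm(coords, axis=1, keepdims=True)
    cos = unit @ unit.T                               # cosine similarity matrix
    w = np.where(cos > threshold, cos, 0.0)           # keep strong associations only
    np.fill_diagonal(w, 0.0)                          # a gene is not its own neighbor
    return w


def normal_pdf(z, mean=0.0):
    return np.exp(-0.5 * (z - mean) ** 2) / np.sqrt(2.0 * np.pi)


def spatial_mixture(z, w, mu1=3.0, m=-1.0, tau=1.0, rho=0.9, step=0.1, iters=5000):
    """MAP plug-in sketch of a latent-GMRF spatial mixture (assumed form).

    z_i | not DE ~ N(0, 1);  z_i | DE ~ 0.5 N(mu1, 1) + 0.5 N(-mu1, 1)
    P(DE_i) = logistic(x_i),  x ~ N(m, Q^-1),  Q = tau (D - rho W) + eps I
    """
    f0 = normal_pdf(z)
    f1 = 0.5 * (normal_pdf(z, mu1) + normal_pdf(z, -mu1))
    q = tau * (np.diag(w.sum(axis=1)) - rho * w) + 1e-6 * np.eye(len(z))
    x = np.full(len(z), m)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-x))
        # d/dx_i log(p_i f1_i + (1 - p_i) f0_i), using dp/dx = p (1 - p)
        lik_grad = (f1 - f0) * p * (1.0 - p) / (p * f1 + (1.0 - p) * f0)
        x = x + step * (lik_grad - q @ (x - m))       # ascent on the log posterior
    p = 1.0 / (1.0 + np.exp(-x))
    spatial = p * f1 / (p * f1 + (1.0 - p) * f0)
    p0 = 1.0 / (1.0 + np.exp(-m))                     # one shared prior for all genes
    standard = p0 * f1 / (p0 * f1 + (1.0 - p0) * f0)
    return spatial, standard


# Toy literature (hypothetical): genes 0-2 share one vocabulary, genes 3-5 another.
A = np.array([[3, 2, 1, 0, 0, 0],
              [2, 3, 1, 0, 0, 0],
              [1, 1, 2, 0, 0, 0],
              [0, 0, 0, 3, 1, 2],
              [0, 0, 0, 1, 3, 1],
              [0, 0, 0, 2, 1, 2]], dtype=float)
W = lsi_gene_similarity(A, k=2, threshold=0.5)
z = np.array([4.0, 3.5, 1.8, 0.2, -0.3, 1.8])        # genes 2 and 5 are marginal
spatial, standard = spatial_mixture(z, W)
print("standard:", np.round(standard, 3))
print("spatial: ", np.round(spatial, 3))
```

Genes 2 and 5 have identical z-scores, so the standard mixture, which assigns every gene the same prior probability, gives them the same posterior probability. In the spatial sketch, gene 2 scores higher because its literature neighbors (genes 0 and 1) are strongly differential while gene 5's neighbors are not, which is a small-scale picture of how neighborhood information can lift genes with marginal p-values.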
  • Keywords
    Bayes methods; Markov processes; bioinformatics; genetics; Il6st; Latent Semantic Indexing; Medline; Psck5; biological function; dpysl2; gene expression data; interferon stimulated genes; latent Gaussian Markov random fields; literature based Bayesian analysis; logistic transformation; map2k1; microarray dataset; mouse embryonic fibroblasts; msn; pathway information; reproducibility; spatial model; Bayesian methods; Bioinformatics; Educational institutions; Gene expression; Indexing; Large scale integration; Bayesian Modeling; Latent Semantic Indexing; Microarray; Text-mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4577-1612-6
  • Type

    conf

  • DOI
    10.1109/BIBMW.2011.6112549
  • Filename
    6112549