  • Protocol

Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses

Abstract

We present PEER (probabilistic estimation of expression residuals), a software package implementing statistical models that improve the sensitivity and interpretability of genetic associations in population-scale expression data. This approach builds on factor analysis methods that infer broad variance components in the measurements. PEER takes as input transcript profiles and covariates from a set of individuals, and then outputs hidden factors that explain much of the expression variability. Optionally, these factors can be interpreted as pathway or transcription factor activations by providing prior information about which genes are involved in the pathway or targeted by the factor. The inferred factors are used in genetic association analyses. First, they are treated as additional covariates, and are included in the model to increase detection power for mapping expression traits. Second, they are analyzed as phenotypes themselves to understand the causes of global expression variability. PEER extends previous related surrogate variable models and can be implemented within hours on a desktop computer.
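The two-stage use of hidden factors described above (infer factors from the expression matrix, then include them as covariates when testing a genetic association) can be illustrated with a minimal sketch. It does not use the PEER software or its Bayesian model: as a plainly named stand-in, hidden factors are estimated here by principal component analysis of the expression residuals after known covariates are regressed out. All function names, the simulated data and the effect sizes are assumptions made for illustration only.

```python
import numpy as np


def estimate_hidden_factors(expr, covariates, n_factors):
    """Estimate hidden factors from an (individuals x genes) expression matrix.

    Stand-in for PEER's factor model (an assumption of this sketch): known
    covariates are regressed out, then the leading principal components of
    the residuals are returned as an (individuals x n_factors) matrix.
    """
    n = expr.shape[0]
    design = np.column_stack([np.ones(n), covariates])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ beta
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return u[:, :n_factors] * s[:n_factors]


def association_t(genotype, phenotype, covariates):
    """t-statistic for the genotype effect in a linear model with covariates."""
    n = len(phenotype)
    design = np.column_stack([np.ones(n), covariates, genotype])
    beta, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    resid = phenotype - design @ beta
    dof = n - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta[-1] / np.sqrt(cov[-1, -1])


def simulate_and_compare(seed=0, n=200, n_genes=500, n_hidden=3):
    """Simulate expression with hidden confounders and test one cis effect."""
    rng = np.random.default_rng(seed)
    known = rng.normal(size=(n, 1))            # one measured covariate
    hidden = rng.normal(size=(n, n_hidden))    # unmeasured global factors
    weights = rng.normal(size=(n_hidden, n_genes))
    weights[:, 0] = [1.5, -1.0, 1.0]           # gene 0 is strongly confounded
    expr = (hidden @ weights
            + known @ rng.normal(size=(1, n_genes))
            + rng.normal(size=(n, n_genes)))

    # Simulated genotype (0/1/2 allele counts) with a true effect on gene 0.
    genotype = rng.integers(0, 3, size=n).astype(float)
    expr[:, 0] += 1.0 * genotype

    factors = estimate_hidden_factors(expr, known, n_hidden)
    t_without = association_t(genotype, expr[:, 0], known)
    t_with = association_t(genotype, expr[:, 0],
                           np.column_stack([known, factors]))
    return {"factors": factors, "t_without": t_without, "t_with": t_with}


result = simulate_and_compare()
print(f"t-statistic, known covariates only:     {result['t_without']:.2f}")
print(f"t-statistic, with inferred factors too: {result['t_with']:.2f}")
```

In this simulation the inferred factors absorb most of the non-genetic variance in the tested gene, so the same genotype effect yields a larger test statistic. The second use mentioned above, treating each inferred factor as a phenotype, would correspond to passing a column of `factors` to `association_t` in place of a gene's expression.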

Figure 1: Protocol alternatives for applying PEER to analyses of expression QTL studies.
Figure 2: Illustrative analysis results of the application of PEER.
Figure 3: Illustrative analysis results of the application of PEER in supervised mode.

Acknowledgements

We thank R. Brem and L. Kruglyak for providing genotype and expression phenotype data to be included alongside this protocol. This work received financial support from the Wellcome Trust (grant no. WT077192/Z/05/Z) and the Technical Computing Initiative (Microsoft Research). O.S. received funding from the Volkswagen Foundation.

Author information

Contributions

O.S., L.P., J.W. and R.D. designed the probabilistic models underlying the protocol. O.S., L.P. and M.P. developed the PEER software suite. O.S. and L.P. wrote the paper with input from all authors.

Corresponding author

Correspondence to Oliver Stegle.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Data 1

Example dataset used to illustrate the protocol steps. (ZIP 2526 kb)

About this article

Cite this article

Stegle, O., Parts, L., Piipari, M. et al. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc 7, 500–507 (2012). https://doi.org/10.1038/nprot.2011.457

  • DOI: https://doi.org/10.1038/nprot.2011.457
