Quality Control for Genome-Wide Association Studies

Part of the book series: Methods in Molecular Biology (MIMB, volume 628)

Abstract

This chapter is a comprehensive review of quality control (QC) methods for SNP-based genotyping panels used in genome-wide association studies. These include QC on individuals for missingness, gender checks, duplicates and cryptic relatedness, population outliers, heterozygosity and inbreeding, and QC on SNPs for missingness, minor allele frequency and Hardy-Weinberg equilibrium. The emphasis is on the reasons behind each QC step and on the use of intelligent approaches rather than arbitrary QC thresholds. Scripts and code for performing these QC steps are available at www.kcl.ac.uk/mmg/gwascode/.
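
As a rough illustration of the three SNP-level checks named above (missingness, minor allele frequency and Hardy-Weinberg equilibrium), the following minimal sketch computes the corresponding summary statistics for a single SNP. It is not taken from the chapter's scripts: the function name `snp_qc_stats`, the 0/1/2 genotype coding with `None` for a missing call, and the use of a simple chi-square goodness-of-fit statistic (rather than an exact test, which is often preferred when genotype counts are small) are all assumptions made for this example.

```python
from typing import Dict, Optional, Sequence


def snp_qc_stats(genotypes: Sequence[Optional[int]]) -> Dict[str, float]:
    """Per-SNP QC summaries (hypothetical helper, not the chapter's code).

    Genotypes are assumed to be coded as 0, 1 or 2 copies of one allele,
    with None marking a missing call.
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    n = len(called)
    nan = float("nan")

    # No called genotypes: only the missing rate is defined (if any samples).
    if n == 0:
        return {
            "missing_rate": 1.0 if n_total else nan,
            "maf": nan,
            "hwe_chi2": nan,
        }

    missing_rate = (n_total - n) / n_total

    # Observed genotype counts.
    n_aa = called.count(0)
    n_ab = called.count(1)
    n_bb = called.count(2)

    # Frequency of the counted allele, and the minor allele frequency.
    p_b = (n_ab + 2 * n_bb) / (2 * n)
    maf = min(p_b, 1.0 - p_b)

    # Chi-square goodness-of-fit statistic (1 df) against Hardy-Weinberg
    # proportions; set to 0.0 for a monomorphic SNP, where it is uninformative.
    if maf == 0.0:
        hwe_chi2 = 0.0
    else:
        expected = [
            n * (1.0 - p_b) ** 2,
            2.0 * n * p_b * (1.0 - p_b),
            n * p_b ** 2,
        ]
        observed = [n_aa, n_ab, n_bb]
        hwe_chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    return {"missing_rate": missing_rate, "maf": maf, "hwe_chi2": hwe_chi2}
```

The function returns the statistics without applying any cutoffs, so the distributions across SNPs can be inspected before a filtering rule is chosen.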

Author information

Corresponding author

Correspondence to Michael E. Weale.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Weale, M.E. (2010). Quality Control for Genome-Wide Association Studies. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_19

  • DOI: https://doi.org/10.1007/978-1-60327-367-1_19

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60327-366-4

  • Online ISBN: 978-1-60327-367-1

  • eBook Packages: Springer Protocols
