Abstract
This chapter is a comprehensive review of quality control (QC) methods for SNP-based genotyping panels used in genome-wide association studies. These include QC on individuals for missingness, gender checks, duplicates and cryptic relatedness, population outliers, heterozygosity and inbreeding, and QC on SNPs for missingness, minor allele frequency and Hardy-Weinberg equilibrium. The emphasis is on the reasons behind each QC step and on the use of intelligent approaches rather than arbitrary QC thresholds. Scripts and code for performing these QC steps are available at www.kcl.ac.uk/mmg/gwascode/.
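As a rough illustration of three of the quantities named above (missingness, minor allele frequency, and departure from Hardy-Weinberg equilibrium), the following is a minimal sketch and not the chapter's own scripts. It assumes genotypes are coded as 0, 1 or 2 copies of a counted allele, with None for a missing call. The function names are hypothetical, and a simple Pearson chi-square statistic stands in for whichever HWE test the chapter recommends. No pass/fail thresholds are applied, since the choice of cut-off is left to the analyst.

```python
from typing import List, Optional

# Assumed coding: 0, 1 or 2 copies of the counted allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of missing calls in one row (an individual) or one column (a SNP)."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency for one SNP, using only non-missing genotypes."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return float("nan")
    p = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    return min(p, 1 - p)


def hwe_chi_square(calls: List[Genotype]) -> float:
    """Pearson chi-square statistic (1 d.f.) of observed genotype counts
    against Hardy-Weinberg expectations. Returns 0.0 for a monomorphic SNP."""
    observed = [g for g in calls if g is not None]
    n = len(observed)
    if n == 0:
        return float("nan")
    counts = [observed.count(k) for k in (0, 1, 2)]
    p = (counts[1] + 2 * counts[2]) / (2 * n)
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    # Skip genotype classes with zero expected count to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected) if e > 0)


if __name__ == "__main__":
    # Toy data: rows are individuals, columns are SNPs.
    genotypes = [
        [0, 1, None],
        [1, 1, 2],
        [2, 1, 2],
        [1, 1, 2],
    ]
    for i, row in enumerate(genotypes):
        print(f"individual {i}: missing rate = {missing_rate(row):.2f}")
    for j in range(len(genotypes[0])):
        column = [row[j] for row in genotypes]
        print(
            f"SNP {j}: missing = {missing_rate(column):.2f}, "
            f"MAF = {minor_allele_frequency(column):.2f}, "
            f"HWE chi-square = {hwe_chi_square(column):.2f}"
        )
```

In practice these per-individual and per-SNP summaries would be inspected as distributions (for example with histograms) before deciding which samples or markers to exclude.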
Copyright information
© 2010 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Weale, M.E. (2010). Quality Control for Genome-Wide Association Studies. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_19
DOI: https://doi.org/10.1007/978-1-60327-367-1_19
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60327-366-4
Online ISBN: 978-1-60327-367-1
eBook Packages: Springer Protocols