  • Technical Report
Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs

Abstract

Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.
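
To make the idea of an integrated genotype concrete, here is a minimal Python sketch. It is not the Birdsuite implementation. It assumes a genotype can be represented as a pair of allele counts (copies of A, copies of B), and that a trio is consistent if each parent can transmit one haplotype such that the two haplotypes add up to the child's genotype. The function names and the `max_per_hap` parameter (the largest number of copies a single haplotype may carry) are hypothetical choices made for illustration.

```python
from itertools import product


def parse_genotype(label):
    """Convert a label such as 'AB', 'AAB' or 'A-null' to (n_A, n_B) counts."""
    # Lowercase letters in 'null' are ignored, so 'A-null' gives (1, 0).
    return (label.count("A"), label.count("B"))


def transmissible_haplotypes(genotype, max_per_hap=2):
    """Return every haplotype (a, b) a parent with this genotype could pass on.

    Assumption: the parent's copies split across two haplotypes, and neither
    haplotype carries more than max_per_hap copies in total.
    """
    n_a, n_b = genotype
    haplotypes = set()
    for a, b in product(range(n_a + 1), range(n_b + 1)):
        other_total = (n_a - a) + (n_b - b)
        if a + b <= max_per_hap and other_total <= max_per_hap:
            haplotypes.add((a, b))
    return haplotypes


def mendelian_consistent(father, mother, child, max_per_hap=2):
    """True if one paternal plus one maternal haplotype can form the child."""
    paternal = transmissible_haplotypes(father, max_per_hap)
    maternal = transmissible_haplotypes(mother, max_per_hap)
    return any(
        (fa + ma, fb + mb) == child
        for (fa, fb), (ma, mb) in product(paternal, maternal)
    )


# SNP-only calls at a deletion locus: hemizygous samples look homozygous.
snp_only = [parse_genotype(g) for g in ("AA", "BB", "BB")]
print(mendelian_consistent(*snp_only, max_per_hap=1))    # False

# Integrated calls for the same trio: the father carries a deleted copy.
integrated = [parse_genotype(g) for g in ("A-null", "BB", "B-null")]
print(mendelian_consistent(*integrated, max_per_hap=1))  # True
```

In this toy trio the father is hemizygous (A-null) and transmits the deleted haplotype, so the child is B-null. A caller that only emits AA, AB or BB reports the trio as AA x BB giving BB, which looks like a mendelian inconsistency. The integrated calls remove it.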

Figure 1: Overview of Birdsuite.
Figure 2: Schematic of how a CNP is processed through Canary illustrated with data from chromosome 4.
Figure 3: Schematic of how a single SNP is processed through Birdsuite and evaluation of mendelian inconsistencies.
Figure 4: Discovery of unknown or de novo copy number variation using Birdseye.
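
The abstract states that rare CNVs are identified with a hidden Markov model. The sketch below shows how such a model can segment probe intensities into copy-number states, using standard Viterbi decoding. It is an illustration under stated assumptions, not Birdseye's actual model. The Gaussian emission model, the `state_means` mapping, `sigma`, `stay_prob` and the uniform start distribution are all hypothetical.

```python
import math


def viterbi_copy_number(intensities, state_means, sigma=0.15, stay_prob=0.99):
    """Return the most likely copy-number state for each probe.

    intensities: probe intensities ordered along the chromosome.
    state_means: hypothetical mapping {copy_number: expected intensity}.
    """
    states = sorted(state_means)
    switch_prob = (1.0 - stay_prob) / (len(states) - 1)

    def log_emission(x, cn):
        # Gaussian log-density, dropping the constant shared by all states.
        return -0.5 * ((x - state_means[cn]) / sigma) ** 2

    def log_transition(prev, cur):
        return math.log(stay_prob if prev == cur else switch_prob)

    # Uniform start distribution (an assumption), so only emissions matter.
    score = {cn: log_emission(intensities[0], cn) for cn in states}
    backpointers = []
    for x in intensities[1:]:
        new_score, pointer = {}, {}
        for cur in states:
            best_prev = max(
                states, key=lambda p: score[p] + log_transition(p, cur)
            )
            new_score[cur] = (
                score[best_prev]
                + log_transition(best_prev, cur)
                + log_emission(x, cur)
            )
            pointer[cur] = best_prev
        backpointers.append(pointer)
        score = new_score

    # Trace back from the best final state.
    path = [max(states, key=lambda cn: score[cn])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


# Toy data: intensity is assumed proportional to copy number.
means = {1: 0.5, 2: 1.0, 3: 1.5}
probes = [1.0, 1.05, 0.95, 0.5, 0.55, 0.45, 0.5, 1.0, 1.02, 0.98]
print(viterbi_copy_number(probes, means))
```

The high `stay_prob` penalizes state changes, so in this toy setting a run of four low-intensity probes is decoded as a one-copy segment, whereas a single low probe would not be. Any real application would need emission and transition parameters estimated from data.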

Acknowledgements

We wish to thank G. Getz for discussions on algorithms and comments regarding the supplemental methods. We also thank E. Lander and J. Hirschhorn for their readings and feedback. Finally, we are indebted to the testing labs that provided us with many replicates of HapMap samples run on the Affymetrix SNP 6.0 array. S.A.M. was supported by a Lilly Life Sciences Research Fellowship.

Author information

Contributions

J.M.K., F.G.K., S.A.M., M.J.D. and D.A. conceived of and refined the four-stage structure of Birdsuite. S.A.M., F.G.K. and J.N. developed and implemented Canary. J.N., S.A.M. and J.M.K. validated Canary calls, using data provided by P.J.C., J.V. and S.C. J.M.K., F.G.K., A.W., S.C. and E.H. developed, implemented, tested and validated Birdseed. J.M.K. developed, implemented and validated Birdseye. A.W. implemented Fawkes, which J.N., A.W. and J.M.K. validated. J.N., A.W., M.M.N. and S.B.G. were responsible for integration of the components and supporting software. K.D., C.L., J.M.K. and S.A.M. compared Birdsuite to Nexus and Partek. S.P. implemented the association tools. J.M.K., F.G.K., S.A.M., S.P., M.J.D. and D.A. wrote the manuscript. Discussion among all authors led to improvements in the algorithms and their implementations.

Corresponding authors

Correspondence to Joshua M Korn or David Altshuler.

Ethics declarations

Competing interests

S.C., E.H., J.V. and P.J.C. are employees of Affymetrix. The remaining authors (J.M.K., F.G.K., S.A.M., A.W., J.N., K.D., C.L., M.M.N., S.B.G., S.P., M.J.D. and D.A.) neither personally nor institutionally receive financial support from Affymetrix, and neither the authors nor their employers receive compensation or royalties from the work described in this article.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1 and 2, Supplementary Tables 1 and 2, Supplementary Note and Supplementary Methods (PDF 350 kb)

About this article

Cite this article

Korn, J., Kuruvilla, F., McCarroll, S. et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nat Genet 40, 1253–1260 (2008). https://doi.org/10.1038/ng.237

  • DOI: https://doi.org/10.1038/ng.237
