PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data

  1. Kai Wang¹,
  2. Mingyao Li²,
  3. Dexter Hadley¹,³,
  4. Rui Liu¹,
  5. Joseph Glessner⁴,
  6. Struan F.A. Grant⁴,
  7. Hakon Hakonarson⁴, and
  8. Maja Bucan¹,⁵

  1. Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA;
  2. Department of Biostatistics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA;
  3. Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA;
  4. Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA

Abstract

Comprehensive identification and cataloging of copy number variations (CNVs) is required to provide a complete view of human genetic variation. The resolution of CNV detection in previous experimental designs has been limited to tens or hundreds of kilobases. Here we present PennCNV, a hidden Markov model (HMM)-based approach for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data. The algorithm incorporates multiple sources of information: the total signal intensity and allelic intensity ratio at each SNP marker, the distance between neighboring SNPs, the allele frequency of SNPs, and pedigree information where available. We applied PennCNV to genotyping data generated for 112 HapMap individuals; on average, we detected ∼27 CNVs per individual, with a median CNV size of ∼12 kb. Excluding common rearrangements in lymphoblastoid cell lines, the fraction of CNVs in offspring not detected in parents (CNV-NDPs) was 3.3%. Our results demonstrate the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.
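
To make the general idea concrete, the following is a minimal sketch of HMM-based copy number calling, not the PennCNV implementation. It assumes a simplified three-state model (one, two, or three copies), uses only a total-intensity signal (a log R ratio per SNP) with Gaussian emissions, and lets the probability of a state change grow with the distance between neighboring SNPs. The allelic intensity ratio, allele frequencies, and pedigree information described above are omitted, and every name and numeric parameter below is an illustrative assumption rather than a fitted value.

```python
import math

# Hypothetical three-state model: copy number 1 (deletion), 2 (normal), 3 (duplication).
STATES = (1, 2, 3)
# Assumed Gaussian emission parameters for the log R ratio (illustrative only).
LRR_MEAN = {1: -0.6, 2: 0.0, 3: 0.4}
LRR_SD = 0.2
# Assumed prior over the state of the first SNP.
PRIOR = {1: 0.005, 2: 0.99, 3: 0.005}
# Assumed transition parameters: a state change becomes more likely as the
# gap between neighboring SNPs grows, saturating at FLOOR + MAX_CHANGE.
FLOOR = 1e-6
MAX_CHANGE = 0.01
DECAY_BP = 10_000


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_trans(prev, cur, dist_bp):
    """Log transition probability between two SNPs dist_bp base pairs apart."""
    p_change = FLOOR + MAX_CHANGE * (1.0 - math.exp(-dist_bp / DECAY_BP))
    if prev == cur:
        return math.log(1.0 - p_change)
    # Split the change probability evenly among the other states.
    return math.log(p_change / (len(STATES) - 1))


def viterbi(positions, lrr):
    """Return the most likely copy number state at each SNP."""
    score = {s: math.log(PRIOR[s]) + log_gauss(lrr[0], LRR_MEAN[s], LRR_SD)
             for s in STATES}
    back = []  # back[i-1][cur] = best predecessor of state cur at SNP i
    for i in range(1, len(lrr)):
        dist = positions[i] - positions[i - 1]
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, cur, dist))
            pointers[cur] = best_prev
            new_score[cur] = (score[best_prev]
                              + log_trans(best_prev, cur, dist)
                              + log_gauss(lrr[i], LRR_MEAN[cur], LRR_SD))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: 12 SNPs spaced 2 kb apart, with a run of low intensities in the middle.
positions = list(range(1000, 25000, 2000))
lrr = [0.02, -0.05, 0.01, -0.62, -0.58, -0.65, -0.55, -0.60, -0.61, 0.03, -0.01, 0.04]
path = viterbi(positions, lrr)
print(path)
```

On this toy input, the six consecutive low-intensity markers are called as a single-copy segment flanked by two-copy markers. A realistic model would also need emissions for the allelic intensity ratio and parameters estimated from data.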

Footnotes

  • 5 Corresponding author. E-mail bucan{at}pobox.upenn.edu; fax (215) 573-2041.

  • [Supplemental material is available online at www.genome.org. The PennCNV software is available from http://www.neurogenome.org/cnv/penncnv.]

  • Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6861907

  • 6 The AGRE Consortium: Dan Geschwind, UCLA, Los Angeles, CA; Maja Bucan, University of Pennsylvania, Philadelphia, PA; W. Ted Brown, N.Y.S. Institute for Basic Research in Developmental Disabilities, Staten Island, NY; Rita M. Cantor, UCLA School of Medicine, Los Angeles, CA; John N. Constantino, Washington University School of Medicine, St. Louis, MO; T. Conrad Gilliam, University of Chicago, Chicago, IL; Martha Herbert, Harvard Medical School, Boston, MA; Clara Lajonchere, Cure Autism Now/Autism Speaks, Los Angeles, CA; David H. Ledbetter, Emory University, Atlanta, GA; Christa Lese-Martin, Emory University, Atlanta, GA; Janet Miller, Cure Autism Now/Autism Speaks, Los Angeles, CA; Stanley F. Nelson, UCLA School of Medicine, Los Angeles, CA; Gerard D. Schellenberg, University of Washington, Seattle, WA; Carol A. Samango-Sprouse, George Washington University, Washington, DC; Sarah Spence, UCLA, Los Angeles, CA; Matthew State, Yale University, New Haven, CT; Rudolph E. Tanzi, Massachusetts General Hospital, Boston, MA.

    • Received June 29, 2007.
    • Accepted September 5, 2007.