Estimating genotype error rates from high-coverage next-generation sequence data
- Jeffrey D. Wall (1,2),
- Ling Fung Tang (3),
- Brandon Zerbe (2),
- Mark N. Kvale (2),
- Pui-Yan Kwok (2,3),
- Catherine Schaefer (4), and
- Neil Risch (1,2,4)
- (1) Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California 94143, USA;
- (2) Institute for Human Genetics, University of California San Francisco, San Francisco, California 94143, USA;
- (3) Cardiovascular Research Institute, University of California San Francisco, San Francisco, California 94143, USA;
- (4) Kaiser Permanente Northern California Division of Research, Oakland, California 94612, USA
- Corresponding author: wallj{at}humgen.ucsf.edu
Abstract
Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found that (1) error profiles and rates did not differ between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with increasing genotype quality (GQ) score, but nonlinearly for the Illumina data, likely because GQ scores greater than 60 lose specificity; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)–(5), suggest that caution is needed when interpreting the results of next-generation sequencing-based association studies, and even more so when applying this technology clinically without validation by other, more robust sequencing or genotyping methods.
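The abstract does not spell out how replicate calls become a lower bound, so the following is only an illustrative sketch of the general idea, not the authors' estimator. It assumes a simple, hypothetical data layout: each replicate is a dict mapping a site key to a diploid genotype tuple of allele indices (0 = reference). Among sites where at least one replicate reports a nonreference genotype, every discordant site must contain at least one erroneous call, while errors made identically in both replicates go undetected. Dividing the discordant-site count by the total number of calls compared therefore gives a lower bound on the per-call error rate at those sites.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical representation: diploid genotype as a pair of allele indices, 0 = reference.
Genotype = Tuple[int, int]


def _normalize(gt: Genotype) -> Genotype:
    """Treat unphased genotypes 0/1 and 1/0 as identical."""
    return tuple(sorted(gt))


def _is_nonref(gt: Genotype) -> bool:
    """True if the genotype carries at least one nonreference allele."""
    return any(allele != 0 for allele in gt)


def discordance_error_lower_bound(
    rep_a: Dict[Hashable, Genotype], rep_b: Dict[Hashable, Genotype]
) -> float:
    """Illustrative lower bound on per-call error from two replicate call sets.

    Only sites called in both replicates, with a nonreference genotype in at
    least one of them, are compared. Each discordant site implies at least one
    wrong call out of the two made there, so discordant / (2 * compared) cannot
    exceed the true per-call error rate at those sites. Errors shared by both
    replicates are invisible, which is why this is only a lower bound.
    """
    compared = 0
    discordant = 0
    for site in rep_a.keys() & rep_b.keys():
        ga, gb = _normalize(rep_a[site]), _normalize(rep_b[site])
        if not (_is_nonref(ga) or _is_nonref(gb)):
            continue  # skip sites where both replicates call homozygous reference
        compared += 1
        if ga != gb:
            discordant += 1
    if compared == 0:
        return 0.0
    return discordant / (2 * compared)
```

For example, with two replicates that agree at two nonreference sites and disagree at a third (1/1 versus 0/1), the sketch returns 1/6: one discordant site out of six calls compared.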
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.168393.113.
- Received October 15, 2013.
- Accepted August 25, 2014.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.