M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity

Jagadeesh, Karthik A; Wenger, Aaron M; Berger, Mark J; Guturu, Harendra; Stenson, Peter D; Cooper, David N; Bernstein, Jonathan A; Bejerano, Gill

doi:10.1038/ng.3703

Technical Report
Published: 24 October 2016


Nature Genetics volume 48, pages 1581–1586 (2016)

Abstract

Variant pathogenicity classifiers such as SIFT, PolyPhen-2, CADD, and MetaLR assist in the interpretation of the hundreds of rare missense variants in the typical patient genome by deprioritizing some variants as likely benign. These widely used methods misclassify 26 to 38% of known pathogenic mutations, which could lead to missed diagnoses if the classifiers are trusted as definitive in a clinical setting. We developed M-CAP, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare missense variants of uncertain significance in a typical genome at 95% sensitivity.
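The abstract's headline result (60% of variants of uncertain significance dismissed at 95% sensitivity) combines two steps: choose the score threshold that retains 95% of known pathogenic variants, then count the variants of uncertain significance (VUS) that score below it and can therefore be deprioritized as likely benign. The Python sketch below illustrates only that arithmetic. It assumes higher scores mean more likely pathogenic, and the helper names (`threshold_for_sensitivity`, `fraction_dismissed`) and all scores are hypothetical; none of it is taken from M-CAP's code, data, or published threshold.

```python
def threshold_for_sensitivity(pathogenic_scores, sensitivity=0.95):
    """Highest score threshold t such that at least `sensitivity` of the
    known pathogenic scores satisfy score >= t (hypothetical helper)."""
    if not pathogenic_scores:
        raise ValueError("need at least one known pathogenic score")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    ranked = sorted(pathogenic_scores, reverse=True)
    n = len(ranked)
    # Smallest number of pathogenic variants whose retention meets the target.
    keep = next(k for k in range(1, n + 1) if k / n >= sensitivity)
    # Every variant ranked at or above position `keep` scores >= this value.
    return ranked[keep - 1]


def fraction_dismissed(vus_scores, threshold):
    """Fraction of VUS scoring strictly below `threshold`, i.e. the share
    that would be deprioritized as likely benign (hypothetical helper)."""
    if not vus_scores:
        return 0.0
    return sum(score < threshold for score in vus_scores) / len(vus_scores)


# Toy scores, invented for illustration (higher = more likely pathogenic).
known_pathogenic = [0.97, 0.93, 0.90, 0.88, 0.86, 0.84, 0.81, 0.79, 0.77, 0.74,
                    0.71, 0.69, 0.66, 0.62, 0.58, 0.55, 0.51, 0.47, 0.30, 0.12]
patient_vus = [0.02, 0.05, 0.08, 0.11, 0.15, 0.22, 0.27, 0.33, 0.41, 0.64]

t = threshold_for_sensitivity(known_pathogenic, 0.95)
print(t, fraction_dismissed(patient_vus, t))  # 0.3 0.7
```

With 20 toy pathogenic scores, 95% sensitivity means retaining 19 of them, so the threshold lands on the 19th-highest score (0.30) and the single pathogenic variant scoring 0.12 is missed; 7 of the 10 toy VUS fall below 0.30. Variants scoring exactly at the threshold are kept, so ties can push the realized sensitivity above the target.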

**Figure 1: M-CAP outperforms existing pathogenicity likelihood metrics, particularly at the high sensitivity levels required for clinical applications.**

**Figure 2: M-CAP correctly eliminates the most variants of uncertain significance as benign at 95% sensitivity.**
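Both figure captions frame performance in terms of sensitivity, the fraction of known pathogenic variants a classifier retains. One way to compare tools under that framing is to fix several target sensitivities and, for each tool, report the fraction of known benign variants that fall below the corresponding threshold (the specificity at that operating point). The following Python sketch does this on invented scores for two hypothetical tools (`tool_A`, `tool_B`); it assumes higher scores mean more likely pathogenic and is an illustration, not the paper's evaluation code.

```python
def specificity_at_sensitivities(pathogenic, benign, levels):
    """For each target sensitivity, choose the highest threshold that keeps at
    least that fraction of pathogenic scores (score >= threshold), then return
    the fraction of benign scores below it (hypothetical helper)."""
    if not pathogenic or not benign:
        raise ValueError("need both pathogenic and benign scores")
    ranked = sorted(pathogenic, reverse=True)
    n = len(ranked)
    result = {}
    for level in levels:
        if not 0.0 < level <= 1.0:
            raise ValueError("sensitivity levels must be in (0, 1]")
        # Fewest pathogenic variants that must be retained to reach `level`.
        keep = next(k for k in range(1, n + 1) if k / n >= level)
        threshold = ranked[keep - 1]
        result[level] = sum(score < threshold for score in benign) / len(benign)
    return result


# Two hypothetical tools scored on different scales; all numbers are invented.
tools = {
    "tool_A": ([0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.20],
               [0.05, 0.10, 0.15, 0.18, 0.25, 0.30, 0.35, 0.45, 0.52, 0.62]),
    "tool_B": ([35, 33, 30, 29, 27, 25, 24, 22, 21, 18],
               [2, 4, 6, 9, 12, 15, 19, 20, 23, 26]),
}

for name, (pathogenic, benign) in tools.items():
    print(name, specificity_at_sensitivities(pathogenic, benign, [0.8, 0.9, 1.0]))
# tool_A {0.8: 0.9, 0.9: 0.8, 1.0: 0.4}
# tool_B {0.8: 0.8, 0.9: 0.8, 1.0: 0.6}
```

In this invented example tool_A dismisses more benign variants at 80% sensitivity, the two tie at 90%, and tool_B is ahead at 100%, so a ranking taken at a single lower-sensitivity operating point would not carry over to the high-sensitivity region. Because each threshold is drawn from the tool's own pathogenic scores, only the rank order within a tool matters, which lets tools on different score scales be compared without recalibration.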

References

Yang, Y. et al. Clinical whole-exome sequencing for the diagnosis of Mendelian disorders. N. Engl. J. Med. 369, 1502–1511 (2013).
Iglesias, A. et al. The usefulness of whole-exome sequencing in routine clinical practice. Genet. Med. 16, 922–931 (2014).
Lee, H. et al. Clinical exome sequencing for genetic identification of rare Mendelian disorders. J. Am. Med. Assoc. 312, 1880–1887 (2014).
Brownstein, C.A. et al. An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge. Genome Biol. 15, R53 (2014).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Ng, S.B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).
Simpson, M.A. et al. Mutations in NOTCH2 cause Hajdu–Cheney syndrome, a disorder of severe and progressive bone loss. Nat. Genet. 43, 303–305 (2011).
Ng, S.B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35 (2010).
Taylor, J.C. et al. Factors influencing success of clinical genome sequencing across a broad spectrum of disorders. Nat. Genet. 47, 717–726 (2015).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Rehm, H.L. et al. ACMG clinical laboratory standards for next-generation sequencing. Genet. Med. 15, 733–747 (2013).
Ng, P.C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).
Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Dong, C. et al. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum. Mol. Genet. 24, 2125–2137 (2015).
Hastie, T., Tibshirani, R. & Friedman, J. Elements of Statistical Learning (Springer, 2003).
Fusi, N., Smith, I., Doench, J. & Listgarten, J. In silico predictive modeling of CRISPR/Cas9 guide efficiency. Preprint at bioRxiv http://dx.doi.org/10.1101/021568 (2015).
Ogutu, J.O., Piepho, H.-P. & Schulz-Streeck, T. A comparison of random forests, boosting and support vector machines for genomic selection. BMC Proc. 5 (Suppl. 3), S11 (2011).
Schwarz, J.M., Cooper, D.N., Schuelke, M. & Seelow, D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat. Methods 11, 361–362 (2014).
Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 39, e118 (2011).
Shihab, H.A. et al. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum. Mutat. 34, 57–65 (2013).
Chun, S. & Fay, J.C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).
Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Henikoff, S. & Henikoff, J.G. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919 (1992).
Garber, M. et al. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics 25, i54–i62 (2009).
Davydov, E.V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Kuhn, R.M., Haussler, D. & Kent, W.J. The UCSC genome browser and associated tools. Brief. Bioinform. 14, 144–161 (2013).
Stenson, P.D. et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 133, 1–9 (2014).
Ionita-Laza, I., McCallum, K., Xu, B. & Buxbaum, J.D. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat. Genet. 48, 214–220 (2016).
UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).
Yang, Y. et al. Molecular findings among patients referred for clinical whole-exome sequencing. J. Am. Med. Assoc. 312, 1870–1879 (2014).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Liu, X., Jian, X. & Boerwinkle, E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum. Mutat. 32, 894–899 (2011).
Liu, X., Jian, X. & Boerwinkle, E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum. Mutat. 34, E2393–E2402 (2013).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

Acknowledgements

We thank the members of the Bejerano laboratory, particularly J. Notwell, S. Chinchali, and J. Birgmeier, for technical advice and helpful discussions. P.D.S. and D.N.C. receive financial support from Qiagen through a license agreement with Cardiff University. We thank the PolyPhen-2, CADD, Eigen, FATHMM, MutationTaster, and MetaLR teams for making their training and testing data readily available. This work was funded in part by the Stanford Pediatrics Department, DARPA, a Packard Foundation Fellowship, and a Microsoft Faculty Fellowship to G.B.

Author information

Karthik A Jagadeesh and Aaron M Wenger: These authors contributed equally to this work.

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, California, USA
Karthik A Jagadeesh, Mark J Berger & Gill Bejerano
Department of Pediatrics, Stanford University, Stanford, California, USA
Aaron M Wenger, Harendra Guturu, Jonathan A Bernstein & Gill Bejerano
Department of Medical Genetics, Cardiff University, Heath Park, Cardiff, UK
Peter D Stenson & David N Cooper
Department of Developmental Biology, Stanford University, Stanford, California, USA
Gill Bejerano

Contributions

K.A.J., A.M.W., M.J.B., and G.B. designed the study and analyzed results. K.A.J. and M.J.B. implemented the model and performed the experiments. K.A.J., A.M.W., and H.G. wrote software tools that were used for analysis. P.D.S. and D.N.C. curated the HGMD data and provided feedback. J.A.B. provided patient exome cases and feedback. K.A.J., A.M.W., and G.B. wrote the manuscript. All authors reviewed and commented on the manuscript.

Corresponding author

Correspondence to Gill Bejerano.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–4, 6, 9 and 10. (PDF 486 kb)

Supplementary Table 5

M-CAP scores for disease-causing mutations found in BRCA1, BRCA2, CFTR and MLL2. (XLSX 43 kb)

Supplementary Table 7

Clinical phenotypes for case study patients. (XLSX 73 kb)

Supplementary Table 8

Rare missense variants in case study patients. (XLSX 150 kb)

About this article

Cite this article

Jagadeesh, K., Wenger, A., Berger, M. et al. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet 48, 1581–1586 (2016). https://doi.org/10.1038/ng.3703

Received: 30 June 2016
Accepted: 26 September 2016
Published: 24 October 2016
Issue Date: December 2016
DOI: https://doi.org/10.1038/ng.3703
