  • Review Article

Settling the score: variant prioritization and Mendelian disease

Key Points

  • Exome and genome sequencing reveal thousands to millions of genetic variants in a typical individual. A fundamental challenge in human genetics is isolating the small subset (typically one or two) of variants that cause a Mendelian disease phenotype. This Review describes the computational approaches used to prioritize variants in Mendelian disease.

  • A multitude of tools rank variants on the basis of biochemical, evolutionary, allele segregation and population frequency characteristics in an attempt to narrow the list of potential causative variants. The strategies and caveats associated with these tools are outlined in this Review.

  • Burden tests extend prioritization from single variants to genes by aggregating the variants observed at a given locus into a burden score for the gene. Most burden testing software tools also evaluate potentially damaging genotypes in the context of other genotypes observed at the same locus in a control population.

  • Variant interpretation is the process of drawing direct connections from individual variants to disease phenotypes. This process is central both to clinical reporting of results and incidental findings and to research endeavours that include variant discovery and return of results.

  • Variant prioritization and interpretation are especially challenging for non-coding variants, structural variants and synonymous exonic variants. Furthermore, increasingly complex reference genomes introduce new demands for variant discovery tools. Each of these challenges drives increasingly sophisticated software solutions.

Abstract

When investigating Mendelian disease using exome or genome sequencing, distinguishing disease-causing genetic variants from the multitude of candidate variants is a complex, multidimensional task. Many prioritization tools and online interpretation resources exist, and professional organizations have offered clinical guidelines for review and return of prioritization results. In this Review, we describe the strengths and weaknesses of widely used computational approaches, explain their roles in the diagnostic and discovery process and discuss how they can inform (and misinform) expert reviewers. We place variant prioritization in the wider context of gene prioritization, burden testing and genotype–phenotype association, and we discuss opportunities and challenges introduced by whole-genome sequencing.

Figure 1: A demonstration of the multiple possible effects of a single variant across transcripts and genes.
Figure 2: Population stratification and regional constraint within a gene are critical to variant interpretation.
Figure 3: Phenotypes are described across a spectrum of granularity, and different terminologies are used to define these features.

Acknowledgements

The authors thank J. Chong for insightful discussions about the challenges of rare disease research at the University of Washington Center for Mendelian Genomics Workshop. This Review was supported by US National Institutes of Health awards to A.Q. (NIH R01HG006693, NIH U24CA209999), K.E. (NIH U41HG006834 (subcontract), NIH U01HG007437 (subcontract), NIH R01HG008628) and M.Y. (NIH R01GM104390, NIH UM1HL128711, NIH U01HL131698 and NSF IOS-1561337).

Author information

Corresponding author

Correspondence to Mark Yandell.

Ethics declarations

Competing interests

A.Q. is a co-founder of Base2 Genomics, LLC. M.Y. is on the Scientific Advisory Board of Fabric Genomics.

Glossary

Mendelian disorders

Diseases or conditions that result from mutation at a genomic locus and are inherited according to Mendel's laws.

Variant prioritization

The process of ranking the variants observed in an individual genome on the basis of factors such as the predicted consequence of each variant and the observed frequency in a population.
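
As a rough illustration of this definition, the sketch below ranks a few made-up variants by predicted consequence and then by population allele frequency. The severity weights, the variant records and the `rank_variants` helper are assumptions introduced purely for illustration; they do not reproduce the scoring scheme of any tool discussed in this Review.

```python
from dataclasses import dataclass

# Hypothetical severity weights for a few consequence terms.
# The numbers are illustrative assumptions, not values used by any real tool.
SEVERITY = {
    "stop_gained": 3,
    "missense_variant": 2,
    "synonymous_variant": 1,
}


@dataclass(frozen=True)
class Variant:
    name: str
    consequence: str
    allele_frequency: float  # frequency in a reference population


def rank_variants(variants):
    """Sort variants so that the most severe, then the rarest, come first."""
    return sorted(
        variants,
        key=lambda v: (-SEVERITY.get(v.consequence, 0), v.allele_frequency),
    )


# Made-up example records (not real variants).
variants = [
    Variant("var_a", "synonymous_variant", 0.20),
    Variant("var_b", "missense_variant", 0.0001),
    Variant("var_c", "stop_gained", 0.00001),
    Variant("var_d", "missense_variant", 0.05),
]

ranked_names = [v.name for v in rank_variants(variants)]
print(ranked_names)
```

In practice, the ranking factors are richer than this two-key sort, but the structure (score each variant, then order the list) is the same.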

Population allele frequencies

The proportion of chromosomes within a population that carry a particular change at a given locus.
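
A minimal sketch of this proportion, assuming the allele count and the number of chromosomes sampled are already known; the function name and the example numbers are hypothetical.

```python
def allele_frequency(allele_count, chromosome_count):
    """Proportion of sampled chromosomes that carry the allele."""
    if chromosome_count <= 0:
        raise ValueError("chromosome_count must be positive")
    if not 0 <= allele_count <= chromosome_count:
        raise ValueError("allele_count must lie between 0 and chromosome_count")
    return allele_count / chromosome_count


# Made-up example: 5 diploid individuals give 10 chromosomes,
# 3 of which carry the allele of interest.
example_frequency = allele_frequency(allele_count=3, chromosome_count=10)
print(example_frequency)
```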

Gene prioritization

The process of associating a gene with a disease phenotype; this strategy is often used during variant prioritization.

Burden testing

A gene prioritization approach that scores, ranks and prioritizes genes based on genotypes rather than on single variants. The observed (or for some methods, the theoretical) distribution of burden scores within the wider population is often used to rank a proband's genotype score. Many burden tests can also incorporate adjunct information into their calculations such as phylogenetic conservation, mode of inheritance and variant frequency data. Unlike variant prioritization tools, burden tests require access to genotype data for their calculations.
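The idea can be illustrated with a deliberately simplified sketch. It assumes a hypothetical per-variant severity weight and ranks the proband's gene-level score against the observed distribution of scores in controls; the names `burden_score` and `empirical_p`, the weights and the toy genotypes are all assumptions, and this is not the scoring scheme of any specific burden test:

```python
def burden_score(genotype, weights):
    """Sum of variant weights times alt-allele counts for one person in one gene.

    genotype: dict mapping variant id -> number of alt alleles carried (0, 1 or 2)
    weights:  dict mapping variant id -> hypothetical severity weight
    """
    return sum(weights.get(v, 0.0) * n for v, n in genotype.items())


def empirical_p(proband_score, control_scores):
    """Share of controls scoring at least as high as the proband.

    Uses an add-one correction so the estimate is never exactly zero.
    """
    at_least = sum(1 for s in control_scores if s >= proband_score)
    return (at_least + 1) / (len(control_scores) + 1)


# Hypothetical weights for three variants in a single gene.
weights = {"v1": 1.0, "v2": 0.2, "v3": 0.8}

# The proband carries one copy each of v1 and v3.
proband = {"v1": 1, "v3": 1}

# Nine hypothetical control genotypes at the same gene.
controls = [{"v2": 1}, {}, {"v2": 2}, {"v3": 1}, {}, {"v2": 1}, {}, {}, {"v1": 1}]

proband_score = burden_score(proband, weights)
control_scores = [burden_score(g, weights) for g in controls]
p_value = empirical_p(proband_score, control_scores)
```

Because the score is computed from each individual's full genotype at the locus, the calculation needs genotype data for the controls, not just summary allele frequencies.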

Decision support frameworks

Interactive, dynamic tools to guide medical decision-making by displaying and integrating patient data.

Nonsense-mediated decay

(NMD). A conserved eukaryotic pathway that detects and degrades mRNAs containing premature stop codons, thereby preventing their translation.

Variant of uncertain significance

(VUS). Also known as variant of unknown significance. The canonical definition of a VUS is a variant in a disease-associated gene, the specific effect of which is unknown or uncertain. More generally, VUS can also be applied to variants in genes that lack direct disease association but are plausible given the biological function of the resulting protein.

Controlled vocabularies

Sets of agreed upon terms and definitions.

Exome

Generally, the portion of the genome that is translated into proteins.

Population stratification

The difference in allele frequencies across subpopulations.

Balancing selection

Under balancing selection, multiple alleles exist in a population when natural selection favours heterozygous genotypes.

Disease prevalence

The number of cases of a disease that are present in a population at a given point in time.

Purifying selection

Under purifying selection, deleterious alleles are selectively removed from a population.

Functional variants

Variants that alter gene function or expression.

Probands

The proband is the initial person of study in a genetics investigation. In the case of a family trio, the proband is usually the affected child.

De novo variant

A spontaneous mutation that is present in a proband but absent from both parents.

Phase

For a single variant, phase involves the determination of the parental chromosome on which a variant allele exists. When a proband and both parents have been sequenced, this can be directly determined for 'informative sites' where the allele transmission is unambiguous (for example, the proband is heterozygous A/G, the father is homozygous A/A, and the mother heterozygous A/G; in this case the G allele was clearly transmitted from the mother). More generally, phasing refers to the assignment of alleles from multiple variant sites to parental haplotypes.
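The trio logic for a single site can be sketched as follows. This is an illustrative helper (the name `transmitted_alleles` is an assumption) that tries both possible parent-of-origin assignments and reports a result only when exactly one is compatible with the parental genotypes:

```python
def transmitted_alleles(child, father, mother):
    """Return (paternal_allele, maternal_allele) for the child at one site.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    Returns None when the site is uninformative (both assignments fit) or
    when no assignment is consistent with Mendelian transmission.
    """
    first, second = child
    options = set()
    # Try both ways of assigning the child's two alleles to the parents.
    for paternal, maternal in ((first, second), (second, first)):
        if paternal in father and maternal in mother:
            options.add((paternal, maternal))
    return options.pop() if len(options) == 1 else None


# The informative site from the definition: G must have come from the mother.
informative = transmitted_alleles(("A", "G"), ("A", "A"), ("A", "G"))

# All three heterozygous: either parent could have transmitted G.
ambiguous = transmitted_alleles(("A", "G"), ("A", "G"), ("A", "G"))
```

A site that returns None for lack of any consistent assignment may reflect a genotyping error or a de novo variant, which this sketch does not attempt to distinguish.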

Population genotype frequency

The proportion of individuals with a particular genotype at a given locus.

Incidental findings

In whole-exome sequencing (WES) or whole-genome sequencing (WGS), pathogenic and likely pathogenic variants in genes that are not relevant to the initial reason for sequencing may be found and reported back to the patient. These variants may relate to rare disease, disease risk, pharmacogenetic response, and status relating to prenatal screening.

Return of results

The process of returning findings from a research study, or incidental findings from a genetic test, back to the participant or patient.

Compound heterozygous inheritance

The situation in which a proband receives a different damaging allele of the same gene from each parent, so that both copies of the gene are affected.
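A minimal sketch of how this pattern might be screened for, assuming each heterozygous variant in the proband has already been assigned a parent of origin and a damaging/benign label (the record keys `gene`, `origin` and `damaging`, the gene names and the function name are all hypothetical):

```python
from collections import defaultdict


def compound_het_genes(variants):
    """Genes in which damaging variants were inherited from both parents.

    variants: iterable of dicts with hypothetical keys
        'gene'     - gene symbol
        'origin'   - 'father' or 'mother' (parent of origin, from phasing)
        'damaging' - True if the variant is predicted damaging
    """
    origins = defaultdict(set)
    for v in variants:
        if v["damaging"]:
            origins[v["gene"]].add(v["origin"])
    # Keep genes with at least one damaging allele from each parent.
    return sorted(g for g, o in origins.items() if {"father", "mother"} <= o)


# Hypothetical phased heterozygous variants in a proband.
variants = [
    {"gene": "GENE_A", "origin": "father", "damaging": True},
    {"gene": "GENE_A", "origin": "mother", "damaging": True},
    {"gene": "GENE_B", "origin": "father", "damaging": True},
    {"gene": "GENE_B", "origin": "father", "damaging": True},
    {"gene": "GENE_C", "origin": "father", "damaging": True},
    {"gene": "GENE_C", "origin": "mother", "damaging": False},
]
candidates = compound_het_genes(variants)
```

Here only GENE_A qualifies: GENE_B has two damaging variants on the same parental copy, and the maternal variant in GENE_C is not damaging, so one working copy remains in each case.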

Topologically associating domains

(TADs). TADs are genomic regions in which loci have a higher probability of physical interaction.

Cite this article

Eilbeck, K., Quinlan, A. & Yandell, M. Settling the score: variant prioritization and Mendelian disease. Nat Rev Genet 18, 599–612 (2017). https://doi.org/10.1038/nrg.2017.52

  • DOI: https://doi.org/10.1038/nrg.2017.52
