
Original research
Assessing performance of pathogenicity predictors using clinically relevant variant datasets
  1. Adam C Gunning1,2,
  2. Verity Fryer2,
  3. James Fasham1,
  4. Andrew H Crosby1,
  5. Sian Ellard2,
  6. Emma L Baple1,
  7. Caroline F Wright1
  1. College of Medicine and Health, University of Exeter Medical School Institute of Biomedical and Clinical Science, Exeter, Devon, UK
  2. Exeter Genomics Laboratory, Royal Devon & Exeter NHS Foundation Trust, Exeter, UK
  1. Correspondence to Professor Caroline F Wright, College of Medicine and Health, University of Exeter Medical School Institute of Biomedical and Clinical Science, Exeter EX2 5DW, UK; Caroline.Wright{at}


Background Pathogenicity predictors are integral to genomic variant interpretation but, despite their widespread usage, an independent validation of performance using a clinically relevant dataset has not been undertaken.

Methods We derive two validation datasets: an ‘open’ dataset containing variants extracted from publicly available databases, similar to those commonly applied in previous benchmarking exercises, and a ‘clinically representative’ dataset containing variants identified through research/diagnostic exome and panel sequencing. Using these datasets, we evaluate the performance of three recent meta-predictors, REVEL, GAVIN and ClinPred, and compare their performance against two commonly used in silico tools, SIFT and PolyPhen-2.

Results Although the newer meta-predictors outperform the older tools, the performance of all pathogenicity predictors is substantially lower on the clinically representative dataset. On this dataset, REVEL performed best, with an area under the receiver operating characteristic curve of 0.82. A concordance-based approach, which requires a consensus of multiple tools, reduced performance, owing both to discordance between tools and to false concordance, where multiple tools make the same misclassification. Analysis of the features used by each tool may give insight into its performance and misclassifications.
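The two failure modes of a consensus approach described above can be sketched in a few lines of code. This is an illustrative example, not the authors' pipeline; the tool calls are hypothetical:

```python
# Sketch of a concordance-based classifier: a call is made only when all
# tools agree. It can fail in two ways noted in the Results: discordance
# (tools disagree, so no call is made) and false concordance (all tools
# make the same misclassification).

def consensus_call(predictions):
    """Return the shared call if all tools agree, else None (discordant)."""
    calls = set(predictions)
    return calls.pop() if len(calls) == 1 else None

# Hypothetical calls from three tools for three variants
variants = {
    "var1": ["pathogenic", "pathogenic", "pathogenic"],  # concordant call
    "var2": ["pathogenic", "benign", "pathogenic"],      # discordant: no call
    "var3": ["benign", "benign", "benign"],              # concordant, but a
                                                         # false concordance if
                                                         # the variant is in
                                                         # fact pathogenic
}

for name, predictions in variants.items():
    print(name, consensus_call(predictions))
```

A single well-calibrated meta-predictor score avoids the discordant "no call" case, which is one reason the abstract favours meta-predictors over tool consensus.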

Conclusion Our results support the adoption of meta-predictors over traditional in silico tools, but do not support the consensus-based approach used in current practice.

  • genetics
  • genetic testing
  • genetic variation
  • genomics
  • human genetics

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 International (CC BY 4.0) licence, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and an indication of whether changes were made is included. See:



  • Twitter @gunningac, @JamesFasham, @RDExeter, @EllardSian, @carolinefwright

  • ACG and VF contributed equally.

  • Contributors SE and CFW conceived of and designed the study. AHC, SE, ELB and CFW provided the data. ACG, VF and JF performed the data analysis. The manuscript was written with input from all authors.

  • Funding This work was supported by the Wellcome Trust [WT200990/Z/16/Z] and [WT200990/A/16/Z].

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available in a public, open access repository. All data relevant to the study are included in the article or uploaded as online supplementary information. The clinical dataset (online supplemental table S1) is released under the CC-BY license.
