Original research
Improving the clinical interpretation of missense variants in X linked genes using structural analysis
  1. Shalaw Rassul Sallah (1, 2)
  2. Jamie M Ellingford (1, 2)
  3. Panagiotis I Sergouniotis (2)
  4. Simon C Ramsden (2)
  5. Nicholas Lench (3)
  6. Simon C Lovell (1)
  7. Graeme C Black (1, 2)

  1. Division of Evolution and Genomic Sciences, The University of Manchester Faculty of Biology, Medicine and Health, Manchester, UK
  2. Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester Academic Health Sciences Centre, Manchester, UK
  3. Congenica Ltd, Biodata Innovation Centre, Wellcome Genome Campus, Hinxton, London, UK
  Correspondence to Professor Graeme C Black


Background Improving the clinical interpretation of missense variants can increase the diagnostic yield of genomic testing and lead to personalised management strategies. Currently, due to the imprecision of bioinformatic tools that aim to predict variant pathogenicity, their role in clinical guidelines remains limited. There is a clear need for more accurate prediction algorithms and this study aims to improve performance by harnessing structural biology insights. The focus of this work is missense variants in a subset of genes associated with X linked disorders.

Methods We have developed a protein-specific variant interpreter (ProSper) that combines genetic and protein structural data. This algorithm predicts missense variant pathogenicity by applying machine learning approaches to the sequence and structural characteristics of variants.
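As a rough illustration of what applying machine learning to combined sequence and structural characteristics can look like, the sketch below trains a small logistic regression on two features per variant. It is not ProSper: the abstract does not name the algorithm or the features used, so the feature names (`conservation`, `burial`), the toy values, the labels and the choice of logistic regression are all assumptions made for this example.

```python
import math

# Hypothetical toy data. Each variant is (conservation, burial), both scaled
# to [0, 1]; these features and values are assumptions, not the study's data.
FEATURES = [
    (0.90, 0.80), (0.80, 0.90), (0.95, 0.70), (0.70, 0.85),  # labelled pathogenic
    (0.20, 0.10), (0.30, 0.20), (0.10, 0.30), (0.25, 0.15),  # labelled benign
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pathogenicity(x, weights, bias):
    """Predicted probability that a variant with features x is pathogenic."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, learning_rate=1.0, epochs=5000):
    """Fit logistic regression by full-batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = pathogenicity(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        # Step against the mean gradient of the log-loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(FEATURES, LABELS)
# Call a variant pathogenic when its predicted probability is at least 0.5.
calls = [int(pathogenicity(x, weights, bias) >= 0.5) for x in FEATURES]
print(calls)
```

On real data the model would be evaluated on variants held out from training rather than on the training set, and a protein-specific tool would fit one such model per gene.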

Results ProSper outperformed seven previously described tools, including meta-predictors, in correctly evaluating whether or not variants are pathogenic; this was the case for 11 of the 21 genes associated with X linked disorders that met the inclusion criteria for this study. We also determined gene-specific pathogenicity thresholds that improved the performance of VEST4, REVEL and ClinPred, the three best-performing tools out of the seven that were evaluated; this was the case in 11, 11 and 12 different genes, respectively.
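The idea of a gene-specific pathogenicity threshold can be sketched as follows. For each gene, every observed tool score is tried as a cut-off and the one that best separates labelled pathogenic from benign variants is kept. The gene names, scores, the generic cut-off of 0.5 and the use of balanced accuracy as the selection criterion are assumptions for illustration; the abstract does not state how the thresholds were chosen.

```python
# Hypothetical toy data: gene -> list of (tool_score, is_pathogenic).
SCORED_VARIANTS = {
    "GENE_A": [(0.90, True), (0.80, True), (0.70, True),
               (0.60, False), (0.55, False), (0.20, False)],
    "GENE_B": [(0.60, True), (0.45, True), (0.40, True),
               (0.30, False), (0.20, False), (0.10, False)],
}
GENERIC_THRESHOLD = 0.5  # assumed tool-wide default, used only for comparison


def balanced_accuracy(threshold, scored):
    """Mean of sensitivity and specificity when score >= threshold is called pathogenic."""
    true_pos = sum(1 for score, pathogenic in scored if pathogenic and score >= threshold)
    true_neg = sum(1 for score, pathogenic in scored if not pathogenic and score < threshold)
    n_pos = sum(1 for _, pathogenic in scored if pathogenic)
    n_neg = len(scored) - n_pos
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def best_threshold(scored):
    """Try each observed score as a cut-off; ties go to the lowest cut-off."""
    candidates = sorted({score for score, _ in scored})
    return max(candidates, key=lambda t: balanced_accuracy(t, scored))


thresholds = {gene: best_threshold(scored) for gene, scored in SCORED_VARIANTS.items()}

for gene, scored in SCORED_VARIANTS.items():
    print(
        gene,
        thresholds[gene],
        round(balanced_accuracy(GENERIC_THRESHOLD, scored), 3),
        round(balanced_accuracy(thresholds[gene], scored), 3),
    )
```

In this toy data the two genes end up with different cut-offs (0.7 and 0.4), and each separates its own variants better than the shared cut-off of 0.5 does. In practice the threshold would be chosen on one set of variants and checked on another to avoid overfitting.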

Conclusion ProSper can form the basis of a molecule-specific prediction tool that can be implemented into diagnostic strategies. It can allow the accurate prioritisation of missense variants associated with X linked disorders, aiding precise and timely diagnosis. In addition, we demonstrate that gene-specific pathogenicity thresholds for a range of missense prioritisation tools can lead to an increase in prediction accuracy.

  • clinical decision-making
  • genetic variation
  • mutation
  • missense
  • structural homology
  • protein
  • point mutation

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made.

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • SCL and GCB are joint senior authors.

  • Contributors All persons who meet authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing or revision of the manuscript. Conception and design of study: GCB, SCL and SRS. Acquisition of data: SRS and SCR. Data analysis and/or interpretation: SRS, GCB and SCL. Drafting the manuscript: SRS, GCB, SCL, PIS and JME. Revising the manuscript: SRS, GCB, SCL, PIS, JME and NL. Approval of the manuscript to be published: SRS, GCB, SCL, PIS, JME, SCR and NL.

  • Funding This work was supported by the Medical Research Council (ref: 1790437) and Congenica.

  • Competing interests NL is an employee of Congenica. All other authors declare that they have no conflict of interest.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Some of the data that support the findings of this study are available from gnomAD, a public open access repository. Some of the data are available from HGMD; restrictions apply to the availability of these data, which are used under licence for this study. The rest of the data, from the Manchester Genomic Diagnostic Laboratory, are not publicly available due to privacy or ethical restrictions, but are available on request from the corresponding author.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
