Original research
Data-driven modelling of mutational hotspots and in silico predictors in hypertrophic cardiomyopathy
  1. Adam Waring1,
  2. Andrew Harper1,
  3. Silvia Salatino1,
  4. Christopher Kramer2,
  5. Stefan Neubauer3,
  6. Kate Thomson4,
  7. Hugh Watkins1,3,
  8. Martin Farrall1,3
  1. 1Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
  2. 2Department of Medicine, University of Virginia, Charlottesville, Virginia, USA
  3. 3Radcliffe Department of Medicine, University of Oxford, Oxford, UK
  4. 4Oxford Medical Genetics Laboratories, Churchill Hospital, Oxford, UK
  1. Correspondence to Professor Martin Farrall, Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK; martin.farrall{at}


Background Although rare missense variants in Mendelian disease genes often cluster in specific regions of proteins, it is unclear how to consider this when evaluating the pathogenicity of a gene or variant. Here we introduce methods for gene association and variant interpretation that use this powerful signal.

Methods We present statistical methods to detect missense variant clustering (BIN-test) and to combine clustering with burden information (ClusterBurden). We introduce a flexible generalised additive modelling (GAM) framework that identifies mutational hotspots using burden and clustering information (hotspot model), optionally supplemented by in silico predictors (hotspot+ model). The methods were applied to synthetic data and to a case–control dataset comprising 5338 hypertrophic cardiomyopathy patients and 125 748 population reference samples over 34 putative cardiomyopathy genes.
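The abstract does not spell out how the BIN-test is computed. As a rough illustration of the general idea of testing for positional clustering, the sketch below assumes a simple approach: residue positions of case and control variants are grouped into equal-width bins along the protein, and the resulting bins × 2 table is summarised with a Pearson chi-squared statistic. The function names, the bin count and the handling of empty bins are all assumptions made for this example, not the authors' implementation.

```python
from collections import Counter


def bin_index(position, protein_length, n_bins):
    """Map a 1-based residue position to a 0-based equal-width bin."""
    return min((position - 1) * n_bins // protein_length, n_bins - 1)


def binned_chi2(case_positions, control_positions, protein_length, n_bins=10):
    """Pearson chi-squared statistic and degrees of freedom for a bins x 2 table.

    Hypothetical helper: an assumed form of a binned clustering test,
    not the published BIN-test code.
    """
    n_case, n_ctrl = len(case_positions), len(control_positions)
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("both groups need at least one variant")
    case = Counter(bin_index(p, protein_length, n_bins) for p in case_positions)
    ctrl = Counter(bin_index(p, protein_length, n_bins) for p in control_positions)
    total = n_case + n_ctrl
    stat, used_bins = 0.0, 0
    for b in range(n_bins):
        row = case[b] + ctrl[b]
        if row == 0:
            continue  # bins with no variants in either group are skipped
        used_bins += 1
        for observed, col_total in ((case[b], n_case), (ctrl[b], n_ctrl)):
            expected = row * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, max(used_bins - 1, 0)


if __name__ == "__main__":
    # Toy positions: cases concentrated near the N-terminus, controls spread out.
    cases = [5, 8, 12, 15, 18, 22, 25, 60]
    controls = [10, 120, 230, 340, 450, 560, 670, 780]
    print(binned_chi2(cases, controls, protein_length=800, n_bins=8))
```

A p-value would then come from the chi-squared distribution with the returned degrees of freedom (for example `scipy.stats.chi2.sf(stat, df)`), with the usual caveat that the approximation is poor when expected counts per bin are small.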

Results In simulations, the BIN-test was almost twice as powerful as the Anderson-Darling or Kolmogorov-Smirnov tests; ClusterBurden was computationally faster and more powerful than alternative position-informed methods. For 6/8 sarcomeric genes with strong clustering, ClusterBurden showed enhanced power over burden alone, equivalent to increasing the sample size by 50%. Hotspot+ models that combine burden, clustering and in silico predictors outperform generic pathogenicity predictors and effectively integrate ACMG criteria PM1 and PP3 to yield strong or moderate evidence of pathogenicity for 31.8% of examined variants of uncertain significance.
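The abstract does not state how ClusterBurden merges the clustering and burden signals. One standard way to combine two independent p-values is Fisher's method, sketched here purely as an illustration of why adding a clustering signal can increase power over burden alone. It assumes the two tests are independent under the null hypothesis; the function name is hypothetical.

```python
import math


def fisher_combine_two(p_burden, p_cluster):
    """Combine two independent p-values with Fisher's method.

    The statistic -2 * (ln p1 + ln p2) follows a chi-squared distribution
    with 4 degrees of freedom under the null, whose survival function has
    the closed form exp(-x / 2) * (1 + x / 2). Substituting gives
    p1 * p2 * (1 - ln(p1 * p2)).
    """
    for p in (p_burden, p_cluster):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    product = p_burden * p_cluster
    return product * (1.0 - math.log(product))


if __name__ == "__main__":
    # Two modest signals combine into stronger joint evidence.
    print(fisher_combine_two(0.05, 0.05))
```

Under this assumed scheme, a gene with moderate burden and moderate clustering evidence can reach a smaller combined p-value than either test gives alone, whereas a gene with no clustering gains nothing from the second term.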

Conclusion GAMs represent a unified statistical modelling framework to combine burden, clustering and functional information. Hotspot models can refine maps of regional burden, and hotspot+ models can be powerful predictors of variant pathogenicity. The BIN-test is a fast, powerful approach to detect missense variant clustering that, when combined with burden information (ClusterBurden), may enhance disease-gene discovery.

  • cardiomyopathy
  • clinical genetics
  • genetics

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:

  • Twitter @Adam_Waring_

  • Contributors AW and MF led the conception and design of the work. AW contributed to data curation, conducted all analyses, developed the statistical methods and associated software, and wrote each draft of the manuscript. MF supervised all analyses and the interpretation of results. AH led the curation and quality control of the hypertrophic cardiomyopathy datasets. SS contributed to quality control of the data. CK and SN supported access to the HCMR dataset. KT and HW advised on and guided the clinical aspects of the work. MF and KT reviewed and edited each draft of the manuscript.

  • Funding This work was supported by a Wellcome Trust doctoral studentship (203834/Z/16/Z) to AW, an MRC doctoral studentship to AH, a Wellcome Trust core award (203141/Z/16/Z, MF and HW) and the Oxford BHF Centre of Research Excellence (RE/13/1/30181, MF and HW). HW has received support from the National Institute for Health Research Oxford Biomedical Research Centre. CK, SN and HW received support from a National Heart, Lung, and Blood Institute grant (U01HL117006-01A1).

  • Disclaimer The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval The research protocol was approved by the South Central - Oxford A Research Ethics Committee (REC reference: 14/SC/0190); written informed consent was obtained from all participants.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available on reasonable request. Due to the confidential nature of some of the research materials supporting this publication, not all of the data can be made accessible to other researchers. Please contact the corresponding author for more information.
