Using data from the 100,000 Genomes Project to resolve conflicting interpretations of a recurrent TUBB2A mutation

Defects in tubulin beta 2A class IIa (TUBB2A) are associated with a range of complex cerebral cortex dysplasias. 1 Despite several studies reporting NM_001069.3:c.743C>T p.(Ala248Val) as a recurrent pathogenic mutation, 1 2 it is listed in ClinVar with conflicting interpretations. To resolve these inconsistencies, we scanned data from the 100,000 Genomes Project 3 (100KGP) and identified 58 individuals in whom p.(Ala248Val) had been called. Read alignment analysis suggested that the variant was genuine in 5/58 individuals, all of whom had a primary neurodevelopmental phenotype. In the remaining cases, which spanned non-specific disease phenotypes, low allelic ratios (1%–19%) suggest recurrent mismapping artefacts.

Alpha and beta tubulins form heterodimers that polymerise to form microtubules, dynamic components of the cytoskeleton that play an important role in cell division, migration and intracellular transport. Variants in several tubulin genes are associated with a variety of cortical brain malformation phenotypes, including lissencephaly, polymicrogyria, microlissencephaly and simplified gyration, collectively termed ‘tubulinopathies’. 4 5 A recently described tubulinopathy


Incidence and mechanism underlying recurrent p.(Ala248Val) mutations
In this study we identified 5 cases with the likely genuine p.(Ala248Val) mutation, of which 4 were confirmed as having arisen de novo. This recurrent mutation was identified by searching through 78,195 study participants, of whom 6,570 had been recruited to the 100KGP with a neurodevelopmental disorder. All 5 mutation carriers were in the latter category; we did not identify any likely genuine p.(Ala248Val) mutations amongst the 71,625 participants recruited for a different reason. As well as giving an idea of the likely incidence of this mutation in cohorts of individuals with neurodevelopmental disorders, this bias (5/6,570 vs 0/71,625; P < 0.00001, Fisher's exact test) also strengthens the evidence supporting the pathogenicity of p.(Ala248Val). In contrast, only 4/53 cases with the likely artefactual variant call were from the neurodevelopmental subdomain, and this distribution was consistent with the null hypothesis (4/6,570 vs 49/71,625; P > 0.05).
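The two comparisons above can be checked with a short calculation. The following is a minimal sketch, assuming a two-sided Fisher's exact test computed from the hypergeometric distribution; the function names are illustrative and the study's own statistical software is not stated in the text.

```python
from math import comb


def hypergeom_pmf(x, carriers, group, total):
    """Probability that exactly x of the carriers fall in the group of interest."""
    return comb(group, x) * comb(total - group, carriers - x) / comb(total, carriers)


def fisher_exact_two_sided(a, group_a, b, group_b):
    """Two-sided Fisher's exact test for a carriers out of group_a vs b out of group_b.

    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed table.
    """
    carriers = a + b
    total = group_a + group_b
    p_obs = hypergeom_pmf(a, carriers, group_a, total)
    lowest = max(0, carriers - group_b)
    highest = min(carriers, group_a)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = hypergeom_pmf(x, carriers, group_a, total)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += p
    return min(1.0, p_value)


# Counts taken from the text: neurodevelopmental vs all other participants.
p_genuine = fisher_exact_two_sided(5, 6570, 0, 71625)
p_artefact = fisher_exact_two_sided(4, 6570, 49, 71625)

print(f"likely genuine calls:     P = {p_genuine:.2e}")
print(f"likely artefactual calls: P = {p_artefact:.2f}")
```

Under these assumptions the first comparison falls below the reported bound of 0.00001 and the second is well above 0.05, in line with the values quoted above.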
Deamination of methylated cytosine residues is the most common mutational mechanism and such variants are approaching saturation in large population genome datasets. 1 Although the p.(Ala248Val) mutation mutational process. We suspect that, as well as resulting in mismapping artefacts, the cismorphic base in TUBB2BP1 might explain why genuine p.(Ala248Val) variants in TUBB2A are recurrent, via gene conversion.
Gene conversion events are increasingly being recognised as an important mutational mechanism. 2 Well-known examples include recurrent mutations in SBDS, 3 GBA 4 5 and the c.757delG variant in SORD recently associated with hereditary neuropathy. 6 The likelihood of a gene conversion event occurring is correlated with the distance between the region of sequence similarity and the target site 2; one study suggested that most conversion events occur where the duplication is <55 kb away. 7 In the case of TUBB2A, TUBB2BP1 lies approximately 23 kb away and so is well within the range where conversion events become more common.

Detection of genuine mosaicism
It is well known that parent-child trio sequencing using NGS is an effective way of picking up genetic mosaicism. 8 However, in this case, mosaicism of the p.(Ala248Val) variant would be very difficult to detect robustly given our recommendation to use an allelic ratio threshold >20% to remove artefactual variant calls.
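The trade-off described above can be illustrated with a minimal sketch of the >20% allelic ratio filter. The function names and the read counts are hypothetical and chosen only for illustration; they are not taken from the study data.

```python
ALLELIC_RATIO_THRESHOLD = 0.20  # from the text: retain calls with an allelic ratio >20%


def allelic_ratio(alt_reads, total_reads):
    """Fraction of reads at the site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def passes_threshold(alt_reads, total_reads, threshold=ALLELIC_RATIO_THRESHOLD):
    """True if the call would be retained by the allelic ratio filter."""
    return allelic_ratio(alt_reads, total_reads) > threshold


# Hypothetical (alt_reads, total_reads) pairs, for illustration only.
examples = {
    "constitutional heterozygote": (19, 40),
    "mismapping artefact": (3, 40),
    "genuine low-level mosaic": (4, 40),
}

results = {label: passes_threshold(*counts) for label, counts in examples.items()}

for label, kept in results.items():
    alt, total = examples[label]
    print(f"{label}: ratio = {allelic_ratio(alt, total):.3f}, retained = {kept}")
```

In this sketch the hypothetical mosaic call is discarded along with the artefact, because the allelic ratio alone cannot separate the two. This is the limitation noted in the text.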

Reasons why p.(Ala248Val) was missed previously
We note that p.(Ala248Val) was not reported by the clinical filtering pipeline used by the 100KGP, likely because the Platypus 0.8.1 variant caller annotated the variant with either an MQ (4/5) or a badReads (1/5) warning flag. In contrast, in the single-sample Starling small-variant call VCFs, p.(Ala248Val) was annotated with a PASS flag in 5/5. We also note that the variant is called by GATK and passes the recommended hard filters in 5/5. In Patient 4, the variant was missed by the DDD study's filtering pipeline because PolyPhen predicts p.(Ala248Val) to be benign and this rules the variant out for analysis of singleton datasets (pers. comm. Caroline Wright), even though this gene has a high missense constraint score (Z=5.26; gnomAD v2.1.1).

Searching for p.(Ala248Val) in TUBB2B in 100KGP
It is notable that the analogous variant p.(Ala248Val) in TUBB2B has also been described in the literature in patients with polymicrogyria, intellectual disability and epilepsy. 9 A different variant involving the same codon, p.(Ala248Thr), has also been reported previously as a de novo variant in a 28.5-week-old foetus with central polymicrogyria-like cortical dysplasia and mild vermis hypoplasia. 10 The p.(Ala248Val) TUBB2B variant is associated with similarly conflicting interpretations in ClinVar (www.ncbi.nlm.nih.gov/clinvar/variation/381699); currently 1 benign and 1 likely pathogenic. The variant is a filtered (low-confidence) variant in gnomAD v2.1.1 and is present with a slightly higher allele frequency in the exomes (4,284/134,020) than in the genomes (441/20,198). In gnomAD v3.1 the variant is also filtered, and the global allele frequency of 1,900/120,372 rises to 1,668/23,178 (7.2%) in African/African Americans. In theory, this high allele frequency would meet the BA1 stand-alone criterion supporting a benign interpretation under the ACMG variant interpretation framework. 11 However, the lack of homozygotes (>50 expected in African/African Americans in gnomAD v3.1 assuming Hardy-Weinberg equilibrium; none observed) and a review of read alignments available for gnomAD v3.1 showing strand bias (only a single -ve strand read supporting the variant, across 20 heterozygous genomes available at time of review) strongly suggested this "variant" to be artefactual in most cases.
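The expected number of homozygotes quoted above follows from the Hardy-Weinberg proportions. The sketch below uses the gnomAD v3.1 African/African American allele counts given in the text. It assumes that the number of individuals is half the allele number, and the Poisson approximation for the chance of observing no homozygotes is an added illustration rather than part of the original analysis.

```python
from math import exp

# Allele counts quoted in the text (gnomAD v3.1, African/African American).
alt_alleles = 1668
total_alleles = 23178

# Alternate allele frequency q and number of diploid individuals
# (assumption: every individual contributes two called alleles).
q = alt_alleles / total_alleles
n_individuals = total_alleles / 2

# Under Hardy-Weinberg equilibrium, homozygotes occur at frequency q^2.
expected_homozygotes = n_individuals * q ** 2

# Illustrative assumption: Poisson approximation to the probability of
# observing zero homozygotes when this many are expected.
p_zero_homozygotes = exp(-expected_homozygotes)

print(f"allele frequency: {q:.1%}")
print(f"expected homozygotes: {expected_homozygotes:.1f}")
print(f"P(no homozygotes observed) ~ {p_zero_homozygotes:.1e}")
```

With these inputs the expectation is roughly 60 homozygotes, consistent with the ">50 expected" stated above, so observing none would be highly improbable if the calls were genuine.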

Ragoussis V
TUBB2B lies further away from TUBB2BP1 than TUBB2A, so genuine gene conversion events might be expected to be less common. However, TUBB2B and TUBB2BP1 are still within the <55 kb range proposed by Ezawa et al where the majority of conversion events occur. 7 We therefore sought to replicate the results seen for TUBB2A and searched for the analogous position in TUBB2B, as described above. Of the 78,195 individuals from the 100KGP contained in the V2 aggregate VCF file, we identified 7 individuals apparently heterozygous for p.(Ala248Val). In 3/7 of these individuals, the p.(Ala248Val) variant appeared with allelic ratios of >20% and was supported by multiple reads across both strands. In contrast, for the remaining 4 individuals, the variant was observed at lower allelic fractions (3.5-13.5%) and was almost exclusively supported by +ve strand reads (Figure S4A). The clustering pattern seen is strikingly similar to that presented for TUBB2A in Figure 2A. were recruited to the 100KGP as part of parent-child trios and so we cannot determine whether these variants may have occurred de novo. Sanger sequencing was used to validate the variant for one of the two cortical malformation cases, where DNA was available (Figure S5).
Lastly, we note that TUBB2B is better established as a disease gene and the associated cortical phenotypes are generally more profound than those seen in TUBB2A patients. 10 12 The fact that all five TUBB2A-positive cases were entered into the 100KGP with a diagnosis of intellectual disability, whereas the two suspected TUBB2B cases were listed in the cortical malformation disease category, is therefore consistent with the literature and highlights that, for the latter condition, MRI typically plays a more prominent role in the diagnostic workup.

allelic fraction, as shown in Figure S4. The universal reverse primer used by Cushion et al 13 was used together with the TUBB2B_F primer listed in Table S4. No evidence of the mutation was seen in the sample from TUBB2A Patient 2, confirming that the F primer results in specificity for the TUBB2B isoform.

13 The binding sites of all 4 TUBB2A primers are shown in Figure 1. In contrast to other studies which have used long-range PCR to increase specificity to a target gene of interest, 5 14 here we designed Modified_R to contain two 3' mismatches with TUBB2B (underlined bases) so as to increase specificity towards TUBB2A. In contrast, Modified_F was designed within the sequence similar to TUBB2BP1 to test whether this would replicate the artefactual p.(Ala248Val) result. *We note that the presence of a rare variant in TUBB2B (rs1054332; 0.48% in gnomAD v3.1) means that in a small fraction of cases there would be just a single mismatch.

Primer Name
Primer sequence