Introduction New mutations for Huntington disease (HD) occur due to CAG repeat instability of intermediate alleles (IA). IAs have between 27 and 35 CAG repeats, a range just below the disease threshold of 36 repeats. While they usually do not confer the HD phenotype, IAs are prone to paternal germline CAG repeat instability. Consequently, they may expand into the HD range upon transmission to the next generation, producing a new mutation. Quantified risk estimates for IA repeat instability are extremely limited but needed to inform clinical practice.
Methods Using small-pool PCR of sperm DNA from Caucasian men, we examined the frequency and magnitude of CAG repeat instability across the entire range of intermediate CAG sizes. The CAG size-specific risk estimates generated are based on the largest sample size ever examined, including 30 IAs and 18 198 sperm.
Results Our findings demonstrate a significant risk of new mutations. While all intermediate CAG sizes demonstrated repeat expansion into the HD range, alleles with 34 and 35 CAG repeats were associated with the highest risk of a new mutation (2.4% and 21.0%, respectively). IAs with ≥33 CAG repeats showed a dramatic increase in the frequency of instability and a switch towards a preponderance of repeat expansions over contractions.
Conclusions These data provide novel insights into the origins of new mutations for HD. The CAG size-specific risk estimates inform clinical practice and provide accurate risk information for persons who receive an IA predictive test result.
- Huntington disease
- Intermediate alleles
- CAG repeat instability
- New mutations
- Genetic counselling
Huntington disease (HD) is an autosomal dominant, neurodegenerative disorder caused by an expanded cytosine-adenine-guanine (CAG) repeat tract.1 New mutations for HD have been shown to occur due to CAG repeat expansion of intermediate alleles (IAs).2 IAs have between 27 and 35 CAG repeats, a range that falls just below the disease threshold of 36 repeats, although disease alleles with 36–39 CAG repeats display reduced penetrance.3 IAs have been identified at a high frequency in the general population: approximately 6% of persons with no known association with HD have an IA, suggesting that upwards of 1 in 17 persons may receive an IA predictive test result.4,5
Several factors influencing CAG repeat instability in HD have been identified. Intergenerational repeat instability primarily occurs through the male germline, with all documented cases of new mutations occurring following paternal transmission.2,6 CAG size also impacts repeat instability, with larger CAG repeat tracts being more unstable.7–9 Intermediate and HD alleles are also enriched for particular haplotypes compared to control alleles, which suggests that haplotype may identify alleles susceptible to repeat expansion.10 However, data on the precise likelihood of repeat instability, particularly expansion into the disease-associated range, are extremely scarce. Quantified risk estimates are limited to a handful of familial transmission and single-sperm studies, which used exceedingly small sample sizes and generated inconsistent findings: some studies demonstrated significant instability, whereas others showed only stable transmissions.4,8,11–13 These conflicting results may be a consequence of the small sample sizes or may be due to variability in any of the factors known to influence instability.
In the present study, we used small-pool PCR to establish paternal germline CAG size-specific risk estimates for IA repeat instability. Based on the examination of 68 alleles and 42 457 sperms, this is the largest and most comprehensive study to quantify the frequency and magnitude of repeat contraction and expansion across the control, intermediate and HD CAG size ranges. These data provide novel insights into the origins of new mutations and have significant implications for clinical practice.
Subjects and methods
Asymptomatic Caucasian men who received a normal, intermediate or mutation-positive predictive test result from a Canadian, Australian or Dutch medical genetics clinic were mailed a letter of invitation. After providing informed consent, donors were mailed a sample collection kit, which included a sterile DNA-free collection cup. Semen samples were collected at the donors' homes and shipped directly to the laboratory on ice. Ethical approval was received from all applicable university and hospital review boards.
Small-pool PCR (SP-PCR) was used to quantitatively assess the frequency and magnitude of paternal germline CAG repeat instability.14 Sperm cells were isolated from semen using differential lysis as previously described.15,16 Genomic sperm DNA was extracted using the QIAGEN DNeasy Blood and Tissue Kit according to the manufacturer's instructions. HindIII-digested sperm DNA was quantified by ultraviolet spectroscopy and serially diluted to a 60 pg/μL working concentration immediately prior to the SP-PCR assay. A total of 7.5 pg of digested sperm DNA was added to each SP-PCR reaction, amplifying an average of 2.2 haploid genomic equivalents per reaction. Approximately 10% of the reactions failed to amplify due to lack of input DNA.
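The input quantities above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' calculation: it assumes a haploid human genome mass of roughly 3.3 pg (a commonly quoted approximation that is not stated in this article), and the function names are illustrative only.

```python
# Assumption (not from the article): one haploid human genome weighs ~3.3 pg.
HAPLOID_GENOME_PG = 3.3


def genome_equivalents(mass_pg: float, genome_pg: float = HAPLOID_GENOME_PG) -> float:
    """Approximate number of haploid genomes contained in a given DNA mass."""
    return mass_pg / genome_pg


def stock_volume_ul(mass_pg: float, stock_pg_per_ul: float) -> float:
    """Volume of working stock needed to deliver a given DNA mass."""
    return mass_pg / stock_pg_per_ul


# Values taken from the text: 7.5 pg per reaction, 60 pg/uL working stock.
print(f"genome equivalents per reaction: {genome_equivalents(7.5):.2f}")
print(f"stock volume per reaction (uL): {stock_volume_ul(7.5, 60.0):.3f}")
```

Under this assumption, 7.5 pg corresponds to roughly 2.3 haploid genomes, which is close to the average of 2.2 reported above.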
A sensitive hemi-nested SP-PCR assay was optimised based on previously reported CAG sizing protocols.6,17 Briefly, the first round PCR was carried out in a 5 μL reaction volume containing the primers HD344F_HEX (5′-HEX-CCTTCGAGTCCCTCAAGTCCTTC-3′, 0.6 mM) and HD482R (5′-GGCTGAGGAAGCTGAGGAG-3′, 0.6 mM), using a custom buffer mix designed to assist amplification of the GC-rich repeat region (1× PCR Buffer (10 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2, pH 8.3), 3.5% HiDi-Formamide, 15% glycerol) with 0.2 mM each dNTP and 0.25 U Roche GMP Grade Taq DNA Polymerase. 60 pg/μL of digested sperm DNA was included in each master mix preparation such that 7.5 pg was present in each 5 μL reaction when distributed over eight 96-well plates. PCR conditions consisted of an initial denaturation step of 3 min at 95°C, followed by 15 cycles of 95°C, 61°C and 72°C for 1 min each, with a terminal elongation step of 5 min at 72°C. PCR products from the first round reaction were diluted 1/10 in DNAse-free, RNAse-free dH2O and 2 μL of each dilution was used as template for a 25 μL second round reaction. The second round reaction mix was identical to that of the first round, except that the hemi-nested modified primers HD344F_HEX (5′-HEX-CCTTCGAGTCCCTCAAGTCCTTC-3′, 0.6 mM) and HD450R_PT (5′-GTTTGGCGGCGGTGGCGGCTGTTG-3′, 0.6 mM) were used and 33 cycles were performed. All reactions were set up in a bleached laminar flow hood and used equipment and reagents specifically designated for SP-PCR. PCR products were analysed using GeneScan fragment analysis on the Applied Biosystems 3730xl platform, detected using GeneMapper V.4.0 software with GS 500 LIZ internal size standard, and sized relative to controls of known repeat sizes. Eight DNA-negative reactions were distributed across each 96-well plate and each batch of four plates contained six positive controls of known CAG size and two duplicate genotyping reactions of donor genomic sperm DNA.
The number of SP-PCR reactions that failed to produce a product was used to empirically calculate the average number of input haploid DNA molecules per reaction based on the Poisson distribution.14,16 For each allele, using the ratio of negative reactions to total number of reactions analysed, the average number of input molecules amplified in each reaction was determined. Based on the empiric calculation of the average number of input molecules per reaction, the total number of molecules examined was established. The origin of the variant allele was empirically determined based on the largest expansion observed from nine control (≤26 CAG) sperm samples with a normal genotype, including homozygous control genotypes, from the University of British Columbia HD Biobank, which were not included in the overall analysis. GeneScan chromatographs of each SP-PCR reaction were manually scored for the presence of progenitor and variant alleles. PCR amplification produced the characteristic stutter pattern, which consists of a large peak of high intensity, trailed by 2–4 peaks of lower intensity. If two or more alleles were present in a single reaction and differed in size by one CAG repeat, the peak of the smaller allele displayed the highest peak intensity and had a peak area that was 150% greater than the larger allele.18 A proportion of sperm samples and positive sizing controls were also sequenced to confirm accurate CAG sizing.
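The Poisson step described above can be sketched as follows. If the number of amplifiable molecules per reaction follows a Poisson distribution with mean m, the expected fraction of empty reactions is exp(−m), so m can be estimated as −ln(negative/total). This is an illustrative sketch only; the function names and the example counts (10 negatives out of 96 reactions) are hypothetical and are not taken from the study data.

```python
import math


def mean_molecules_per_reaction(n_negative: int, n_total: int) -> float:
    """Poisson estimate of the mean input molecules per reaction.

    Uses P(0) = exp(-m), so m = -ln(n_negative / n_total).
    """
    if not 0 < n_negative < n_total:
        raise ValueError("need at least one negative and one positive reaction")
    return -math.log(n_negative / n_total)


def total_molecules_examined(n_negative: int, n_total: int) -> float:
    """Estimated total molecules across all reactions analysed for one allele."""
    return mean_molecules_per_reaction(n_negative, n_total) * n_total


def expected_negative_fraction(mean_molecules: float) -> float:
    """Fraction of reactions expected to contain no template at a given mean."""
    return math.exp(-mean_molecules)


# Hypothetical plate: 96 reactions, 10 of which gave no product.
m = mean_molecules_per_reaction(10, 96)
print(f"estimated molecules per reaction: {m:.2f}")
print(f"estimated molecules examined: {total_molecules_examined(10, 96):.0f}")
print(f"expected empty fraction at m = 2.2: {expected_negative_fraction(2.2):.3f}")
```

With an average of 2.2 molecules per reaction, about 11% of reactions are expected to be empty, in line with the roughly 10% failure rate noted in the methods.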
A p value <0.05 was considered statistically significant. Pearson's correlation was performed to determine the relationship between CAG size and repeat instability. The frequency of repeat instability was reported as the percentage of variant alleles that differed in repeat length from the respective progenitor allele size based on the total number of sperms examined per CAG size. 95% CIs are reported for each instability estimate. The magnitude of CAG repeat instability was quantified by the repeat length variation between the progenitor and variant sperm CAG sizes. Statistical software programs used included SPSS V.20.0 and Excel V.12.34 for Mac.
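As an illustration of how an instability frequency and its 95% CI can be derived from sperm counts, the sketch below uses the Wilson score interval. The article does not state which interval method was used (the analysis was run in SPSS and Excel), so the choice of Wilson here is an assumption, and `instability_frequency` and `wilson_ci` are hypothetical helper names.

```python
import math


def instability_frequency(n_variant: int, n_total: int) -> float:
    """Proportion of sperm whose repeat length differs from the progenitor."""
    return n_variant / n_total


def wilson_ci(n_variant: int, n_total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = n_variant / n_total
    denom = 1.0 + z * z / n_total
    centre = (p + z * z / (2.0 * n_total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n_total + z * z / (4.0 * n_total ** 2)) / denom
    return centre - half, centre + half


# Counts reported in the article for expansions of 35 CAG alleles: 481 of 2290 sperm.
freq = instability_frequency(481, 2290)
low, high = wilson_ci(481, 2290)
print(f"frequency = {freq:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Other interval methods (for example, exact binomial) would give slightly different bounds, particularly for the rarer events at smaller CAG sizes.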
In sum, 34 semen samples were analysed—25 samples from asymptomatic Caucasian men with control, intermediate or HD genotypes and nine samples from the University of British Columbia HD Biobank—for a total of 68 alleles, which ranged in CAG size from 15 to 42 CAG, including 35 control (≤26 CAG), 30 intermediate (27–35 CAG), and three HD (≥36 CAG) alleles. In total 42 457 sperm cells were examined—22 446 control, 18 198 intermediate and 1813 HD. The number of alleles and sperms examined at each CAG size are reported in table 1.
Relationship between CAG size and repeat instability
There was a significant non-linear relationship between CAG size and repeat instability, where the frequency of instability increased with increasing CAG size, particularly around ∼33 CAG repeats (r=0.788, n=68, p<0.0001, figure 1A). While significant correlations between CAG size and the frequency of repeat contraction (r=0.830, n=68, p<0.001, figure 1B) and expansion (r=0.703, n=68, p<0.001, figure 1C) were also observed, a more marked increase in repeat expansion was present at ≥33 CAG.
Frequency of CAG repeat instability
Most control alleles (n=35) were stable, with only 2.2% (n=490/22 446) demonstrating CAG repeat instability (table 2). Over the control CAG size range, however, there was a 5.4-fold increase in the frequency of instability, with 15 CAG alleles (n=2) showing 1.0% (n=13/1309) instability and 25 CAG alleles (n=1) having 5.4% (n=42/788) instability. Among control alleles, repeat contractions (1.6%, n=363/22 446) were more frequent than expansions (0.6%, n=127/22 446). Contraction instability increased 6.8-fold over the control CAG size range, while expansion instability increased only 3.5-fold. Specifically, contraction instability ranged from 0.6% (n=8/1309) to 4.1% (n=32/788) for 15 and 25 CAG alleles, respectively, whereas expansions varied from 0.4% (n=5/1309) to 1.3% (n=10/788).
Only 0.5% (n=4/788) of the largest control allele examined (25 CAG, n=1) expanded into the intermediate CAG size range—no other control alleles expanded beyond the upper limits of the control CAG size range (table 3).
Collectively, 15.8% (n=2869/18 198) of IAs (n=30) were unstable (table 2). There was a 6.0-fold increase in repeat instability over the intermediate CAG size range, with 27 CAG alleles (n=5) demonstrating 5.5% (n=161/2907) instability and 35 CAG alleles (n=4) having 33.0% (n=756/2290) instability. While the frequencies of contractions (8.0%, n=1457/18 198) and expansions (7.8%, n=1412/18 198) were equivalent for IAs overall, there was a 13.1-fold increase in expansions, compared to a 4.2-fold increase in contractions, over the intermediate CAG size range. Specifically, the frequency of expansions varied from 1.6% (n=27/1695) to 21.0% (n=481/2290), whereas the frequency of contractions ranged from 3.5% (n=103/2907) to 14.6% (n=563/3850).
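The fold increases quoted above appear to follow from the ratio of the highest to the lowest reported percentage within the intermediate range. A minimal sketch of that arithmetic, using only the rounded percentages given in the text (the helper name `fold_change` is illustrative):

```python
def fold_change(lowest_pct: float, highest_pct: float) -> float:
    """Ratio of the highest to the lowest frequency over a CAG size range."""
    return highest_pct / lowest_pct


# (lowest %, highest %, fold change stated in the text) for intermediate alleles.
reported = {
    "overall instability": (5.5, 33.0, 6.0),
    "expansions": (1.6, 21.0, 13.1),
    "contractions": (3.5, 14.6, 4.2),
}

for label, (low, high, stated) in reported.items():
    computed = round(fold_change(low, high), 1)
    print(f"{label}: computed {computed}-fold, stated {stated}-fold")
```

Ratios computed from the raw counts rather than the rounded percentages may differ slightly in the first decimal place.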
The frequency of IA repeat contractions into the control CAG size range was 1.0% (n=184/18 198), whereas the frequency of contraction within the IA range was 7.0% (n=1273/18 198, table 4). Of IAs that contracted, 87.4% (n=1273/1457) remained within the intermediate CAG size range, whereas 12.6% (n=184/1457) contracted into the control range. Collectively, 3.4% (n=610/18 198) of IAs expanded into the HD range resulting in a new mutation (table 3). The frequency of expansions beyond the disease threshold ranged from 0.1% (n=4/2907) to 21.0% (n=481/2290) for 27 and 35 CAG alleles, respectively, which represents a 210-fold increase over the IA CAG size range. Of expansions that crossed the disease threshold, 92.6% (n=565/610) were within the reduced penetrance CAG size range compared to 7.4% (n=45/610) in the full penetrance range.
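The categories used in this section can be expressed as a simple classification by CAG size. The sketch below uses the ranges given in the text (control ≤26, intermediate 27–35, reduced penetrance 36–39) and treats ≥40 CAG as full penetrance, which is inferred from those ranges rather than stated explicitly. The function names and the list of variant sizes are hypothetical.

```python
from collections import Counter


def classify_cag(repeats: int) -> str:
    """Assign a CAG repeat count to the size range used in the article."""
    if repeats <= 26:
        return "control"
    if repeats <= 35:
        return "intermediate"
    if repeats <= 39:
        return "reduced penetrance"
    return "full penetrance"  # >=40 CAG: inferred, not stated explicitly


def is_new_mutation(progenitor: int, variant: int) -> bool:
    """True when an intermediate progenitor expands to 36 or more repeats."""
    return classify_cag(progenitor) == "intermediate" and variant >= 36


# Hypothetical variant sizes scored from a 35 CAG progenitor allele.
progenitor = 35
variants = [33, 34, 36, 36, 37, 41, 26]

tally = Counter(classify_cag(v) for v in variants)
new_mutations = sum(is_new_mutation(progenitor, v) for v in variants)
print(dict(tally))
print(f"new mutations: {new_mutations} of {len(variants)} variants")
```

Applied to the study totals, this is the split that separates the 610 expansions across the disease threshold into 565 reduced penetrance and 45 full penetrance variants.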
While only three HD alleles (39, 41, 42 CAG) were examined, they were exceedingly unstable, with an instability rate of 74.1% (n=1344/1813, table 2). Among HD alleles, the frequency of expansion (59.2%, n=1074/1813) was greater than contractions (14.9%, n=270/1813). A small proportion of HD alleles reverted to control (0.1%, n=2/1813) or intermediate (0.9%, n=16/1813) alleles (table 4). Given the CAG size of the HD alleles examined, all expansions were into/within the full penetrance CAG size range.
Magnitude of CAG repeat instability
Among control alleles, the magnitude of contractions was greater than that of expansions: the largest contraction observed was −10 CAG compared to a +3 CAG expansion (figure 2). While control alleles demonstrated both small (1–3 CAG repeats) repeat length contractions and expansions, the frequency of small contractions was greater than that of small expansions. Approximately 1.3% (n=297/22 446) of control alleles contracted by 1–3 CAG repeats, whereas only 0.6% (n=127/22 446) expanded by 1–3 repeats (figure 3A). Control alleles also showed large (≥5 CAG) contractions (0.1%; n=22/22 446), but no expansions ≥5 CAG repeats were observed.
For IAs (n=30), the magnitude of repeat instability was greater for expansions compared to contractions—the largest repeat length variation observed for expansions was +20 CAG compared to a −13 CAG contraction (figure 2). While variants with CAG sizes in the reduced penetrance HD range were observed at every intermediate progenitor CAG size, the first full penetrance allele did not occur until 30 CAG. IAs demonstrated a similar frequency of both small and large contractions and expansions. Approximately 7.4% (n=1329/18 198) of IAs expanded and 7.3% (n=1322/18 198) contracted by 1–3 CAG repeats (figure 3B). Similarly, 0.3% (n=62/18 198) of IAs expanded and 0.3% (n=57/18 198) contracted by ≥5 CAG repeats. Notably, 0.01% (n=2/18 198) of IAs demonstrated very large expansions (16–20 CAG repeats); however, contractions of this extreme magnitude were not observed.
Among HD alleles (n=3), the largest repeat length variation observed was +16 CAG for expansions and −12 CAG for contractions (figure 2). While the magnitude of instability was similar to that observed for IAs, HD alleles demonstrated a much greater frequency at each repeat length variation. Approximately 42.2% (n=765/1813) of alleles demonstrated small repeat expansions and 7.9% (n=143/1813) large expansions (figure 3C). Conversely, 13.3% (n=242/1813) of HD alleles contracted by 1–3 CAG repeats and 0.6% (n=11/1813) contracted by ≥5 CAGs. HD alleles also demonstrated very large contractions (0.1%, n=2/1813) and expansions (0.06%, n=1/1813).
This is the first study to quantify the frequency and magnitude of paternal germline CAG repeat instability across the entire intermediate CAG size range and represents the largest sample size ever examined. Based on 30 IAs and 18 198 sperms, our findings indicate there is a significant risk for IAs to expand into new mutations. While all repeat sizes in the intermediate CAG size range were shown to expand into the disease-associated range, the frequency of new mutations dramatically increased with increasing CAG size, which underscores the importance of using CAG size-specific risk estimates in clinical practice. Alleles at the upper limit of the intermediate CAG size range had the highest risk of new mutations, with approximately 21% of 35 CAG alleles expanding into the HD range. The majority of new mutations were within the reduced penetrance HD CAG size range and full penetrance mutations were not observed until 30 CAG. Given the scarcity of data that quantify repeat instability, these CAG size-specific instability rates will help inform more accurate risk assessment and genetic counselling for persons who receive an IA predictive test result.
Our findings significantly contribute to knowledge on the nature of intergenerational repeat instability in HD. Germline repeat instability was observed at every CAG size examined, including control, intermediate, and HD alleles. A significant (p<0.001) non-linear relationship was observed between CAG size and the frequency of repeat instability, with a dramatic increase in the rate of instability occurring at approximately 33 CAG repeats. While the overall frequency of instability was very low for control alleles, instability nevertheless increased with increasing CAG size. In fact, the frequency of instability increased nearly 5.4-fold over the range of control CAG sizes examined (15–25 CAG). Control alleles also demonstrated a strong tendency to contract in CAG size, with the frequency and magnitude of repeat contractions significantly exceeding those of expansions. Conversely, while only three HD alleles were examined, they were highly unstable, with instability biased towards expansion. Within the intermediate CAG size range, the frequency of instability increased with increasing CAG size. The frequency of IA contractions outweighed expansions until the upper limits of the intermediate CAG size range, at approximately 33 CAG repeats. Collectively, these findings suggest that there may be a threshold CAG length for instability of approximately 33 CAG repeats from which there is an increase in the frequency of instability and a switch towards an expansion bias. Consequently, alleles with a CAG length greater than this instability threshold have the most significant risk of expanding into the HD CAG size range.
The magnitude of repeat instability also showed a CAG length-dependent increase, where the frequency of small (1–3 CAG repeats) and large (≥5 CAG repeats) repeat length variations increased with increasing CAG size. Control alleles predominately underwent small repeat length changes with a bias towards contraction. IAs displayed a relatively equal frequency of small repeat expansions and contractions, but also underwent large expansions albeit at a lower rate. HD alleles demonstrated the highest frequency of large repeat expansions, although small repeat variations were still the most frequent. The magnitude of repeat instability observed across control, intermediate and HD alleles is consistent with a stepwise model of expansion, whereby alleles in the control range undergo successive small expansion events over time into the intermediate range and then beyond the disease threshold.10
The precise molecular mechanism underlying germline CAG repeat instability in HD remains elusive, but the data generated in this study provide some important insights. While a variety of mechanisms have been proposed, the formation of secondary structures during DNA replication or repair is thought to be a critical step.19–22 It is possible that there is an increased tendency for single-strand DNA to form stable secondary structures at the threshold length for instability. More specifically, alleles with ≥33 CAG repeats may be more prone to form stable hairpin loops, which can generate either expansions or contractions if they are not repaired prior to DNA replication. The strong paternal bias of instability suggests that the process of spermatogenesis may also play a role in the mechanism of instability.7,8 It is possible that an increased number of cell divisions in male, compared to female, gametogenesis enhances the likelihood of instability. Further, the occurrence of small and large repeat changes may indicate that more than one molecular mechanism is involved. Small magnitudes of instability are commonly thought to occur as a result of DNA mispairing during premeiotic replication in spermatogenesis, whereas large contractions and expansions may result from deficient Okazaki fragment processing during the meiotic stage of spermatogenesis or may involve DNA repair mechanisms.23–26
While this is the largest study to date to quantify the frequency and magnitude of IA CAG repeat instability in HD, it is not without limitations. The first inherent limitation is that we cannot definitively prove the origin of variant alleles (ie, whether a variant allele originates from the lower or upper progenitor allele). While this is a common caveat for studies that examine microsatellite instability, the accuracy in the current study is increased due to the use of empiric expansion thresholds for control CAG sizes.27 While concerns have been raised about the possibility of artefacts being mistaken for variant alleles, methodology studies indicate that artefacts are uncommon in SP-PCR.27,28 In the present study, artefacts were considered rare given that all alleles scored were discrete peaks with similar intensities that conformed to the expected stutter pattern; the distribution pattern of variant alleles differed from sample to sample; alleles were never more frequent than expected based on the amount of input DNA; and all negative control reactions were clean. This study also assumes that the percentage of variant sperms directly and accurately reflects the chance of inheriting that variant allele, but it is possible that the relative instability risks observed in sperms will not mirror the true clinical risk. In other words, we do not know whether CAG size influences the conception rates of sperms. We must also acknowledge that the instability estimates for alleles at the upper threshold of the control CAG size range (ie, 24–26 CAG), as well as for alleles in the HD CAG size range, are limited by small sample size and therefore may not reliably reflect the instability of these CAG sizes. Lastly, it is important to note that the IA CAG size range may change over time as our knowledge of which CAG sizes are prone to expansion into the disease range (lower threshold) and which CAG sizes are definitely associated with the HD phenotype (upper threshold) continues to evolve.
The results of this study have important clinical implications. This study confirms that every CAG size in the intermediate size range is susceptible to paternal germline repeat expansion into the disease-associated range. Therefore, all individuals who have a CAG size between 27 and 35 CAG may benefit from information on the risk of repeat expansion leading to a new mutation in offspring or future generations. Clinical risk assessment for repeat expansion should also account for the sex of the transmitting parent and CAG size. Unfortunately, risk estimates for repeat instability during maternal transmission remain limited to previously published familial transmission studies, and additional studies aimed at establishing empirical female-specific risk estimates are needed.12 Men found to have an IA predictive test result may be provided CAG size-specific risk estimates based on the data presented here, and the magnitude of expansion could also be discussed. However, they should be cautioned about the relative nature of these instability estimates and the possibility of unknown genetic or environmental modifiers of instability.
In conclusion, this study increases our knowledge on the origins of new mutations for HD and is useful in clinical practice. The paternal germline CAG size-specific risk estimates for IA repeat instability increase the accuracy of clinical risk assessment for new mutations and provide data to help individuals make informed reproductive decisions.
We express our sincere gratitude to the sperm donors and HD families who agreed to become part of the Huntington Disease Biobank at the University of British Columbia. We thank Michael Hockertz for IT support and Dr Martine van Belzen for providing demographic and CAG size data.
Contributors AS obtained ethical approvals; recruited the sperm donors; managed sample collection and shipment; performed the data analysis; wrote the manuscript; and generated the tables and figures. CK and CD established and optimised the SP-PCR and critically reviewed the manuscript. CK also performed all the SP-PCR and sequencing reactions. JAC, EKB and FR assisted with ethical approval, donor recruitment, sample shipment and reviewed the manuscript. PG provided critical guidance on the data analysis and manuscript. MRH is the senior author and oversaw all aspects of the research.
Funding This work was supported by the Canadian Institutes of Health Research (CIHR). AS was funded by a Doctoral Award from the CIHR and a Senior Trainee Award from the Michael Smith Foundation for Health Research. MRH is a Killam University Professor and holds a Canada Research Chair in Human Genetics and Molecular Medicine.
Competing interests None.
Ethics approval Clinical Research Ethics Board at the University of British Columbia, Women & Children's Hospital Research Ethics Board.
Provenance and peer review Not commissioned; externally peer reviewed.