Background Classical randomisation of clinical trial patients creates a source of genetic variance that may be contributing to the high failure rate seen in neurodegenerative disease trials. Our objective was to quantify genetic difference between randomised trial arms and determine how imbalance can affect trial outcomes.
Methods Data from 5851 patients with Parkinson’s disease of European ancestry and two simulated virtual cohorts based on public data were used. Data were resampled at different sizes for 1000 iterations and randomly assigned to the two arms of a simulated trial. False-negative and false-positive rates were estimated using simulated clinical trials, and per cent difference in genetic risk score (GRS) and allele frequency was calculated to quantify variance between arms.
Results 5851 patients with Parkinson’s disease (mean (SD) age, 61.02 (12.61) years; 2095 women (35.81%)) as well as simulated patients from virtually created cohorts were used in the study. Approximately 90% of the iterations had at least one statistically significant difference in individual risk SNPs between the trial arms. Approximately 5%–6% of iterations had a statistically significant difference between trial arms in mean GRS. For significant iterations, the average per cent difference for mean GRS between trial arms was 130.87%, 95% CI 120.89 to 140.85 (n=200). Glucocerebrosidase (GBA) gene-only simulations showed an average 18.86%, 95% CI 18.01 to 19.71 difference in GRS between trial arms (n=50). When adding a drug effect of −0.5 points in MDS-UPDRS per year at n=50, 33.9% of trials resulted in false negatives.
Conclusions Our data support the hypothesis that within genetically unmatched clinical trials, genetic heterogeneity could confound true therapeutic effects as expected. Clinical trials should undergo pretrial genetic adjustment or, at the minimum, post-trial adjustment and analysis for failed trials.
- Parkinson’s disease
- clinical genetics
In the past few decades, clinical trials for neurodegenerative disease-modifying drugs have repeatedly failed. Between the years 2002 and 2012, 413 Alzheimer’s disease (AD) trials were performed, with 99.6% resulting in failure.1 Eighty-three of these trials were in phase III, which can cost an estimated US$11.5 to US$52.9 million.2 Success has also been elusive for Parkinson’s disease (PD), where drugs such as preladenant, which showed potency in phase II, often fail in phase III.3 Failures at this stage can have numerous causes, but one may be genetic risk variability combined with non-optimal randomisation of patients, which can create large differences in genetic risk factors between trial arms.
For PD, motor or cognitive symptoms serve as measurable outcomes in clinical trials. The Unified Parkinson’s Disease Rating Scale (UPDRS) and the Movement Disorder Society’s updated revision of this test (MDS-UPDRS) are used to assess the severity of PD symptoms. A combination of the UPDRS parts II (Activities of Daily Living) and III (Motor Examination) is often used as an endpoint in PD clinical trials. However, genetic heterogeneity can cause variance in terms of the progression and presentation of PD symptoms, potentially affecting overall MDS-UPDRS readings, and thus, trial outcomes. It was shown that PD genetic risk score (GRS—a score composed of the combined effects of common variants that are associated with a disease in genome-wide association studies (GWAS))4 can affect time to progression to Hoehn and Yahr (H&Y) stage 3.5 Furthermore, studies on common genetic variants that are part of the GRS, such as variants in GBA, MAPT and SNCA were suggested to affect progression of motor and/or cognitive symptoms.6 7 For example, the p.E326K variant in GBA, a component of GRS which is relatively common in PD and can be found in more than 5% of patients with PD,8 is also associated with motor and cognitive progression in PD.6 A study investigating predictors of motor progression in patients with PD found that an interaction between two SNPs, rs9298897 and rs17710829, resulted in a ~2-point increase in MDS-UPDRS score per year, indicative of a faster rate of motor decline in those patients.9
To say that no PD clinical trials undergo any pretrial genetic adjustment would be incorrect, as there are clinical trials underway specifically for patients with PD who carry a GBA or LRRK2 mutation.10 11 However, even within these specific subgroups of PD mutations, large variation between patients still exists. GBA is a prime example, as different mutations within the GBA gene lead to differential effects on PD phenotypes.12 13 Carriers of severe GBA mutations have an age at onset (AAO) roughly 5 years earlier and around a threefold to fourfold increase in disease risk, compared with mild GBA mutation carriers.14 Another example of this is seen among LRRK2 mutations, with different variants possessing different molecular mechanisms and cellular effects. LRRK2 G2019S, for example, is involved in kinase activation and lysosomal positioning alteration, whereas LRRK2 R1441C is linked to guanosine triphosphate hydrolysis disruption and has no known effects on lysosomal positioning.15 GRS for PD have also been connected to disparity in disease aetiology, with an increase in risk score corresponding to a decrease in AAO.16 Other studies suggest that a single SD increase in GRS may bring onset forward by almost 1 year.17 The relationship between AAO and PD symptoms is well described in many studies, which find that variance in AAO leads to variance in mortality as well as variations in presentation of motor and non-motor phenotypes.18 19 With a considerable amount of variance in PD disease aetiology resulting from variance in genetic architecture, we hypothesise that classical randomisation in clinical trials is creating genetic imbalance that may be affecting trial outcomes.
GRS and variant nomination from GWAS
A GRS for each patient was calculated from the cumulative effect of each of the 47 variants nominated by GWAS.20 The regression coefficients, which represent the effect size of each allele of these variants, were used to calculate the GRS for each individual in the study. The formula, as used and explained in Chang et al (2017),20 is below:
$$\mathrm{GRS} = \sum_{i=1}^{k} \beta_i \times \mathrm{SNP}_i$$
In the above formula, k represents the total number of variants, βi is the regression coefficient associated with the effect allele from the GWAS and SNPi is the variant. This formula is applied to all patients with PD and controls in the dataframe, giving a GRS for each person; these scores are then scaled to Z-scores (SD of risk) weighted by the controls. Imputation of genetic data was done via the Michigan Imputation Server with the Haplotype Reference Consortium (HRC) reference panel, allowing no more than 5% genotype missingness per sample.
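The GRS calculation and control-based scaling described above can be sketched as follows. This is an illustrative Python sketch, not the authors' R code. It assumes that "weighted by the controls" means Z-scoring each raw GRS against the mean and SD of the control scores. The function names and the toy genotypes are hypothetical, and the three weights are borrowed from the GBA effect estimates quoted later in the text purely as example values.

```python
from statistics import mean, stdev

def raw_grs(dosages, betas):
    """Sum of beta_i * SNP_i over the k variants for one person."""
    if len(dosages) != len(betas):
        raise ValueError("one dosage per variant is required")
    return sum(b * d for b, d in zip(betas, dosages))

def control_scaled_grs(raw_scores, is_control):
    """Z-score every raw GRS using the mean and SD of the controls only
    (assumed reading of 'weighted by the controls')."""
    control_scores = [s for s, c in zip(raw_scores, is_control) if c]
    mu, sd = mean(control_scores), stdev(control_scores)
    return [(s - mu) / sd for s in raw_scores]

# Toy data: 3 variants, 4 people; the last two are controls.
betas = [0.747, 0.636, 0.362]            # example weights only
genotypes = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [0, 0, 1]]
raw = [raw_grs(g, betas) for g in genotypes]
z = control_scaled_grs(raw, [False, False, True, True])
```

Scaling against controls puts every patient's score in units of SD of risk in the unaffected population, so a value of 1 reads as one control SD above the control mean.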
Single-variant and GRS analysis
To simulate randomisation in a clinical trial, data from 5851 patients with PD were sampled for different trial sizes of 200, 400, 600, 1000, 2000, 3000, 4000 and 5000 participants. For each trial size, 1000 iterations were performed, and patients were randomly assigned to treatment and placebo arms of the trial. Genetic data were acquired from the International Parkinson’s Disease Genomics Consortium (IPDGC) NeuroX dataset, which consists of unrelated PD cases and controls of European ancestry. The NeuroX array comprised the standard Illumina exome content with an additional 24 000 custom variants relating to neurological diseases, and costs roughly US$50–US$60 per sample to genotype. This cohort can be obtained from dbGaP21 with study accession phs000918.v1.p1 and is described in detail in previous studies.22 23 Please refer to the online supplementary material for additional information such as individual study/collection sites and demographics for this cohort (online supplementary table 1).
We then aimed to determine whether any of the 47 variants were not equally distributed between the arms of the trial, in each of the iterations in each cohort size, as such unequal distribution can potentially affect trial outcomes. Statistical significance for each variant was determined by its ability to classify a patient to the correct class of treatment or placebo group, achieved through the use of logistic regression. If a variant was determined to be significant, then imbalance between arms for that variant was large enough to be useful to the logistic regression model. Cohorts were not stratified into early-onset and late-onset groups for these analyses, as while GRS has been found to be significantly associated with AAO, it has not been found to be significant within AAO-stratified cohorts.16 This is likely due to a lack of power when sample size is decreased, so for the purposes of this manuscript we focused on quantifying variance using a typical clinical trial design.
In addition to analysis of cumulative allele frequencies at specific SNPs of interest, we also investigated the difference in mean GRS between trial arms. All 1000 iterations were then filtered by the statistical significance of their Z-score, either falling above or below the 95% significance cut-off values of ±1.96. Per cent difference was calculated to quantify the difference in GRS between arms. A basic visualisation of the workflow for GRS and single-variant analyses is provided for further clarity (figure 1).
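A minimal Python sketch of the resampling loop described above, given for illustration only (the study itself used R). Two things are assumptions: the symmetric definition of per cent difference, since the text does not state the formula used, and the placeholder normally distributed GRS values, which stand in for the real cohort. All function names are hypothetical.

```python
import math
import random
from statistics import mean, variance

def randomise(scores, trial_size, rng):
    """Sample trial_size patients and split them evenly into two arms."""
    sample = rng.sample(scores, trial_size)   # sample() returns a random order
    half = trial_size // 2
    return sample[:half], sample[half:]

def two_sample_z(arm_a, arm_b):
    """Z statistic for the difference in mean GRS between two arms."""
    se = math.sqrt(variance(arm_a) / len(arm_a) + variance(arm_b) / len(arm_b))
    return (mean(arm_a) - mean(arm_b)) / se

def percent_difference(a, b):
    """Symmetric per cent difference (assumed definition, not from the paper)."""
    return abs(a - b) / ((abs(a) + abs(b)) / 2) * 100

def simulate(scores, trial_size, iterations=1000, seed=1):
    """Return the share of significant iterations and their per cent differences."""
    rng = random.Random(seed)
    significant = []
    for _ in range(iterations):
        arm_a, arm_b = randomise(scores, trial_size, rng)
        if abs(two_sample_z(arm_a, arm_b)) > 1.96:   # 95% cut-off
            significant.append(percent_difference(mean(arm_a), mean(arm_b)))
    return len(significant) / iterations, significant

# Placeholder GRS values standing in for the 5851 real patients.
_rng = random.Random(0)
cohort = [_rng.gauss(0.3, 1.0) for _ in range(5851)]
rate, diffs = simulate(cohort, trial_size=200)
```

Because the arms are drawn from the same population, the share of significant iterations should sit near the nominal 5%, and the per cent differences among those iterations show how large a chance imbalance can be.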
Difference in variance in GRS between arms was also investigated, as genetic variance between the two arms of randomly assigned patients will not always be equal. Trial arms were compared by performing a Levene’s test for equality of variances at each iteration. Additional description of methods for this section of analysis focusing on within-group variance estimates can be found in the online supplementary material.
After initial variance analysis of the randomised cohorts, an algorithm was used to balance patients between trial arms by genotype using the 47 variants nominated by GWAS. Additional description of the above methods can be found in the online supplementary material.
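The balancing algorithm itself is described only in the supplementary material, so the following Python sketch is a hypothetical stand-in, not the authors' method. It uses a simple greedy minimisation: each incoming patient goes to whichever arm leaves the smallest total difference in risk-allele counts, with arm size as a tie-break. The names `imbalance` and `balanced_assignment` are invented for this example.

```python
import random

def imbalance(totals_a, totals_b):
    """Sum over variants of the absolute difference in risk-allele counts."""
    return sum(abs(a - b) for a, b in zip(totals_a, totals_b))

def balanced_assignment(genotypes, seed=0):
    """Assign patients one at a time to whichever arm minimises imbalance."""
    rng = random.Random(seed)
    k = len(genotypes[0])
    totals = {"treatment": [0] * k, "placebo": [0] * k}
    sizes = {"treatment": 0, "placebo": 0}
    assignment = []
    for g in genotypes:
        scores = {}
        for arm, other in (("treatment", "placebo"), ("placebo", "treatment")):
            candidate = [t + d for t, d in zip(totals[arm], g)]
            # Genotype imbalance first, then arm-size imbalance as a tie-break.
            scores[arm] = (imbalance(candidate, totals[other]),
                           abs(sizes[arm] + 1 - sizes[other]))
        best = min(scores.values())
        # Remaining ties are broken at random to keep allocation unpredictable.
        arm = rng.choice([a for a, s in scores.items() if s == best])
        totals[arm] = [t + d for t, d in zip(totals[arm], g)]
        sizes[arm] += 1
        assignment.append(arm)
    return assignment, totals

# Toy example: two variants, four patients with risk-allele dosages.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
assignment, totals = balanced_assignment(genotypes)
```

Because arm size is only a tie-break here, the arms can end up unequal in size when carriers arrive unevenly. A production scheme would probably weight the two criteria together and keep a random element in every allocation.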
Virtual cohort simulation: rs9298897 and rs17710829
Next, we set out to examine whether a combination of two variants (rs9298897 and rs17710829) can potentially affect trial outcomes if the distribution of these variants is not similar in both trial arms. These variants were previously shown to affect PD progression by a 1–2 point increase in MDS-UPDRS (parts II and III) per year in carriers of both variants.9 Since the data from our NeuroX cohort of real patients did not have enough instances of this interaction due to the low frequency of rs17710829, a virtual cohort was created with the statistical software R for this analysis. To create a virtual cohort of carriers, these variants were assigned to a simulated population of 5000 individuals according to Hardy-Weinberg equilibrium and European allele frequency estimates as reported by the ExAC Release 1 database.23 24 The estimates from this database are slightly lower than what would be seen in PD cases and are therefore more conservative. Each virtual patient was represented by a generated ID and their assigned genotypes for each of these two variants as determined by their known frequencies. Virtual patients are represented in our databases in the same manner as the real patients, except that variants are assigned according to frequency estimates rather than taken from genotyping information. A small example of what the virtual patient cohort looks like is included in the online supplementary material (online supplementary table 2).
Changes in combined part II and III MDS-UPDRS scores from baseline were used to determine differences between arms. The MDS-UPDRS was revised from the original UPDRS to improve certain metrics that were not being satisfactorily captured.25 As change in MDS-UPDRS was the chosen metric, the initial score for each patient would not affect simulation results; however, a range of baseline scores was chosen to mimic conditions in real trials. Virtual patients were randomly assigned a baseline MDS-UPDRS score on a range from 15 to 25, such that scores within that range followed a uniform distribution. For each of the two simulated years, all virtual patients were assigned a random progression in MDS-UPDRS score of either 1 or 2 points per year, a more conservative progression rate based on the average increase in MDS-UPDRS scores found in the Holden et al study,26 for simulation purposes. We chose the more conservative 1–2 point increase per year to focus on illuminating the potential effects of a genetic imbalance, without other sources of variance, such as a large range in typical progression, confounding the results. As large variance can detrimentally affect test significance, we wanted to create a hypothetical situation where variance in typical progression was controlled. This was to highlight that even in an idealised scenario with no additional confounding effects on progression rate, a genetic imbalance between arms is enough to cause false-negative and false-positive trials. Carriers of both the rs9298897 and rs17710829 variants received an additional increase in MDS-UPDRS score in accordance with the model effect size reported in Latourelle et al (β=2.374, SE=0.436). This cohort was sampled at sizes of 50, 100, 200, 300, 600, 700 and 800 observations and patients were randomly assigned to the simulated treatment or placebo arm.
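Genotype assignment under Hardy-Weinberg equilibrium can be sketched in Python as below, as an illustration rather than the authors' R implementation. The allele frequencies are placeholders chosen for the example, not the ExAC estimates used in the study, and the ID format and function names are hypothetical.

```python
import random

def hwe_genotype_probs(p):
    """Genotype probabilities (0, 1, 2 copies) for allele frequency p under HWE."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]

def virtual_cohort(allele_freqs, n, seed=42):
    """Build n virtual patients, each with an ID and one genotype per variant."""
    rng = random.Random(seed)
    cohort = []
    for i in range(n):
        patient = {"id": f"VP{i + 1:05d}"}
        for snp, p in allele_freqs.items():
            patient[snp] = rng.choices([0, 1, 2], weights=hwe_genotype_probs(p))[0]
        cohort.append(patient)
    return cohort

# Placeholder frequencies: NOT the ExAC estimates used in the study.
freqs = {"rs9298897": 0.30, "rs17710829": 0.05}
cohort = virtual_cohort(freqs, n=5000)
# Patients carrying at least one copy of both variants.
carriers = [p for p in cohort if p["rs9298897"] > 0 and p["rs17710829"] > 0]
```

Drawing the two variants independently assumes they are not in linkage disequilibrium, which is a further simplification of this sketch.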
Both false-positive (ie, simulated drug is not effective, yet trial results are positive due to imbalanced distribution of the two SNPs) and false-negative (simulated drug is effective, yet trial results are negative due to imbalanced distribution of the two SNPs) rates were investigated in this stage by performing two sample Z-tests for each iteration. Percentage of false positives caused by the addition of the SNP interaction effect was calculated by comparing the results of tests with and without the effect. For false-negative rates, a simulated ‘drug’ effect that decreased MDS-UPDRS score by 0.5 points per year was added to the patients in the treatment arm.
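The false-negative estimate can be sketched in Python as follows, for a single simulated year; this is an illustration, not the authors' R code. The interaction effect and drug effect are taken from the text. The 5% double-carrier share, the two-sided 1.96 cut-off and all function names are assumptions made for the example. Baseline scores are omitted because only the change from baseline enters the test.

```python
import math
import random
from statistics import mean, variance

INTERACTION_BETA = 2.374   # extra points/year for carriers of both SNPs
DRUG_EFFECT = -0.5         # simulated drug effect, points/year

def yearly_change(is_carrier, treated, rng):
    """One-year change in MDS-UPDRS for a single virtual patient."""
    change = rng.choice([1, 2])          # typical progression of 1 or 2 points
    if is_carrier:
        change += INTERACTION_BETA
    if treated:
        change += DRUG_EFFECT
    return change

def z_statistic(a, b):
    """Two-sample Z statistic for the difference in mean change."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def false_negative_rate(carrier_flags, trial_size, iterations=1000, seed=7):
    """Share of trials that miss a real drug effect."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(iterations):
        sample = rng.sample(carrier_flags, trial_size)
        half = trial_size // 2
        treatment = [yearly_change(c, True, rng) for c in sample[:half]]
        placebo = [yearly_change(c, False, rng) for c in sample[half:]]
        if abs(z_statistic(treatment, placebo)) <= 1.96:
            misses += 1                  # real effect, non-significant result
    return misses / iterations

_rng = random.Random(0)
carrier_flags = [_rng.random() < 0.05 for _ in range(5000)]  # assumed 5% carriers
fn_small = false_negative_rate(carrier_flags, trial_size=50)
fn_large = false_negative_rate(carrier_flags, trial_size=200)
```

Re-running the same function with `INTERACTION_BETA` set to zero gives the comparison "without the effect", and the gap between the two rates is the share attributable to the SNP imbalance.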
Virtual cohort simulation: GBA
Similarly to the interaction cohort above, a virtual cohort for GBA mutation carriers was generated for simulation use. Many of the variants associated with this gene have low allele frequencies, resulting in only a small amount of real data. As with the interaction cohort, a virtual cohort was created to counteract this limitation. Three of the 47 variants used in this study are GBA variants, and these same variants were used to create a virtual cohort of patients, representing one of the many ongoing or upcoming GBA-focused interventional trials. Using effect estimates from this study, individual genetic risk contribution was assigned to each variant. The variants were p.N370S (rs76763715) (β=0.747, 95% CI 0.60 to 0.90), p.E326K (rs2230288) (β=0.636, 95% CI 0.55 to 0.72) and p.T369M (rs75548401) (β=0.362, 95% CI 0.23 to 0.50), all three of which have been associated with the risk for PD.27–30 Each variant was then assigned to a population of ~60 000 simulated individuals according to Hardy-Weinberg equilibrium and European allele frequency estimates as reported by the ExAC Release 1 database.24 GRS for each virtual patient was calculated using the same formula as before, but using only the three chosen GBA variants. Each virtual patient was represented by an arbitrary ID, assigned genotypes for the three GBA variants and individual GRS. Patients were filtered for those who possessed at least one of the chosen mutations and then sampled at sizes of 50, 100, 200, 300, 600, 700, 800, 1000 and 2000 observations to simulate GBA-targeted trials. Raw GRS were used for the analysis of this cohort rather than the control-weighted Z-scores used for the larger cohort. Average per cent difference in GRS between trial arms was calculated for each sample size.
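A compact Python sketch of the GBA-only simulation, for illustration only. The three effect estimates are those quoted above. The allele frequencies are placeholders, not the ExAC values, and the per cent difference formula is an assumed symmetric definition. For brevity the sketch averages over all iterations rather than over a filtered subset, and the function names are hypothetical.

```python
import random
from statistics import mean

GBA_BETAS = {"rs76763715": 0.747, "rs2230288": 0.636, "rs75548401": 0.362}
# Placeholder allele frequencies; the study used ExAC European estimates.
ASSUMED_FREQS = {"rs76763715": 0.003, "rs2230288": 0.012, "rs75548401": 0.009}

def simulate_carrier_grs(n, rng):
    """Raw GBA-only GRS for simulated individuals carrying >= 1 variant."""
    scores = []
    for _ in range(n):
        grs, carrier = 0.0, False
        for snp, beta in GBA_BETAS.items():
            p = ASSUMED_FREQS[snp]
            # Two independent allele draws give HWE genotype proportions.
            dosage = sum(rng.random() < p for _ in range(2))
            grs += beta * dosage
            carrier = carrier or dosage > 0
        if carrier:
            scores.append(grs)
    return scores

def mean_percent_difference(scores, trial_size, iterations, rng):
    """Average per cent difference in mean GRS between two random arms."""
    diffs = []
    for _ in range(iterations):
        sample = rng.sample(scores, trial_size)
        a = mean(sample[: trial_size // 2])
        b = mean(sample[trial_size // 2:])
        diffs.append(abs(a - b) / ((a + b) / 2) * 100)
    return mean(diffs)

rng = random.Random(3)
carriers = simulate_carrier_grs(60_000, rng)
diff_50 = mean_percent_difference(carriers, 50, 200, rng)
diff_1000 = mean_percent_difference(carriers, 1000, 200, rng)
```

Even with only three variants, the arms of a small trial differ in mean GRS purely by chance, and the difference shrinks as the trial grows.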
This study was institutional review board approved and all patients gave written informed consent, protocol number 2003-077.
All statistical and modelling analyses were conducted with R.31 Code is available to the public through the National Institute on Aging Laboratory of Neurogenetics Github at https://github.com/neurogenetics/Clinical-Trial-Outcomes. Additional information is in the online supplementary material.
High genetic heterogeneity with randomisation of different simulated trial sizes
To examine how randomisation of patients at different sample sizes affects variability in overall GRS difference between arms, we performed 1000 iterations of sampling and randomisation of trial arms for different sample sizes. Evaluation of GRS differences across trial arms revealed that overall average per cent difference between trial arms was high at small sample sizes, and the magnitude of difference decreased as sample size increased (figure 2A). Results from analysis of differences in variance in GRS between trial arms were similar to differences in mean GRS, with a high per cent difference between arms that decreased as sample size increased. Additional results and tables from this analysis can be found in the online supplementary material (online supplementary table 3). Statistically significant iterations in either variance or mean GRS difference between the simulated trial arms accounted for roughly 10% of iterations at all sample sizes. At smaller sample sizes, per cent difference in GRS score can be over 100% when comparing trial arms, but this difference decreased to roughly half that amount as the sample size reached 1000 patients (table 1). Classic randomisation will create large differences in GRS between trial arms; therefore, a more conscious method of trial randomisation that accounts for patient genotype imbalance needs to be incorporated into trial design.
Variation in single-variant distribution between randomised arms
To investigate imbalance between variants, we determined whether the frequency of each of the 47 variants was significantly different between randomised arms. This revealed that approximately 90% of the trials, regardless of trial size, resulted in a significant difference in allele frequency of the risk SNPs between treatment and placebo arms. We found that the number of significant variants (unadjusted p<0.05) fluctuated with sample size. This suggests that it is unlikely that simply increasing sample size will result in a reduction in the number of significantly differently distributed variants between arms. This is a function of allele frequency and statistical power as sample size increases. In addition, there was a non-significant correlation between sample size and number of significant variants (r=0.686, p=0.061), which suggests that increasing sample size may result in an overall increase in the number of significantly different variant distributions (figure 2C). However, while the number of significant variants may increase, the per cent difference in allele frequency of the SNPs of interest between arms decreases. For significant iterations, average per cent difference in cumulative risk allele dosage decreases from 41.60% to 27.60%, a drop in difference of 14% (p=5.76e-66, 95% CI 12.42 to 15.57) as sample size increases from 200 to 1000 (table 2). True per cent difference between arms is likely higher than stated here, as situations where either one of the trial arms possessed zero counts of a rare variant could not be included in per cent difference calculations. As such a high number of simulated trial iterations resulted in significantly different variant frequencies, genotype needs to be taken into account when designing trials, to prevent an imbalance in one of these variants affecting disease progression or read-out and altering the outcome of a trial.
False-negative and false-positive rates
An interaction effect between two variants associated with an increase in MDS-UPDRS was used to demonstrate the effects of imbalanced trials on overall outcome. We found that at small sample sizes, 33.9% of trials resulted in a false negative with the simulated drug effect of a 0.5-point reduction in MDS-UPDRS score per year at n=50. False-negative rate decreased as sample size increased, reaching nearly 1.0% as sample size approached 200. With the addition of the second year of the trial, percentage of false negatives decreased across sample sizes; however, 21.2% of iterations still resulted in a false negative at n=50. The number of iterations that resulted in false positives and false negatives was compared both with and without the interaction effect. This allowed us to determine how many false positives and negatives were truly attributable to an imbalance of these SNPs between arms. Percentage of false negatives caused by the SNP interaction effect alone increased with sample size, with 100% of the observed false negatives being caused by the interaction as sample size increased to 200 for trial year 1. Nearly 100% of false negatives in trial year 2 were caused by the SNP interaction effect alone (table 3). False positives and the percentage of false positives caused by the SNP interaction effect were fairly similar across sample sizes (table 4). Failing to balance trial arms using genetic information can result in an imbalance of variants that affect disease progression and outcome, ultimately changing the result of a clinical trial.
Effects of balancing on variance per simulated trial sample size
Balancing trial arms according to allele count significantly reduced genetic variance between arms, and results for this can be found in the online supplementary material (online supplementary table 4, online supplementary figure 3).
Genetic heterogeneity in randomised virtual GBA cohort
To investigate GRS variance within a stratified genetic subpopulation, we created a virtual cohort of GBA variant carriers using effect estimates and European population frequencies. This virtual model considers three known GBA variants with estimates of effect size. Variance analysis of the virtual GBA cohort revealed patterns similar to the larger cohort, despite the GRS comprising only three GBA variants. Analysis of differences in mean GRS between arms showed the same higher quantities of variance at small sample sizes. Real GBA cohorts that possess a wider range of variants both within and outside the gene are likely to display greater differences between trial arms (table 5, figure 2B). Results for difference in variance in GBA GRS and the balanced GBA cohort are located in the online supplementary material (online supplementary table 2, online supplementary table 5). Genetic variance should be considered during the design of a trial even for studies created for specific mutation carriers.
Our simulation demonstrates that randomisation without efforts to reduce genetic imbalance will result in significant variance between treatment and placebo arms in the vast majority of trials, of either a single SNP, multiple SNPs or GRS. As differences in GRS and allele carrier status can lead to differences in phenotypic read-out, controlling this source of variance would lead to better balanced trial arms that could improve clinical trial results, depending on the genetic contribution to phenotype presentation. Virtual cohort analysis revealed that even when performing gene-specific stratification, different variants within a single gene can create large sources of variance between arms. Genetic balancing should be performed even in trials using small subgroups of variant carriers, such as GBA variant carriers, to mitigate the varying effects that different variants within and outside the targeted gene can have on disease presentation, and thus, trial outcome. As mentioned above, GRS has been found to be significantly associated with progression time to H&Y stage 3, with 1 SD increase in total polygenic modelling GRS resulting in a 1.29 increase in HR.5 This finding, along with the results from Latourelle et al,9 suggests that it is likely there are many more unknown associations between variants/GRS and progression in PD phenotypes. Given 47 variants with the possibility to affect phenotypic read-out, there is only roughly a 10% chance that any given trial will have no differences in variant distributions between placebo and treatment arms. While one or two patients with rare variants will not affect trial outcomes, an imbalanced group of large-effect rare variant carriers could skew results, particularly as misdiagnosis rate for neurodegenerative diseases is high.
Simulation analysis of a variant interaction effect on MDS-UPDRS showed how clinical trial outcomes could be affected by unbalanced genetic variants with influence on phenotype progression. For small sample sizes, these effects are especially noticeable. As an example, considering the aforementioned 413 failed AD trials that took place between 2002 and 2012 and the simulated 2-year failure rate of 21.2% from the SNP interaction effect at n=50, if the smaller trials had contained disparities in measurable disease outcome similar to the effects of rs9298897 and rs17710829 in PD, roughly 87 of those trials may have failed due to genetic disease disparity. In addition, the interaction of the variants used in this simulation is quite rare, thus a similar effect size for more common SNPs and interactions will likely lead to higher rates of both false-positive and false-negative iterations. As this analysis was only based on the interaction between two SNPs, these results will vary in real clinical trials that hold more variability in terms of disease progression and genetic influences on phenotype, since variants in genes such as GBA, MAPT, SNCA and others may also affect PD progression.6 7 Further, reducing heterogeneity allows for a reduction in sample size, cutting trial costs. This has been shown to be true in studies such as Stone et al,32 where they found that genotyping for Apolipoprotein E (APOE) in AD trials resulted in an increase of power if sample sizes remained the same or allowed for the number of study sites to be reduced without decreasing power.
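The figure of roughly 87 trials corresponds to applying the simulated 2-year false-negative rate at n=50 to the full count of 413 trials. This is a deliberately rough extrapolation, since it treats every trial as if it behaved like the small simulated PD trial:

$$413 \times 0.212 \approx 87.6$$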
While we have mainly discussed the effects of genetic variance in terms of clinical trial failure, differences in GRS between arms can also create a positive or negative bias for a drug. An important goal in a clinical trial is to determine if any witnessed benefits are a true drug effect, but variance in underlying genetics may cause false conclusions to be made. A genetic effect on progression could be construed as the effect of the drug, when in fact, this is an example of ‘collider bias’ or ‘selection bias’.33 This effect can occur in the opposite direction as well, such as in the case of an imbalance in carriers of the LRRK2 G2019S, a mutation which has recently been connected to a slower rate of decline in motor function than in those without the mutation.34 Slower progression rates in the treatment arm could lead to false positives when the drug is ineffective. Balancing trials by GRS and allele distribution would control possible genetic bias that could lead to a false effect being classified as the effect of the tested drug.
Another source of variance that was not addressed in the current analysis is genetic factors that may affect the metabolism of the drug itself. These will add to the overall genetic variance and are harder to take into account when testing a new drug. In case of such variance, post-trial analysis of the treatment group can identify such variants, and a statistical correction could be applied based on their effects. A major limitation of such an approach is that relatively large studies, or meta-analyses of studies, will be required.
Therefore, pretrial genotyping of the patients using standard SNP genotyping arrays that contain the known PD-related SNPs, followed by a controlled randomisation process that will balance the trial arms as we have done here, can be highly cost-effective. Genotyping the NeuroX cohort used in the first part of this analysis costs roughly US$50–US$60 per sample. For smaller sample sizes where genetic imbalance can make the most difference, this would mean an added cost of only US$10 000 to US$20 000. Considering the total cost of clinical trials, this is a small price to pay to reduce any possible negative effects genetic imbalance may contribute to trial outcome. With patient genotypes available, we suggest that at the minimum, balancing of known disease risk variants should be performed. On enrolment, patients can be assigned to the trial arm that best balances overall variation, using a simple algorithm. We provide an example of this in the online supplementary material, where we use an algorithm (freely available on request) designed to balance trial arms by genotype. We cannot afford to wait until we understand the exact effect of all human genetic variations on disease aetiology and drug response, and thus with the current knowledge that genetic variation creates disease variation and possible imbalance between trial arms, we should design clinical trials to the best of our ability.
While we have demonstrated the importance of genetic balancing across trial arms, an important caveat to consider is that we can only control variance to the extent of what is known by current heritability estimates. In addition to this study demonstrating the importance of genetic balance in clinical trials, it is also a call for larger scale genetic studies on progression that will allow us to account for and balance currently unknown sources of variance. Valuable future work would be to further investigate the effects of variants on phenotypic outcomes that are measured by clinical trials, such as change in UPDRS, to gain greater understanding of sources of variance in trial outcomes that can be controlled.
For a complete overview of IPDGC members and acknowledgements, refer to http://pdgenetics.org/partners.
MAN and ZG-O contributed equally.
Contributors MAN and ZG-O conceived the study. MAN, ZG-O and HL contributed to study design. MAN and HL contributed to data analysis. MAN, ZG-O and HL contributed to data interpretation. MAN and CB contributed to data management and storage. HL drafted the manuscript and MAN, ZG-O, CB, LK, FF, HI, GF, AGD-W, DJS, IPDGC and ABS performed critical review and additional writing. All authors gave approval for publication.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests ZG-O reports personal fees from Sanofi/Genzyme, Lysosomal Therapeutics, Idorsia, Denali, Prevail Therapeutics, Ventus, Deerfield and Allergan. ZG-O is also supported by the Fonds de recherche du Québec-Santé (FRQS) Chercheurs-boursiers award given in collaboration with Parkinson Quebec, and is a Parkinson Canada New Investigator awardee. AGD-W reports personal fees from Merck and Co and other from Biogen. DJS reports funding from Merck and Co.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data are available in a public, open access repository.