Abstract
Objective A recent large-scale study in multiple sclerosis (MS) using the ImmunoChip platform reported on 11 loci that showed suggestive genetic association with MS. Additional data in sufficiently sized and independent data sets are needed to assess whether these loci represent genuine MS risk factors.
Methods The lead SNPs of all 11 loci were genotyped in 10 796 MS cases and 10 793 controls from Germany, Spain, France, the Netherlands, Austria and Russia, that were independent from the previously reported cohorts. Association analyses were performed using logistic regression based on an additive model. Summary effect size estimates were calculated using fixed-effect meta-analysis.
Results Seven of the 11 tested SNPs showed significant association with MS susceptibility in the 21 589 individuals analysed here. Meta-analysis across our and previously published MS case-control data (total sample size n=101 683) revealed novel genome-wide significant association with MS susceptibility (p<5×10−8) for all seven variants. This included SNPs in or near LOC100506457 (rs1534422, p=4.03×10−12), CD28 (rs6435203, p=1.35×10−9), LPP (rs4686953, p=3.35×10−8), ETS1 (rs3809006, p=7.74×10−9), DLEU1 (rs806349, p=8.14×10−12), LPIN3 (rs6072343, p=7.16×10−12) and IFNGR2 (rs9808753, p=4.40×10−10). Cis expression quantitative trait locus (eQTL) effects were observed in silico for rs6435203 on CD28 and for rs9808753 on several immunologically relevant genes in the IFNGR2 locus.
Conclusions This study adds seven loci to the list of genuine MS genetic risk factors and further extends the list of established loci shared across autoimmune diseases.
- Genetics
- Immunology (including allergy)
- Multiple sclerosis
- Neurology
Introduction
Multiple sclerosis (MS) is the most common autoinflammatory disease of the central nervous system. It is caused by the action and interaction of genetic and environmental factors. Genome-wide association studies (GWAS) and other large-scale genotyping projects have revealed that only a few common genetic variants exist that exert relatively large effects (ie, OR ranging from ∼1.3 to 3), all of which are located in the HLA (human leucocyte antigen) locus (reviewed in ref. 1). The remainder of the genetic risk spectrum is likely determined by a large number of susceptibility variants exerting much smaller effects. The hitherto completed GWAS and follow-up projects have identified 110 independent SNPs outside the HLA locus showing genome-wide significant association (p<5×10−8) with MS risk (eg, refs 2–4). The most recent of these studies, using the ImmunoChip (a customised genotyping array with extensive coverage of loci involved in immune system disorders, including MS5) in the discovery phase and previous GWAS data for validation of top results, reported on 48 novel genome-wide significant (p<5×10−8) MS risk loci in a total of 29 300 MS cases and 50 794 controls.4 Eleven additional loci showed suggestive evidence for association in the combined data (p<1×10−6) but failed to surpass the genome-wide significance threshold.4 To assess whether, and which of, these suggestive loci represent genuine MS susceptibility factors, we genotyped the most significant SNP originally highlighted4 in each of the 11 loci (table 1) using an independent, multicentric case-control data set of 10 796 MS cases and 10 793 controls. This included 18 335 individuals of European descent from Austria, France, Germany, the Netherlands and Spain, as well as 3254 subjects from Russia (see online supplement 1: table e-1). None of these data sets were previously analysed for the 11 SNPs under scrutiny here.
Association results from all 21 589 individuals were subsequently combined with those from the ImmunoChip study,4 amounting to a total of 101 683 subjects, the largest data set collectively analysed in MS genetics to date.
Methods
Subjects
The effective sample size available for analysis after quality control (QC) comprised 21 589 individuals, including 9079 MS cases and 9256 unrelated controls of European descent from Austria, France, Germany, the Netherlands and Spain, as well as 1717 cases and 1537 controls from three regions in Russia (see online supplement 1: table e-1). These data sets have been described previously.3 ,6 ,7 Diagnosis of MS was established according to standard diagnostic criteria.8 ,9 All samples were collected with informed written consent. None of the samples tested here were included in the previous ImmunoChip study.
Power analyses
Power analyses were performed using the Genetic Power Calculator10 assuming a disease prevalence of 0.1% and no between-study heterogeneity. The combined effective validation data sets of 10 796 cases and 10 793 unrelated controls had >80% power to detect an OR of 1.10 at a one-sided type-1 error rate of α=4.5×10−3 and down to allele frequencies of 0.16. The combined data sets of the original4 and of this study across 101 683 individuals had >80% power to detect an OR of 1.10 at a two-sided genome-wide type-1 error rate of α=5×10−8 down to allele frequencies of 0.10.
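As an illustration of the power figures above, the following is a minimal sketch based on a normal approximation to a Wald test on the per-allele log(OR). It is not the algorithm used by the Genetic Power Calculator; the function name `allelic_power`, the use of Woolf's variance and the approximation of the control allele frequency by the population allele frequency are assumptions made for this example. The combined case and control counts are derived by adding the ImmunoChip study totals to the validation totals given in the Introduction.

```python
from math import log, sqrt
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha, one_sided):
    """Approximate power to detect a per-allele OR (normal approximation)."""
    # Assumption: for a rare disease, the control allele frequency is close
    # to the population allele frequency.
    p_ctrl = maf
    odds_case = odds_ratio * p_ctrl / (1 - p_ctrl)
    p_case = odds_case / (1 + odds_case)
    # Woolf variance of log(OR) from expected allele counts (two per person).
    var = (1 / (2 * n_cases * p_case) + 1 / (2 * n_cases * (1 - p_case))
           + 1 / (2 * n_controls * p_ctrl) + 1 / (2 * n_controls * (1 - p_ctrl)))
    z_effect = log(odds_ratio) / sqrt(var)
    norm = NormalDist()
    if one_sided:
        return 1 - norm.cdf(norm.inv_cdf(1 - alpha) - z_effect)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return (1 - norm.cdf(z_crit - z_effect)) + norm.cdf(-z_crit - z_effect)


# Validation data: one-sided alpha of 4.5e-3 at an allele frequency of 0.16.
validation = allelic_power(10796, 10793, 0.16, 1.10, 4.5e-3, one_sided=True)
# Combined data: 29 300 + 10 796 cases and 50 794 + 10 793 controls.
combined = allelic_power(40096, 61587, 0.10, 1.10, 5e-8, one_sided=False)
print(f"validation: {validation:.2f}, combined: {combined:.2f}")
```

Under these simplifying assumptions both settings come out above 80%, in line with the statement above, although the exact values need not match those of the Genetic Power Calculator.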
Genotyping and quality control
Genotyping for all samples except the Dutch data set (see below) was performed at the individual sites using single-assay allelic discrimination assays based on TaqMan chemistry following the manufacturer's instructions (Life Technologies). TaqMan assays and reagents were ordered centrally by the Berlin site and either used there or distributed to all participating centres. The three Russian data sets were all genotyped at the site in Novosibirsk, Russia, and all German samples were genotyped in Berlin. Up to 5% of samples were genotyped in duplicate across plates to assess genotyping accuracy. Genotypes in the Dutch sample were generated on the Human610-Quad Bead GWAS array (Illumina) and subjected to standard QC using the GenABEL package in R (http://www.genabel.org/packages/GenABEL). This entailed excluding samples showing ≥2% missingness, cryptic relatedness, and excess heterozygosity as well as SNPs showing ≥2% missingness, or deviation from Hardy-Weinberg equilibrium (HWE) at p<1×10−6. Principal component analyses did not reveal ethnic outliers when plotting Dutch and CEU HapMap samples together. For all data sets included in this study, individuals with missing genotypes for more than three SNPs were excluded prior to analysis (applicable to a total of 193 samples (∼0.9%) across all data sets). The threshold for genotyping efficiency per SNP and data set was set to >95%. SNPs falling under this threshold were rs6072343 (in the Russia Novosibirsk and Yakutsk data sets), rs727098 and rs9808753 (Russia-Yakutsk). HWE was assessed in controls and deviations from HWE were defined as p<4.5×10−3 in each data set (ie, applying a Bonferroni correction for 11 tests) based on Pearson's χ2 as implemented in PLINK V.1.07,11 which led to the exclusion of rs806349 from the French data set. All other SNPs passed these QC thresholds and were included in the actual association analyses.
Genotyping accuracy based on comparison with duplicated samples was >99.8% and total genotyping efficiency was >98% for each SNP after QC.
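The HWE check in controls described above can be sketched as a Pearson χ2 goodness-of-fit test with one degree of freedom. This is an illustrative re-implementation, not PLINK's code; the function name `hwe_chi2` and the genotype counts in the usage lines are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi2(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for observed genotype counts in controls.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 0.0, 1.0               # monomorphic SNP: test not informative
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return chi2, erfc(sqrt(chi2 / 2))


bonferroni_alpha = 0.05 / 11          # about 4.5e-3, as used in the text
chi2, p_value = hwe_chi2(360, 480, 160)   # hypothetical control counts
print(chi2, p_value, p_value < bonferroni_alpha)
```

A SNP would be flagged in a given data set when the p value falls below the Bonferroni-corrected threshold.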
Association analyses
Association analyses were performed using PLINK and in the R environment and were based on logistic regression using an additive genetic model. The Dutch data set was adjusted for the first four principal components. Based on recent recommendations for rare diseases including MS,12 the logistic regression analyses were not adjusted for known covariates such as age and sex. Meta-analyses across all data sets included here were based on fixed-effect models. The experiment-wide significance threshold was set to p<4.5×10−3 (ie, applying a Bonferroni correction for 11 tests). Between-study heterogeneity was quantified using the I2 metric, and statistical significance was assessed using the Q test statistic. Significant evidence for heterogeneity was defined as p<0.1, a threshold commonly used in this context.13 ,14 Forest plots were generated using a customised version of the ‘rmeta’ package in R.15 The data sets of the original study4 and this study were combined using fixed-effect meta-analysis. For this purpose, ancestry-adjusted summary ORs and 95% CIs of the discovery and replication data sets were extracted from the online supplementary table S2 of the original study.4 In addition, differences between the summary effect size estimates of the eight western European data sets and the three data sets from Russia were assessed by interaction analysis as previously described.16 For these analyses, significance was defined as p<4.5×10−3 (ie, Bonferroni-corrected for 11 tests). All meta-analysis p values of the validation data sets only are reported as one-sided with regard to the direction of effect reported in the original study.4 p Values for all other tests (ie, heterogeneity, the meta-analysis of the original study4 and this study, interaction analyses) are two-sided.
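To make the additive model concrete, the sketch below fits logit P(case) = a + b·g by Newton-Raphson, where g is the number of copies (0, 1 or 2) of the tested allele, so that exp(b) is the per-allele OR. It is a minimal stand-alone illustration without covariates, not the PLINK or R routine used in the study; the function name `additive_logistic` and the genotype counts are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def additive_logistic(case_counts, control_counts, iterations=25):
    """Fit logit P(case) = a + b*g from counts per genotype g = 0, 1, 2.

    Returns (per-allele OR, SE of log OR, two-sided Wald p value).
    """
    a = b = 0.0
    for _ in range(iterations):
        grad_a = grad_b = h00 = h01 = h11 = 0.0
        for g, (n_case, n_ctrl) in enumerate(zip(case_counts, control_counts)):
            n = n_case + n_ctrl
            p = 1 / (1 + exp(-(a + b * g)))
            resid = n_case - n * p          # observed minus expected cases
            w = n * p * (1 - p)             # binomial variance weight
            grad_a += resid
            grad_b += resid * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information matrix analytically.
        a += (h11 * grad_a - h01 * grad_b) / det
        b += (h00 * grad_b - h01 * grad_a) / det
    se_b = sqrt(h00 / det)                  # from the inverse information matrix
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(b / se_b)))
    return exp(b), se_b, p_two_sided


# Hypothetical counts for genotypes with 0, 1 and 2 copies of the tested allele.
odds_ratio, se, p = additive_logistic([400, 450, 150], [450, 430, 120])
print(odds_ratio, se, p)
```

A one-sided p value in the direction reported by the original study, as used for the validation meta-analyses, would be half the two-sided value when the estimated effect points in that direction.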
In silico fine-mapping of putative causal variants
We applied a recently developed in silico fine-mapping algorithm (the ‘probabilistic identification of causal SNPs’ approach (PICS), http://www.broadinstitute.org/pubs/finemapping/?q=pics)17 to all seven genome-wide significantly associated MS risk SNPs. PICS was developed based on ImmunoChip data to estimate a SNP's probability of being ‘causal’ in densely mapped genotyping data based on the association result of the most significant SNP.17
Results
Among the 11 candidate SNPs assessed in 21 589 individuals across 11 data sets, six showed significant association with MS susceptibility after Bonferroni correction (ie, pcorr<4.5×10−3, table 1, figure 1). These six SNPs were: rs1534422 in LOC100506457 (OR=1.10, p=1.39×10−6), rs6435203 downstream of CD28 (OR=1.08, p=7.20×10−4), rs3809006 in ETS1 (OR=1.07, p=1.26×10−3), rs806349 in DLEU1 (OR=1.12, p=9.80×10−8), rs6072343 upstream of LPIN3 (OR=1.14, p=1.14×10−5) and rs9808753 in IFNGR2 (OR=1.12, p=2.62×10−4). In addition, rs4686953 in LPP (OR=1.05, p=6.00×10−3) missed the threshold for experiment-wide multiple testing correction (pcorr<4.5×10−3) by a small margin. All of these seven loci showed genome-wide significant (p<5×10−8) association with MS risk upon meta-analysis of all available data, that is, after combining the results from the original study4 and from the 11 independent data sets tested here, amounting to a total of 101 683 individuals (table 1).
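The fixed-effect pooling and heterogeneity statistics referred to above can be sketched as inverse-variance weighting of log ORs, with standard errors recovered from 95% CIs as was done for the published summary estimates. This is an illustrative sketch rather than the customised ‘rmeta’ code; the helper names `chi2_sf` and `fixed_effect_meta` and the three study estimates are hypothetical.

```python
from math import erfc, exp, gamma, log, sqrt
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2
    a, q = (1.0, exp(-y)) if df % 2 == 0 else (0.5, erfc(sqrt(y)))
    while a < df / 2 - 1e-9:
        q += y ** a * exp(-y) / gamma(a + 1)   # recurrence Q(a+1) = Q(a) + ...
        a += 1
    return q


def fixed_effect_meta(studies):
    """Pool (OR, CI lower, CI upper) triples by inverse-variance weighting."""
    betas = [log(or_) for or_, _, _ in studies]
    ses = [(log(hi) - log(lo)) / (2 * Z975) for _, lo, hi in studies]
    weights = [1 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = sqrt(1 / sum(weights))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": exp(beta),
        "ci": (exp(beta - Z975 * se), exp(beta + Z975 * se)),
        "p_two_sided": 2 * (1 - NormalDist().cdf(abs(beta / se))),
        "q": q,
        "p_q": chi2_sf(q, df) if df > 0 else 1.0,
        "i2": i2,
    }


# Hypothetical per-study estimates: (OR, 95% CI lower, 95% CI upper).
print(fixed_effect_meta([(1.12, 1.02, 1.23), (1.08, 0.99, 1.18), (1.15, 0.98, 1.35)]))
```

Because each study contributes in proportion to the inverse of its variance, the large discovery data set dominates the combined estimate, which is why the validation data are also reported separately.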
Furthermore, while the effect size estimates of the four remaining loci showed non-significant effects in the combined validation data sets after multiple testing correction (with p values ranging from 0.254 to 0.0444, table 1, see online supplement 1: figure e-1), all pointed into the same direction of effect as in the original study4 (p=0.0625 based on the underlying binomial distribution) with ORs ranging from 1.02 to 1.06. The corresponding 95% CIs of the ORs computed here included the effect size estimates of the original study4 in all instances. However, none of the four variants reached genome-wide significance upon meta-analysis across all 101 683 available individuals (table 1).
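The binomial p value quoted above follows from treating each locus as a coin flip under the null hypothesis, that is, an equal chance of pointing in either direction. A minimal sketch (the function name `sign_test_p` is chosen for this example):

```python
from math import comb


def sign_test_p(n_concordant, n_total):
    """One-sided probability of at least n_concordant same-direction effects
    out of n_total when each direction is equally likely under the null."""
    return sum(comb(n_total, k) for k in range(n_concordant, n_total + 1)) / 2 ** n_total


print(sign_test_p(4, 4))  # 0.0625, ie, 0.5 to the power of 4
```

With only four loci, even complete concordance cannot fall below 0.05, so this observation is supportive rather than formally significant.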
For two of the 11 tested SNPs, there was weak evidence for between-study heterogeneity in the validation data, that is, rs806349 in DLEU1 (pQ=0.0210) and rs3809006 in ETS1 (pQ=0.0978; table 1). Accordingly, the 95% CIs of the I2 estimates were large in both instances (ie, 6–77 for rs806349 and 0–69 for rs3809006). The heterogeneity for rs806349 was primarily due to heterogeneity of effect size estimates pointing into the same direction of effect rather than heterogeneity of ORs on either side of the null (figure 1E) and for rs3809006, this was due to the single outlying estimate of the Russian-Novosibirsk study pointing into the opposite direction of effect than the majority of data sets (I2=0 (95% CI 0 to 46), pQ=0.709 upon exclusion of this data set). Finally, interaction analyses of the effect size estimates of the stratified western European and Russian data sets did not yield any significant differences after correction for 11 tests (pcorr<4.5×10−3, see online supplement 1: table e-2). The only SNP approaching this threshold was rs3809006 in ETS1 (pinteraction=9.7×10−3) showing ORs of 1.09 (95% CI 1.04 to 1.14) in the western European and 0.94 (95% CI 0.85 to 1.04) in the Russian data sets (which was due to the single outlying effect estimate from the Russia-Novosibirsk data set as described above).
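The interaction analysis for rs3809006 can be approximated from the two subgroup estimates given above with a z test on the difference of log ORs. This sketch assumes independence of the two subgroups and recovers standard errors from the rounded 95% CIs, so it yields a value close to, but not identical with, the reported interaction p value; the function name `interaction_p` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

NORM = NormalDist()
Z975 = NORM.inv_cdf(0.975)


def interaction_p(or_a, ci_a, or_b, ci_b):
    """Two-sided p value for a difference between two independent ORs."""
    se_a = (log(ci_a[1]) - log(ci_a[0])) / (2 * Z975)
    se_b = (log(ci_b[1]) - log(ci_b[0])) / (2 * Z975)
    z = (log(or_a) - log(or_b)) / sqrt(se_a ** 2 + se_b ** 2)
    return 2 * (1 - NORM.cdf(abs(z)))


# Western European versus Russian summary estimates for rs3809006 (ETS1).
p = interaction_p(1.09, (1.04, 1.14), 0.94, (0.85, 1.04))
print(p, p < 0.05 / 11)
```

The result is of the same order as the reported pinteraction and, like it, does not pass the Bonferroni-corrected threshold of 4.5×10−3.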
To pinpoint the putative causal variant(s) underlying the newly identified association signals we applied the PICS algorithm17 to all seven SNPs showing genome-wide significant association with MS. This revealed a number of variants (with r2≥0.8 to the index SNPs) that had 80% or more cumulative probability of including the causal variant per locus (median number of SNPs per locus 16, range 5–31; see online supplement 2). Interestingly, in all seven instances the index SNPs showed the highest probability of representing the causal variant (median probability=23.3%, range 7.2–29.1%, see online supplement 2). Gene ontology (using category ‘biological process’) terms and/or Biocarta pathways were available for four (ie, CD28, LPP, IFNGR2 and ETS1) of the genes nearest to the seven loci. Three of these (ie, all except LPP to which generally only few gene ontology terms could be attributed) suggested a functional involvement in processes related to the immune system (see online supplement 1: table e-3).
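The selection of variants reaching 80% cumulative probability per locus can be illustrated as follows. The sketch covers only this final accumulation step and assumes that per-SNP probabilities have already been computed; it does not reproduce the PICS algorithm itself. The function name `credible_set`, the SNP labels and the probabilities are hypothetical.

```python
def credible_set(probabilities, target=0.8):
    """Return the smallest top-ranked set of SNPs whose summed probability
    of being causal reaches the target, together with that sum."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for snp, prob in ranked:
        chosen.append(snp)
        total += prob
        if total >= target:
            break
    return chosen, total


# Hypothetical per-SNP probabilities for one locus (not taken from the study).
locus = {"snp_a": 0.45, "snp_b": 0.25, "snp_c": 0.20, "snp_d": 0.10}
print(credible_set(locus))
```

Loci at which no single variant carries a high probability produce longer lists, consistent with the range of 5 to 31 SNPs per locus reported above.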
Discussion
Our study represents the first to show genome-wide significant association between risk for MS and common genetic variants in or near LOC100506457, CD28, ETS1, DLEU1, LPIN3, IFNGR2 and LPP. All of these SNPs showed experiment-wide and/or genome-wide significant association in the large collection of newly analysed samples (21 589 subjects) and upon combining our data with those from previous efforts4 (resulting in a total of 101 683 subjects), respectively. Overall, these results provide compelling evidence that the highlighted loci represent genuine genetic risk factors for MS and should be considered in future genetic, functional and clinical studies. Moreover, our study represents one of the few MS genetic association studies in data sets from Russia. Our analyses show that effect size estimates in this population are not substantially different from those from western European data sets, at least not for the 11 SNPs that were analysed here.
In addition to considerably extending the list of established genetic risk factors for MS, our study also represents an important step forward in extending the list of genetic factors shared among autoimmune diseases. This is based on the observation that six of the seven SNPs showing genome-wide significant association with MS here also show considerable evidence for association with several other autoimmune diseases, for example, coeliac disease, autoimmune thyroiditis and type 1 diabetes (table 2). This observation is in line with previous reports also suggesting putative involvements of the then established MS risk loci in other autoimmune diseases (eg, refs 1 ,4 ,18). This entirely independent evidence provides further indirect support for a genuine involvement of these loci in MS pathogenesis. Interestingly, the direction of effect was not always the same as for MS. This was most noticeable for CD28 where the effect direction was inverse in MS compared with all four other autoimmune diseases (table 2). Such inverse effect directions across autoimmune diseases have been described previously.18 ,19 Furthermore, for CD28, LPP and ETS1 several SNPs beyond the index SNP show genome-wide significant association with coeliac disease, autoimmune thyroiditis and/or type 1 diabetes. In all cases, the association of the non-index SNPs was statistically stronger than the association of the index SNP, possibly suggesting allelic heterogeneity across the disease entities assessed here (table 2).
Despite the compelling accumulated genetic evidence for 7 out of the 11 loci tested and highlighted here, there are several potential limitations to our study. First, despite having examined a sample size of over 21 500 individuals in the data sets newly genotyped here and over 101 500 subjects in the analyses combining our data with those from the ImmunoChip study,4 power was still limited to detect very small effect sizes (ie, ORs<1.10), especially for minor allele frequencies below 0.20. In this context it is noteworthy that the effect size estimates of the four loci that did not reach genome-wide significance in the combined meta-analyses all pointed into the same direction in the validation data sets when compared with the original study.4 Thus, based on the currently available data, we cannot exclude the possibility that some or all of the loci currently not displaying genome-wide significant evidence for association with MS risk may eventually also prove to be genuine disease loci once tested in even larger data sets. Second, this is the first study to examine the 11 putative MS risk loci of interest here in data sets of Russian origin. While to our knowledge we examined all MS case-control data sets currently available from Russia, their combined size of 3254 individuals is still comparatively small. Thus, these results need to be interpreted carefully. This also includes the analyses comparing effect size estimates between these and western European samples: currently there is—with the potential exception of rs3809006 in ETS1—no strong evidence suggesting that the genetic effects exerted by these 11 SNPs are different in the Russian population when compared with the western European population, but larger data sets are needed to assess this question more thoroughly. In this context, it can also not be excluded that population substructure may have affected some of the association results. 
However, genome-wide SNP data to assess and adjust for population substructure in the Russian data sets tested here are currently not available. This also applies to most of the western European data sets as only the Dutch data set was adjusted for ethnic substructure. However, the Dutch effect size estimates were similar to the remaining western European data sets suggesting that undetected population stratification may not have substantially biased the results. In addition, the western European data sets have been used previously in multiple similar studies nearly always resulting in association findings in line with those from independent GWAS (eg, refs 3 ,20 ,21). Furthermore, none of the meta-analysis results was driven by a single outlying study effect estimate. Thus, the presence of type 1 error in our results due to uncorrected population stratification would imply the same direction of bias for the majority of data sets analysed. This would include the validation data sets tested here as well as those included in the original study,4 the latter of which were all adjusted for population stratification. It appears rather unlikely that such a widespread bias would be present in the majority of all independently recruited data sets from regionally distinct populations. However, we cannot exclude that some results in our validation study were affected by bias towards the null due to subtle population stratification.
Finally, it is well known that the lead SNPs emerging from GWAS are not necessarily the variants exerting the pathogenic functional effects. To this end, we used PICS to generate a list of potentially causal variants (see online supplement 2). These variants were further assessed regarding their potential functional impact by calculating the scaled ‘Combined Annotation Dependent Depletion’ (CADD; http://cadd.gs.washington.edu/) score,22 by annotating variants based on the Encyclopedia of DNA Elements (ENCODE) project using HaploReg V.3 (http://www.broadinstitute.org/mammals/haploreg/haploreg_v3.php),23 and by interrogating a recently described large-scale expression quantitative trait locus (eQTL) database (http://genenetwork.nl/bloodeqtlbrowser/) with information on transcriptome-wide microarray-based expression data from 5311 blood samples.24 Possibly the most interesting result was obtained with SNP rs3809006, located intronically in ETS1, which carried the highest PICS probability of all index SNPs (29.1%) and at the same time showed one of the highest CADD scores (10.3). CADD scores >10 are among the top 10% of potentially pathogenic variants of all theoretically possible 8.6 billion single nucleotide exchanges in the human genome. A number of SNPs in linkage disequilibrium with the seven index SNPs also showed CADD scores >10, some of which were located in proximity to promoter or enhancer histone marks in the thymus, blood or brain (see online supplement 2); for instance, this included several SNPs located 3′ downstream of CD28. Notably, genetic association, albeit at subgenome-wide significance, between MS and CD28 had already been reported in the candidate-gene era owing to this gene's involvement in T cell activation.25 Interestingly, the CD28 index SNP rs6435203 correlated highly significantly with the expression of CD28 (p=1.15×10−15, see online supplement 1: table e-4).
Another of the novel genome-wide MS SNPs identified here is rs9808753, which is located in a region on chromosome 21q22.11 characterised by the presence of several genes involved in the immune system response. In the eQTL database rs9808753 was found to correlate with the expression of several of these genes, including IL10RB (p=3.22×10−14), IFNAR1 (p=2.57×10−7), TMEM50B (p=3.55×10−7) and IFNAR2 (p=9.77×10−4; see online supplement 1: table e-4). Online table e-5 (supplement 1) highlights previously described functional implications of these eQTL genes potentially underlying autoimmune disease pathophysiology and progression. Clearly, further experimental work is needed to characterise the potential functional impact of these and the other MS loci newly nominated here.
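For orientation on the CADD threshold mentioned in this discussion: scaled CADD scores are PHRED-like, that is, a score of s corresponds to the top 10^(−s/10) fraction of all possible single nucleotide exchanges when ranked by predicted deleteriousness. A minimal sketch of this conversion (the function name `cadd_top_fraction` is chosen for this example):

```python
def cadd_top_fraction(scaled_score):
    """Fraction of all possible substitutions ranked at or above this
    PHRED-like scaled CADD score (score = -10 * log10(rank / total))."""
    return 10 ** (-scaled_score / 10)


for score in (10.0, 10.3, 20.0):
    print(score, cadd_top_fraction(score))
```

On this scale the score of 10.3 observed for rs3809006 lies just inside the top 10%, whereas a score of 20 would correspond to the top 1%.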
In conclusion, our study is the first to establish genome-wide significant association between risk for MS and seven new loci, of which several show strong association with other autoimmune diseases. Using previously generated transcriptome data, we observed strong eQTL effects for the MS-associated SNPs in CD28 and IFNGR2. Further fine-mapping and functional studies are required to elucidate mechanisms underlying the newly highlighted disease associations.
Acknowledgments
The authors thank ICM, Généthon, for their help and support. The authors also thank Drs Elena G Arefyeva, Svetlana A Elchaninova, Fedor A Platonov, Ilja Demuth, Elisabeth Steinhagen-Thiessen and Ulman Lindenberger for recruiting individuals included in this study.
References
Footnotes
Contributors Study concept: CML and LB. Study supervision or coordination: CML, FL, IC-R, RH, AZ, MC, BF, EU, KV, MF, FM, FZ and LB. Acquisition of data: CML, FL, AA, EAS, NU, BdlH, LG-N, SM, ER, B-MMS, JYM, AM, IW, DAA, OA, IAl, AA, RA, IAs, PB, ANB, MB, AC, TD, JTE, OOF, MF, OF, AG-M, L-AG, CG, H-PH, SH, GI, DSK, AK, CK, TK, LL, PL, NAM, XM, EVP, PR, ASR, CS, IVS, EYT, AW, UKZ, IC-R, RH, AZ, MC, BF, EU, KV, MF, FM and FZ. Statistical analysis: CML. Analysis and interpretation of data: CML, FL, HB, FM and LB. Drafting the manuscript: CML and LB. Critical revision of the manuscript for content: FL, AA, EAS, NU, BdlH, LG-N, SM, ER, B-MMS, JYM, AM, IW, DAA, OA, IAl, AA, RA, IAs, PB, ANB, MB, AC, TD, JTE, OOF, MF, OF, AG-M, L-AG, CG, H-PH, SH, GI, DSK, AK, CK, TK, LL, PL, NAM, XM, EVP, PR, ASR, CS, IVS, EYT, AW, UKZ, HB, IC-R, RH, AZ, MC, BF, EU, KV, MF, FM and FZ.
Funding The German Ministry for Education and Research (grant 16SV5538 to LB, KKNMS to FZ, NBL3 to UKZ, 01GM1203A to H-PH and OA), the Johannes Gutenberg University Mainz (grants MAIFOR and “Inneruniversitäre Forschungsförderung Stufe I” to FL), the Walter- and Ilse-Rose-Stiftung (to H-PH and OA), the Instituto de Salud Carlos III-Fondo Europeo de Desarrollo Regional (Feder), the Fondo de Investigaciones Sanitarias FIS (grant numbers P12/00555 to FM, PI13/01527 to AA, PI13/01466 to GI) and Junta de Andalucía (JA)-Fondos Europeos de Desarrollo Regional (FEDER) (grant number CTS2704 to FM), (PI13/0879) and Fundación Alicia Koplowitz (to EU), INSERM, ARSEP, AFM and ANR-10-IAIHU-06 (to BF), “REEM: Red Española de Esclerosis Múltiple” (RETICS-REEM RD12/0032/009, http://www.reem.es, to AG-M), “Fondo de Investigaciones Sanitarias” (FI11/00560 to BdlH), the Dutch MS Research Foundation (to RH), the grant Ayudas para Grupos de Investigación del Sistema Universitario Vasco–Gobierno Vasco (Ref. IT512-10 to KV), the Russian Foundation for Basic Research (13-04-40281-Н and 15-04-04866 to ANB and EVP) and the Russian Science Foundation (14-14-00605 to EYT). The funding sources did not have an influence on the design and conduct of the study; the collection, management, analysis, and interpretation of the data; the preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication. EU works for the Fundación para la Investigación Biomédica-Hospital Clínico San Carlos-IdISSC. Samples and data from Northern Spanish patients with MS were processed by the Basque BioBank for Research-OEHUN (http://www.biobanco.vasco). The French DNA samples were provided by the BRC-REFGENSEP.
Competing interests None declared.
Ethics approval Local Ethics Committees.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement All analysis results have been made available in this manuscript.