Genetic susceptibility to the autoimmune β cell destruction that leads to type 1 diabetes mellitus (T1D) is a complex trait.1 In recent years, many T1D associations have been reported, but only three (major histocompatibility complex, insulin, and cytotoxic T lymphocyte associated protein 4) have been confirmed in several independent studies.2,3 Independent confirmation is essential to eliminate artefacts of publication bias, multiple hypothesis testing, and, in findings of case–control studies, population stratification.4
Bottini et al recently found, by a case–control design in two independent populations, a novel association of T1D with a single nucleotide polymorphism (SNP) that causes an R620W amino acid substitution (dbSNP rs2476601) in the lymphoid protein tyrosine phosphatase, non-receptor type 22 (PTPN22) gene.5 PTPN22 encodes LYP, a non-receptor tyrosine phosphatase involved in lymphocyte function.
That paper leaves two questions unanswered. Firstly, results of case–control studies are potentially artefacts of population stratification, no matter how well matched the two groups are. Secondly, although Bottini et al show that the T allele encodes a protein unable to bind to its important Csk partner, and postulate this as a very attractive candidate mechanism for the genetic effect, they did not address the haplotype structure of the locus or the possibility that the association is due to linkage disequilibrium (LD) with another variant. Here we present the results of a study that confirms this association in a design impervious to population stratification. As a preliminary step towards addressing the second question, we define the LD block that encompasses the PTPN22-R620W SNP and present a computationally generated list of potentially functional SNPs within the block.
MATERIALS AND METHODS
Subjects
Genomic DNA was obtained after informed consent from 588 nuclear families with at least one T1D affected child and two parents. The research ethics board of the Montreal Children’s Hospital approved the study. Most probands attended the diabetic clinic at the Montreal Children’s Hospital. Ethnic backgrounds were of mixed European descent, with the largest single group being of Quebec French-Canadian origin. All patients were diagnosed before the age of 18 years and required insulin treatment continuously from the time of diagnosis.
For the LD structure studies, we genotyped PTPN22-R620W in DNA from the 30 family trios from the Centre de l’Étude du Polymorphisme Humain (CEPH) used in the International HapMap Project6 and combined the results with genotype data from the HapMap website (www.hapmap.org) for LD block analysis.
Key points
- Recently, an association between a functional R620W polymorphism in protein tyrosine phosphatase PTPN22 and type 1 diabetes was found in a case–control study. Because results of case–control studies may be artefacts of population stratification, a replication study is essential.
- This study aimed to validate the association in a design free of population stratification and to explore the possibility that the association is due to linkage disequilibrium (LD) with another variant.
- R620W was genotyped in 588 white nuclear families with at least one affected child and in the 30 European families used in the International HapMap Project.
- Highly significant transmission disequilibrium (p = 1.7×10⁻⁵) was observed, confirming the case–control study. However, R620W maps to a 293 kb LD block containing numerous polymorphisms, raising the possibility that other potentially functional polymorphisms may be responsible for the association with T1D instead of, or in addition to, R620W. A computationally generated list of potentially functional SNPs within the block is presented.
- The newly discovered association of PTPN22 with T1D was confirmed. However, because the association lies within an extended LD block, pinpointing the functional variant may require further studies in populations with different haplotype structure.
Genotyping
The SNP was genotyped with the AcycloPrime-FP SNP detection kit. PCR primers, designed with the Primer3 Webtool,7 and fluorescence polarisation (FP) probes are listed in table 1. Each reaction included 12 ng DNA, 2 mmol/l MgCl2, 25 μmol/l dNTPs, and 0.025 U of AmpliTaq Gold (Applied Biosystems, Foster City, CA, USA). Amplification primers were used at 100 nmol/l each (all concentrations are final). PCR was performed in a final volume of 8 μl in a Dual 384 well GeneAmp PCR system 9700 in clear 384 well microplates (Greiner Labortechnik, Germany). PCR conditions were: 95°C for 12 minutes; 5 cycles at 97°C for 30 seconds, 58°C for 30 seconds, and 72°C for 30 seconds; then 45 cycles at 95°C for 30 seconds, 58°C for 30 seconds, and 72°C for 30 seconds; and finally 72°C for 6 minutes. Unincorporated primers and dNTPs were removed and final extension was performed in the same PCR system in black microplates (MJ Research, Waltham, MA, USA). The extension reaction was performed for 40 cycles of 95°C for 10 seconds and then 55°C for 30 seconds. Final detection of the SNP was done with the Criterion Analyst HT System (Molecular Devices, Sunnyvale, CA, USA).
Table 1 Primers and probes used for PTPN22 R620W genotyping
Statistics
Hardy-Weinberg equilibrium of the parental genotype distribution was tested with Transposer software.8 Association was tested by the transmission disequilibrium test, using the Family Based Association Test (FBAT) software (www.biostat.harvard.edu/~fbat/fbat.htm) under the default additive genetic model.9 Computation of the transmission ratio and haplotype analysis were performed with Haploview software (www.broad.mit.edu/personal/jcbarret/haploview).
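The Hardy-Weinberg check can be illustrated with a minimal sketch. It uses a standard one degree of freedom chi-square goodness-of-fit test, which is an assumption made here and not necessarily the procedure implemented in Transposer. The function name `hwe_chi_square` and the genotype counts are hypothetical and are not data from this study.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Arguments are observed counts of the three genotypes at a biallelic SNP.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    # Genotype counts expected under Hardy-Weinberg proportions
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts, for illustration only
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A large p value gives no evidence of departure from equilibrium, which is the situation reported for the parental genotypes in the Results.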
Computational prediction of SNP function
The NCBI dbSNP database (www.ncbi.nlm.nih.gov/SNP) and BLAST webtool (www.ncbi.nlm.nih.gov/BLAST), Celera refSNP database (www.celeradiscoverysystem.com), and Genomatix SNP analysis webtool (www.genomatix.de/) were used to predict SNP function. Candidate SNPs were first identified based on the evolutionary conservation of human–mouse sequence alignment (noted by Celera refSNP as HMCS). For SNPs not included in Celera refSNP, local conservation of a DNA region was predicted by at least 80% sequence identity in a 20 bp sliding window spanning the SNP site.10 Further function prediction for an amino acid substitution was based on its location in a functional domain of the protein and on the evolutionary distance of the substitution (Gonnet score).11 Potential regulatory role was evaluated by predicted creation or abolition of known transcription factor response elements and other regulatory elements.
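The sliding window criterion can be sketched as follows. The sketch assumes a pre-computed, equal-length pairwise human–mouse alignment and reads "spanning the SNP site" as "any 20 bp window that contains the SNP position". Both are assumptions made here, and the function names and sequences are hypothetical.

```python
def window_identity(human, mouse, start, size):
    """Fraction of identical, non-gap positions in one alignment window."""
    matches = sum(
        1
        for h, m in zip(human[start:start + size], mouse[start:start + size])
        if h == m and h != "-"  # gap columns count as mismatches
    )
    return matches / size

def is_locally_conserved(human, mouse, snp_index, size=20, threshold=0.80):
    """True if any window of `size` bp containing the SNP reaches `threshold` identity.

    `human` and `mouse` are rows of a pairwise alignment of equal length and
    `snp_index` is the 0-based alignment column of the SNP.
    """
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    if not 0 <= snp_index < len(human):
        raise ValueError("snp_index is outside the alignment")
    first = max(0, snp_index - size + 1)      # leftmost window containing the SNP
    last = min(snp_index, len(human) - size)  # rightmost window containing the SNP
    # If the alignment is shorter than one window, the range is empty and the result is False
    return any(
        window_identity(human, mouse, start, size) >= threshold
        for start in range(first, last + 1)
    )

# Hypothetical aligned sequences, for illustration only
human_seq = "ACGTACGTACGTACGTACGTACGT"
mouse_seq = "ACGTACGTACCTACGTACGAACGT"
print(is_locally_conserved(human_seq, mouse_seq, snp_index=10))
```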
RESULTS AND DISCUSSION
Mendelian error was zero and parent genotypes were in Hardy-Weinberg equilibrium (p = 0.32). There was no discrepancy between the sense and antisense genotyping assays. A highly significant (p = 1.7×10⁻⁵) excess transmission of the A allele (T on the sense strand) from heterozygous parents to affected children (transmitted 131 times, not transmitted 79 times) confirmed the previously reported excess of this allele in affected individuals compared to normal controls (table 2). Genotype association analysis shows that the T1D association does not depend on the genetic model: it remains highly significant under both dominant and recessive models.
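The transmission counts above can be turned into a transmission ratio and a classic count-based transmission disequilibrium statistic, as in the minimal sketch below. This is a simplified McNemar-type calculation and not the FBAT analysis used in the study. The p value quoted in the text comes from FBAT under the additive model, so this sketch is not expected to reproduce it. The function name `tdt` is an assumption introduced here.

```python
import math

def tdt(transmitted, untransmitted):
    """Count-based transmission disequilibrium test for one allele.

    `transmitted` and `untransmitted` are the numbers of times heterozygous
    parents did or did not pass the allele to an affected child.
    Returns (transmission_ratio, chi2, p_value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    ratio = transmitted / total
    # McNemar-type statistic, chi-square distributed with 1 df under no linkage/association
    chi2 = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 df
    return ratio, chi2, p_value

# Counts reported in the text: 131 transmissions, 79 non-transmissions
ratio, chi2, p_value = tdt(131, 79)
print(f"transmission ratio = {ratio:.3f}, chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

A transmission ratio above 0.5 indicates that heterozygous parents passed the allele to affected children more often than expected by chance.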
Table 2 Association analysis between rs2476601 and T1D
To address the question of other potentially functional genetic variants in LD with this SNP, DNA from the 30 CEPH family trios whose SNP genotypes at 5 kb resolution are publicly available (International HapMap Project, 10th release, July 2004) was genotyped for PTPN22-R620W. Combined LD analysis of the R620W results along with other SNPs at that locus was performed stepwise to make sure that all marker SNPs in an LD block were included. The result shows that PTPN22-R620W maps to a 293 kb block of 41 marker SNPs with a solid spine of LD (fig 1). The D′ value between the SNPs that define the two ends is 1.0, suggesting that the genetic effect could be caused (or contributed to) by another SNP located anywhere in that segment.
Figure 1 PTPN22 R620W polymorphism rs2476601 (underlined in blue) maps to a solid LD block. The genotyping data, except for rs2476601, are from the 10th release data of International HapMap Project. The haplotype map is made by Haploview v2.03 software. D′ values (%) are shown in the boxes. D′ = 100% for the empty boxes. High D′/low LOD can be seen when there is no/very little recombination evidence between two SNPs, but one SNP is much rarer than the other.
At least six other known genes map to this LD block besides PTPN22: the round spermatid basic protein 1 (RSBN1, also known as FLJ11220), putative homeodomain transcription factor 1 (PHTF1), the 3′ part of the membrane associated guanylate kinase related gene (MAGI-3), LOC440603 (similar to Gm566 protein), adaptor related protein complex 4, beta 1 subunit (AP4B1), and the 5′ part of DNA cross link repair 1B (DCLRE1B). The block contains 625 SNPs listed in NCBI dbSNP build 122; eight of these are non-synonymous (ns). Besides rs2476601 (R620W), only one nsSNP, rs1217401, which maps to exon 10 of AP4B1, has an appreciable reported minor allele frequency (0.380) in NCBI dbSNP. This nsSNP is also predicted to have potential functional effects (table 3). However, disease association may be due to non-coding SNPs that affect expression levels, as is the case in two of the three established T1D associations, those with insulin12 and CTLA4.13 Among the 625 SNPs, 20 (including six nsSNPs) show high human–mouse conservation (an indication of functional importance) and potential regulatory effects (table 3). Of these 20 SNPs, six (rs3789597, rs2273757, rs3789600, rs1217418, rs3789613, and rs3761936) are included in the present HapMap. Except for rs2273757, which is nonpolymorphic in the 30 CEPH family trios, these SNPs have D′ = 1 with R620W. R² values, a better indicator of whether one SNP may account for an effect observed in another, were 0.06, 0.16, 0.16, 0.26, and 0.58, respectively, for the other five SNPs.
Table 3 Potential functional SNPs in the LD block around PTPN22*
In addition to the strength of the LD, the extent to which one SNP may account for the genetic effect of another in the same block depends on allele frequencies. In this respect, rs6679677 is of particular interest among available HapMap SNPs, because its allele frequency is identical to that of PTPN22-R620W (R² = 1), which makes it impossible to distinguish the effects of these two SNPs on the basis of genetic data alone. This SNP is located in the intergenic region between PHTF1 and FLJ11220, in a sequence with no human/mouse conservation. The Genomatix SNP analysis web tool predicts the destruction of a binding site of promoter CCAAT binding factors, the destruction of a site of enhancer CCAAT binding factors, and the generation of a binding site of the transcription factor Sox-5. Because of the limited resolution of HapMap, most of the 625 known SNPs in the block are not included in the database, and it is therefore not known how many more may be in complete or near complete LD (R² close to 1) with PTPN22-R620W.
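The dependence on allele frequencies can be illustrated with the standard definitions of D′ and R² for two biallelic SNPs. The sketch below is a minimal illustration only. The haplotype and allele frequencies are hypothetical, not values from this study, and the function name `ld_stats` is an assumption introduced here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', R^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1 (strictly between 0 and 1)
    p_b:  frequency of allele B at SNP 2 (strictly between 0 and 1)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # Largest |D| attainable given the allele frequencies and the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical case 1: a rare allele (10%) found only on haplotypes carrying
# a commoner allele (40%). No recombinant haplotype exists, yet R^2 is low.
print(ld_stats(p_ab=0.10, p_a=0.10, p_b=0.40))

# Hypothetical case 2: identical allele frequencies and complete LD, so the
# two SNPs are perfect proxies for each other (D' and R^2 both equal 1,
# up to floating point rounding).
print(ld_stats(p_ab=0.10, p_a=0.10, p_b=0.10))
```

The first case mirrors the situation described for the conserved HapMap SNPs, where D′ = 1 coexists with low R², and the second mirrors the situation described for rs6679677.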
Thus it cannot be stated with absolute certainty that PTPN22-R620W is the functional variant responsible (or solely responsible) for the genetic effect. However, a very compelling case can be made for PTPN22 and its variant R620W on functional grounds. LYP is a well established suppressor of T cell activation,14 and targeted disruption of PEP, its mouse orthologue, results in significant enhancement of memory T cell numbers.15 Moreover, disruption of binding to Csk with an induced mutation that mimics the effect of R620W abolishes the inhibitory effect of PEP on TCR signalling.16
The newly discovered association of PTPN22 with T1D and, more recently, other autoimmune disorders17,18 highlights an issue that is likely to become common in the elucidation of complex disorders: once association with an extended LD block is established, pinpointing the functional variant may require, in addition to functional studies of the type reported by Bottini et al, genetic studies in populations with different haplotype structure.19 Ultimate proof may require the generation of animal models that carry (or exactly mimic) the effects of the human polymorphism.
Acknowledgments
This work was funded by Genome Canada and the Juvenile Diabetes Research Foundation International. We thank R Grabs, F Bacot and R Fréchette for genotyping. Genotyping was performed using the facilities of the McGill University/Genome Quebec Innovation Center, with advice and help of A Sammak and Y Renaud. T J Hudson is supported by a Clinician Scientist Award in Translational Research by the Burroughs Wellcome Fund and an Investigator Award from the Canadian Institutes of Health Research.
REFERENCES
Footnotes
- Competing interests: none declared