- AIMs, ancestry informative markers
- LD, linkage disequilibrium
- MCMC simulation, Markov chain Monte Carlo simulation
- SLVDS, San Luis Valley Diabetes Study
The prevalence of type 2 diabetes is higher in populations of Native American ancestry, and in Hispanic American populations formed by admixture between Europeans and Native Americans, than in populations of European ancestry.1 One approach to distinguishing between environmental and genetic explanations for this difference is to study the relationship of type 2 diabetes risk to individual admixture proportions (the proportions of an individual’s genome that are of European and Native American ancestry). With only a few markers informative for ancestry, it is possible to estimate the average admixture proportions of any Hispanic American population. In such analyses, it has been possible to demonstrate that the prevalence of type 2 diabetes in Hispanic Americans in the south western United States varies with the average Native American admixture proportion of these populations.2–4 In the Native American population of Gila River, Arizona, USA, European admixture is associated with lower prevalence of type 2 diabetes.5 However, it has not been possible to demonstrate an association of type 2 diabetes with individual admixture proportions within an Hispanic American population. To estimate the admixture proportions of an individual accurately requires a larger panel of markers: at least 40 markers with average frequency differentials of 0.6 are required to estimate the admixture of an individual with a standard error of no more than 0.1.6 It is now possible to identify relatively large numbers of such ancestry informative markers from data accumulating in the public domain. For this study we typed a panel of 21 markers chosen to have large differences in frequency between European, Native American, and West African ancestry.
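The figure of 40 markers quoted above can be checked with a rough information argument. The sketch below is an illustration under stated assumptions, not necessarily the calculation used in the cited reference: two-way admixture, unlinked markers, parental allele frequencies known without error, and a large-sample approximation in which the standard error is the inverse square root of the Fisher information. Each allele copy is a Bernoulli draw with probability q = p2 + m·δ, so its information about the admixture proportion m is δ²/(q(1 − q)), which is at least 4δ². The function name is hypothetical.

```python
import math

def admixture_se_upper_bound(n_markers: int, delta: float) -> float:
    """Approximate upper bound on the standard error of an individual's
    admixture proportion (assumes unlinked biallelic markers, known parental
    frequencies, two parental populations, large-sample theory)."""
    allele_copies = 2 * n_markers             # diploid individual
    # Information per allele copy is delta^2 / (q * (1 - q)) >= 4 * delta^2
    min_information = allele_copies * 4.0 * delta ** 2
    return 1.0 / math.sqrt(min_information)

se_40 = admixture_se_upper_bound(40, 0.6)
print(f"SE bound with 40 markers, delta = 0.6: {se_40:.3f}")
```

Under these assumptions the bound falls just below 0.1, in line with the requirement stated in the text.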
The possible relationship of type 2 diabetes risk to individual admixture proportions within Hispanic American populations complicates the interpretation of associations of type 2 diabetes with candidate gene polymorphisms within these populations. If admixture proportions vary between individuals (hidden population stratification) and the risk of type 2 diabetes varies with individual admixture proportions, this will confound allelic associations with type 2 diabetes at any loci where allele frequencies differ between Europeans and Native Americans. We have shown that in recently admixed populations associations are often observed between unlinked genetic markers.7–9 Thus, when carrying out association studies in admixed populations, it is necessary to control for possible confounding by population stratification. The classic approach to this has been to type parents as controls, but for a late onset disease such as type 2 diabetes parents of cases are not usually available for study. By typing ancestry informative markers, we can estimate individual admixture and control for it as a confounder. The most satisfactory approach to this is to fit a statistical model of population admixture, individual admixture, and the relationship of disease risk to individual admixture. Tests for allelic association with the disease can then be adjusted for the confounder. Although the statistical model is based on a straightforward application of the laws of mendelian genetics, to fit such a model in practice requires bayesian computationally intensive methods. 
We have developed a general purpose program (ADMIXMAP) for modelling admixture based on this approach, and have demonstrated the ability to distinguish associations of a trait with alleles at loci that are linked to a trait locus from associations with unlinked loci that are generated by population stratification.9,10 Where two or more loci in the same gene have been typed, the program can also model the unobserved haplotypes, given phase-unknown genotype data.
The prevalence of type 2 diabetes is higher in Hispanic American populations than in populations of European ancestry. The objectives of this study were to distinguish between genetic and environmental explanations for this ethnic difference in disease risk, and to test candidate gene polymorphisms for association with type 2 diabetes in the Hispanic American population of San Luis Valley, Colorado, USA.
We genotyped 11 single nucleotide polymorphisms in five candidate genes (CAPN10, GNB3, PPARG, and ABCC8/KCNJ11), together with a panel of 21 ancestry informative markers, in a sample of 261 controls and 185 diabetic individuals. The ADMIXMAP program was used to model the effects of admixture.
Type 2 diabetes risk varies with proportion of Native American ancestry in San Luis Valley Hispanics, but this relationship is confounded by socioeconomic factors. We were able to confirm modest effects of ABCC8/KCNJ11 variants on type 2 diabetes risk, but observed no association of type 2 diabetes with four CAPN10 markers.
In this paper, we evaluated the association of type 2 diabetes, fasting insulin, and body mass index (BMI) with polymorphisms in five candidate genes in a sample from the Hispanic American population of San Luis Valley, Colorado, USA. The candidate genes are calpain 10 (CAPN10), guanine nucleotide binding protein, beta polypeptide 3 (GNB3), peroxisome proliferative activated receptor, gamma (PPARG), ATP binding cassette, subfamily C, member 8 (ABCC8), and potassium inwardly rectifying channel, subfamily J, member 11 (KCNJ11). We analysed four markers within the CAPN10 gene (UCSNP 19, 43, 44, and 63 polymorphisms), one marker within the GNB3 gene (C825T polymorphism), two markers within the PPARG gene (Pro12Ala and exon 6 C→T polymorphisms), and four markers located within the ABCC8 (SUR1) and KCNJ11 genes, which are closely linked on chromosome 11 (ABCC8 exon 16 splice acceptor site, ABCC8 exon 31 G→A polymorphism, ABCC8 exon 33 G→T polymorphism, and KCNJ11-E23K polymorphism). In previous studies, associations of type 2 diabetes or related traits such as obesity and insulin resistance have been detected with these polymorphisms, or other sites in the same genes. However, with the exception of CAPN10, there have been few studies of these associations in Hispanic American populations.
The study sample was selected from participants of the San Luis Valley Diabetes Study (SLVDS), a geographically based study of the natural history, incidence, and risk factors for type 2 diabetes conducted in the counties of Alamosa and Conejos in southern Colorado. These counties are 43.6% Hispanic American. Informed consent from all participants and approval by the Institutional Review Board of the University of Colorado were obtained prior to data collection. Additional approval of this work was obtained through the Penn State University Institutional Review Board (ORC# 00M0453). The procedure for selecting the SLVDS study subjects has been described in detail by Hamman et al.11 In summary, persons with type 2 diabetes in the study area were identified through all health care facilities and through advertisement in local newspapers, presentations to local organisations, and local radio programs. Eligible subjects with a medical diagnosis of type 2 diabetes were 20–75 years of age, residents of the study area, mentally competent, and spoke either Spanish or English. The baseline data collection clinic (1984–1988) was attended by 82% of eligible subjects (n = 440). Controls were selected using a two stage sampling method. First, 57% of all occupied structures in the two county area were sampled and enumerated. Enumerated persons 20–75 years of age were the sampling frame for the second stage of control selection where subjects were randomly selected within age, sex, ethnic group, and county strata to match the age and sex distribution of persons with type 2 diabetes. Some 67% of eligible controls (n = 1351) attended the baseline clinic. The total sample described above comprised unrelated individuals of Hispanic ancestry and English or Anglo ancestry. We have focused our analyses only on the persons who identified as Hispanic (Mexican, Mexican American or Chicano, Spanish/Hispanic). 
The SLVDS Hispanic sample included 185 individuals with a history of diabetes confirmed by oral glucose tolerance test and 261 controls with confirmed normal glucose tolerance. A total of 128 participants reported their ethnicity (based on the 1980 US census question) as Mexican, Mexican American, or Chicano, and 318 as other Spanish/Hispanic. Data on other relevant phenotypes, such as fasting insulin and BMI, were also collected in the SLVDS. Data on household income and years of education were used to assess socioeconomic status. Age at baseline visit ranged between 21 and 75 years old. Given the relatively young age of some of the subjects, it was expected that some of the control subjects would develop type 2 diabetes as they grew older. As such our analysis is conservative with respect to testing for associations with either ancestry or locus specific tests. Table 1 summarises the characteristics of this Hispanic sample from San Luis Valley.
Allele frequencies in unadmixed Europeans were estimated from samples from Spain, Germany, Ireland, and Britain. Allele frequencies in unadmixed Native Americans were estimated from Mayans and from populations in the south western United States (Pima, Cheyenne, and Pueblo). Allele frequencies in West Africans were estimated from samples from Nigeria, Sierra Leone, and Central African Republic. More information on these unadmixed samples is available in dbSNP (http://www.ncbi.nlm.nih.gov/SNP/index.html), under the submitter handle PSU-ANTH.
We typed 11 polymorphisms within five candidate genes. The description of the markers in each gene, the primer sequences, and the PCR conditions are given in table 2. The sequence surrounding the polymorphisms of the Calpain-10 gene was kindly provided to us by Dr Nancy Cox. The polymorphism of UCSNP-19 is based on a difference of 32 bp between the two alternative alleles, and this marker was genotyped using conventional agarose electrophoresis. The characterisation of the remaining polymorphisms was based on the presence/absence of a restriction site. In some cases, due to the absence of a natural restriction site, primer sequences were modified to create a restriction site polymorphism (table 2). After initial denaturation for 5 min at 94°C, DNA samples were amplified at the denaturation/annealing/extension temperatures specified for each marker, followed by a final extension for 5 min at 72°C. After PCR, these markers were digested using the appropriate restriction enzyme, following the recommendations of the supplier. For details of the restriction enzymes used, see table 2. After digestion, genotypes were characterised by means of conventional agarose electrophoresis (SNP 43, SNP 44, and SNP 63), or alternatively, by melting curve analysis. Details of the melting curve analysis method employed have been described in a previous manuscript.12 For a subset of the samples, genotypes were characterised using both the McSNP method and conventional agarose electrophoresis, with consistent results.
A panel of 21 ancestry informative markers (AIMs) was also genotyped in the San Luis Valley sample. These markers show large differences in frequency between the parental populations (mainly European and Native American), and were used to control for the presence of genetic structure due to admixture. The panel of AIMs includes an Alu insertion polymorphism (PV92), four short insertion–deletion polymorphisms (MID-575, MID-52, MID-161, and MID-93), and 16 single nucleotide polymorphisms (SNPs). Relevant information about these markers is provided in table 3. We have reported in a recent manuscript the allele frequencies of these 21 AIMs in the sample from San Luis Valley and also in samples of the relevant parental populations.13
To test for association of the markers located within the type 2 diabetes candidate genes with the traits under study, we used the program ADMIXMAP (available at http://www.lshtm.ac.uk/eu/genetics/index.html#admix).10 This is a general purpose program for modelling population admixture with genotype and phenotype data, based on a combination of bayesian and classical methods. As we have described the statistical methods used in this program in detail previously,10 and demonstrated their application to studies of skin pigmentation in this Hispanic American population sample, only an outline is given here. For this analysis, the Hispanic American population was modelled as formed by admixture between three subpopulations: European, Native American, and West African. The program fits a hierarchical model for the distribution of admixture proportions in the population, the admixture proportions of each parental gamete, and the ancestry of the gene copies at each locus. The variation between three states of ancestry on chromosomes of mixed descent is modelled as the outcome of three independent Poisson arrival processes. This requires only one extra parameter, the sum of the intensities of the arrival processes, to be specified in the model. Allele and haplotype frequencies are estimated by combining information from unadmixed and admixed population samples (using the posterior distribution of allele frequencies obtained from data on unadmixed individuals as a prior distribution for the corresponding ancestry specific allele frequencies in the admixed population). Where two or more SNPs in the same gene have been typed, these loci are grouped into a single “compound locus” and the program models the unobserved haplotypes, given the observed (phase unknown) genotypes at each compound locus.
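To make the Poisson arrival model concrete, the sketch below gives one common way of writing the resulting transition probabilities for ancestry along a gamete. It is an illustration under assumptions, not a description of ADMIXMAP's internal code: arrivals for state j are assumed to occur at rate ρ·μ_j, where μ_j is the gamete's admixture proportion for population j and ρ is the sum of intensities per morgan. The function name and the 10 cM spacing are hypothetical; the admixture proportions and ρ = 7.1 per 100 cM are the estimates reported in this study, used here only as example inputs.

```python
import math

def ancestry_transition_matrix(admixture, rho, distance_morgans):
    """Return P[i][j]: probability that ancestry is state j at the next locus,
    given state i at the current locus (assumed parameterisation: no arrival
    with probability exp(-rho * d); after an arrival the new state is drawn
    from the gamete admixture proportions)."""
    no_arrival = math.exp(-rho * distance_morgans)
    k = len(admixture)
    return [
        [no_arrival * (1.0 if i == j else 0.0) + (1.0 - no_arrival) * admixture[j]
         for j in range(k)]
        for i in range(k)
    ]

# Example inputs: European, Native American, West African proportions
mu = [0.65, 0.34, 0.01]
P = ancestry_transition_matrix(mu, rho=7.1, distance_morgans=0.10)
for row in P:
    print([round(x, 3) for x in row])
```

Each row sums to 1, loci at zero distance share the same ancestry state, and at large distances the rows converge to the admixture proportions.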
A generalised linear model is specified for the relation of the dependent variable (type 2 diabetes, insulin, or BMI) to individual admixture and other covariates such as age, sex, and socioeconomic variables. For type 2 diabetes, this is a logistic regression model. For insulin and BMI, this is a linear regression model. For fasting insulin, only those individuals who were classified as controls at baseline visit were included in the regression model.
The model is specified as a bayesian full probability model, in which all unobserved variables—such as haplotypes, ancestry states at each locus, gamete admixture proportions, and population level parameters—are “missing data”. Non-informative prior distributions are specified for the distribution of admixture proportions in the population, and for the parameters of the regression model. The posterior distribution of the missing data, given the observed data, is then generated by Markov chain Monte Carlo (MCMC) simulation. Inference about the parameters of the regression model is based on the posterior distribution. In large samples, the posterior means and 95% central posterior intervals (“95% credible intervals”) are asymptotically equivalent to maximum likelihood estimates and 95% confidence intervals (95% CI).
Score tests for allelic association with the trait, conditional on individual admixture and any other covariates, are constructed as described previously.10 The parameter tested is the coefficient β for the effect of the allele or haplotype under study (coded as 0, 1, or 2 copies) in a regression model that includes admixture and other covariates such as age and sex. For each SNP, a positive score value indicates association of the trait with the allele being tested. To test the null hypothesis that β = 0, the score (gradient of the log-likelihood) and the observed information (curvature of the log-likelihood) at β = 0 are calculated by averaging over the posterior distribution of the missing data (the haplotypes and individual admixture values). The score test correctly allows for uncertainty about haplotype assignments and estimation of individual admixture proportions, because it is based on the likelihood of the observed data as a function of the parameter (β) that is being tested.
The ratio of observed to complete information in the score test can be interpreted as a measure of the efficiency of the analysis, compared to a study design in which haplotypes have been observed directly and individual admixture proportions measured without error. Where an allele or haplotype is found to show significant association with the trait, it is possible to estimate the size of the effect of that allele by fitting a model in which the allele or haplotype (coded for each individual as 0, 1, or 2 copies) is included as an explanatory variable in the regression model. Inference is then based upon the posterior distribution of the regression coefficient. This, however, requires a separate run of the sampler for each hypothesis under test, whereas the score test procedure allows all loci and all haplotypes to be tested for association in a single run of the MCMC sampler. An approximation to the maximum likelihood estimate of the effect size (as the natural logarithm of the odds ratio, for a logistic regression model) can be obtained from the score test by dividing the score by the observed information. In large samples, this is asymptotically equivalent to computing the maximum likelihood estimate directly.
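The quantities described above combine in a simple way. The sketch below is a minimal illustration, assuming the score and information have already been averaged over the posterior distribution of the missing data; it is not the ADMIXMAP implementation, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

def score_test_summary(score, observed_info, complete_info=None):
    """Summarise a score test for beta = 0.
    z          : score / sqrt(observed information)
    p_value    : two-sided, from the standard normal distribution
    effect     : approximate MLE of beta (score / observed information)
    se         : approximate standard error, 1 / sqrt(observed information)
    efficiency : observed / complete information, if the latter is supplied
    """
    z = score / math.sqrt(observed_info)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    effect = score / observed_info
    se = 1.0 / math.sqrt(observed_info)
    efficiency = None if complete_info is None else observed_info / complete_info
    return {"z": z, "p_value": p_value, "effect": effect,
            "se": se, "efficiency": efficiency}

# Hypothetical inputs, for illustration only
result = score_test_summary(score=10.0, observed_info=30.0, complete_info=40.0)
print(result)
print("approximate odds ratio:", math.exp(result["effect"]))
```

For a logistic regression model the effect is on the log odds scale, so exponentiating it gives an approximate odds ratio per extra copy of the allele or haplotype.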
The fit of the observed genotype frequencies to Hardy-Weinberg proportions was assessed with a Fisher exact test. Linkage disequilibrium (LD) between markers was estimated using the 3LOCUS.PAS program, kindly provided by Dr Jeff Long. LD is expressed as the D′ coefficient, in which the observed gametic disequilibrium (D) is standardised by the theoretical maximum disequilibrium (Dmax).14
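As an illustration of the D′ coefficient, the sketch below computes it for two biallelic markers from a haplotype frequency and the two allele frequencies. It is not the 3LOCUS.PAS program; it assumes the haplotype frequency is already known or has been estimated, and the function name and example frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute value of D' for two biallelic loci.
    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b                      # gametic disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0                            # a monomorphic locus carries no LD
    return abs(d) / d_max

# Hypothetical frequencies: allele A always occurs together with allele B
print(d_prime(p_ab=0.3, p_a=0.3, p_b=0.5))
# Hypothetical frequencies: the two loci are independent
print(d_prime(p_ab=0.15, p_a=0.3, p_b=0.5))
```

The first call gives a value at (or within rounding error of) the maximum of 1, and the second gives 0.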
Fit of the genotype frequencies to the Hardy-Weinberg proportions
We tested separately in cases and controls whether genotype frequencies deviated from Hardy-Weinberg (HW) proportions. We detected significant departures from HW proportions at four markers in the sample of controls (FY-null, p = 0.01; GNB3, p = 0.009; MID-161, p = 0.02; and MID-93, p = 0.015). No significant deviations were observed in the sample of diabetics. Overall, the proportion of significant tests (four out of 64, or 6%) is very close to the proportion expected by chance at the 5% significance level.
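The comparison with the expected proportion can be made explicit. The sketch below is an illustration that treats the 64 tests as independent, which is a simplifying assumption (markers in the same gene are in linkage disequilibrium); the function name is hypothetical.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

n_tests = 64          # 32 markers tested in cases and in controls
alpha = 0.05
n_significant = 4

expected_count = n_tests * alpha
observed_fraction = n_significant / n_tests
print(f"expected significant tests under the null: {expected_count:.1f}")
print(f"observed fraction significant: {observed_fraction:.4f}")
print(f"P(at least {n_significant} significant by chance): "
      f"{prob_at_least(n_significant, n_tests, alpha):.2f}")
```

Under this assumption about three significant results are expected by chance, and four or more would occur in a substantial fraction of such studies, so the observed count gives no indication of systematic departure from HW proportions.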
Associations with individual admixture
The mean admixture proportions of the population were estimated as 65% European, 34% Native American, and less than 1% African. The sum of intensities parameter was estimated as 7.1 (95% CI 4.9 to 12.4) per 100 cM, implying that the average time back to unadmixed ancestors in this population is at least seven generations. The average Native American ancestry is higher in the cases than in the controls (34.3 v 33.2%, respectively). The estimated distribution of individual admixture proportions for the total sample is shown in fig 1.
Table 4 shows the results of logistic regression analyses with type 2 diabetes as a dependent variable. In a model with age, sex, and individual admixture only, the odds ratio for type 2 diabetes associated with unit change in Native American admixture proportion (from 0 to 1) was estimated as 8.1 (95% CI 1.3 to 59). Adjustment for BMI had little effect on this odds ratio (table 4). When income and education were included as explanatory variables in the model, there was a strong inverse relation of type 2 diabetes risk to income: the odds ratio associated with an increase of one unit in income category was 0.88 (95% CI 0.83 to 0.94). With income and education in the model, the odds ratio associated with Native American admixture was 5.5 (95% CI 0.8 to 41). With fasting insulin or BMI as dependent variable in the regression model, there was no evidence of a relationship of these variables to individual admixture (data not shown).
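Because the odds ratios above refer to a change in Native American admixture from 0 to 1, which is a far larger contrast than exists between individuals in this sample, it can help to rescale them to a smaller difference. The sketch below is an illustration that assumes the log odds are linear in the admixture proportion, as in the logistic model; the function name and the choice of a 0.1 difference are hypothetical.

```python
import math

def rescale_odds_ratio(or_per_unit: float, difference: float) -> float:
    """Odds ratio for a given difference in admixture proportion, assuming
    log odds are linear in admixture (OR_unit ** difference)."""
    return math.exp(math.log(or_per_unit) * difference)

# Odds ratio and interval for unit change, model with age, sex, and admixture
or_unit, ci_low, ci_high = 8.1, 1.3, 59.0
step = 0.1  # a 10 percentage point difference in Native American admixture

print(f"OR per {step:.1f} difference: {rescale_odds_ratio(or_unit, step):.2f}")
print(f"interval: {rescale_odds_ratio(ci_low, step):.2f} to "
      f"{rescale_odds_ratio(ci_high, step):.2f}")
```

On this scale the point estimate corresponds to an increase in odds of roughly one quarter per 10 percentage points of Native American admixture.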
Linkage disequilibrium between markers within compound loci
We estimated the extent of linkage disequilibrium (LD) between markers within each compound locus. LD was very high between the four markers located in the CAPN10 locus (D′ between 85 and 100%) and also between the two markers located within the PPARG locus (D′ = 88%). LD was also close to the maximum possible value between three of the four markers located within the ABCC8/KCNJ11 genes (ABCC8 exon 31 G→A, ABCC8 exon 33 G→T, and KCNJ11-E23K; D′ between 94 and 100%). However, in spite of being located only some kilobases apart on chromosome 11 (approximately 30 kb), LD was very low between the ABCC8 exon 16 splice acceptor site and the other three markers within the ABCC8/KCNJ11 genes (D′ between 9 and 19%). For this reason, in this genomic region we constructed haplotypes on the basis of the three markers showing tight linkage disequilibrium, and the exon 16 polymorphism was analysed independently.
Associations with candidate gene polymorphisms
Table 5 shows the allele frequencies of the candidate gene polymorphisms analysed in the sample of cases and controls, and table 6 shows the results of tests for associations of type 2 diabetes with the SNPs in each compound locus, tested one at a time. Table 7 shows the results of score tests for associations of type 2 diabetes with the haplotypes estimated at each of the four compound loci. At each compound locus, rare haplotypes have been grouped into a single category to construct the test for association, although the statistical model evaluates all possible haplotypes. At each locus the haplotypes are tested for association one at a time, and in addition a chi-squared test statistic was calculated to test the null hypothesis that all haplotypes have no effect. Test statistics are not calculated for haplotypes where the observed information is less than 1 because for these rare haplotypes the asymptotic properties of the score test do not hold.
In a model adjusting for age and sex only, associations were significant at the 5% level for one of the two SNPs in the PPARG gene (PPARG-E6), and for three of the four SNPs in the ABCC8/KCNJ11 gene (ABCC8-E31, ABCC8-E33, and KCNJ11-E23K) (table 6). When each haplotype was tested successively, there were no significant associations with any of the four PPARG haplotypes. In the ABCC8/KCNJ11 gene, the A-T-G haplotype was positively associated with type 2 diabetes, and the G-G-A haplotype was negatively associated with type 2 diabetes. The summary test for association with all haplotypes at ABCC8/KCNJ11 was also significant (p = 0.027) (data not shown). None of the SNPs or haplotypes in CAPN10 showed any evidence of association with type 2 diabetes. In a model adjusting for individual admixture proportions in addition to age and sex, the associations with SNPs in PPARG and ABCC8/KCNJ11 were changed only slightly (tables 6 and 7). Additional adjustment for BMI had little effect on these associations (data not shown). For two of the ancestry informative markers—MID52 and D11S429—associations with type 2 diabetes were significant at the 5% level without adjustment for admixture, but not after adjustment for individual admixture.
There were no significant associations of fasting insulin levels with SNPs or haplotypes in CAPN10, PPARG, or ABCC8/KCNJ11 (data not shown). The T allele at locus GNB3 was associated with higher insulin levels in an analysis adjusting for age and sex only (p = 0.02). This association persisted after adjustment for admixture (p = 0.02), but was weakened by adjustment for BMI (score −2.44, observed information 2.78, p = 0.09). None of the candidate gene polymorphisms showed any evidence of association with BMI.
The prevalence of type 2 diabetes is higher in many populations of Native American ancestry than in people of European ancestry living in similar environments. In San Luis Valley, prevalence of type 2 diabetes in the admixed Hispanic American population compared with non-Hispanic American whites has been estimated to be 2.1-fold higher in men and 4.8-fold higher in women.11 If the excess risk of type 2 diabetes in Hispanic Americans compared with Europeans has a genetic basis, we would expect to observe within Hispanic American populations an association of type 2 diabetes with the proportion of the genome that is of Native American ancestry. The estimated size of effect in this study (OR = 8.1 without adjustment for socioeconomic status, OR = 5.1 after adjustment for socioeconomic status) is compatible with a genetic explanation for this ethnic difference, but as the interval estimate for the adjusted odds ratio overlaps 1, we cannot exclude an environmental explanation based on factors associated with low socioeconomic status that confound the association with individual admixture. The 95% CIs for the effect of European/Native American admixture proportions on risk of type 2 diabetes are wide in this dataset: first, because individual admixture proportions vary only over a modest range in this population; and second, because with only 21 markers informative for ancestry we cannot accurately estimate the proportion of the genome that is of Native American ancestry.
Even though we cannot measure individual admixture accurately, we can still control for confounding by individual admixture when testing for associations with alleles or haplotypes after adjusting for individual admixture proportions, because the score test (by integrating over the posterior distribution of all missing data, including individual admixture) allows for uncertainty in estimates of individual admixture.10 For most of the candidate gene polymorphisms that were included in this study, the confounding effect of individual admixture is weak, and thus the effect of the candidate gene polymorphism, adjusted for confounding, can be estimated accurately even though the confounder has not been measured accurately. The proportion of information extracted (ratio of observed to complete information) in the score test can be interpreted as a measure of the efficiency of the analysis, compared with the analysis of a dataset in which individual admixture is estimated accurately (with a large panel of ancestry informative markers) and individual haplotypes are assigned without error. Even though haplotypes are not observed directly in this analysis but inferred from unphased genotype data, the proportion of information extracted in score tests for association with each haplotype is greater than 70% except where the haplotype is rare. As theory shows, when studying haplotype effects on disease risk it is generally more efficient to study a large sample of unrelated individuals and to model the effects of the unobserved haplotypes than to type other family members in order to infer phase.15
In this study we typed polymorphisms in five candidate genes (CAPN10, GNB3, PPARG, ABCC8/KCNJ11) and evaluated their association with type 2 diabetes and fasting insulin. Because we have strong prior evidence of associations with these candidate gene polymorphisms, it is reasonable to interpret even modest significance levels (p<0.05) as evidence of association. The calpain-10 gene is of particular interest, because the association of this gene with type 2 diabetes was first detected in a Hispanic American population.16 Horikawa et al reported that the highest diabetes risk in this population was associated with the haplotype pair 112/121 (here referred to as G2T/G3C) at the three polymorphic sites CAPN10-43, CAPN10-19, and CAPN10-63. Subsequent studies of these SNPs and others in the CAPN10 gene (for example, CAPN10-44) have not consistently replicated this result. While some studies found a modest effect on type 2 diabetes or associated phenotypes,17–20 other reports indicated no association.21–24 Several factors can explain the heterogeneous findings observed in studies involving CAPN10 and type 2 diabetes: variation in statistical power between studies, ethnic differences in allele or haplotype frequencies, and the presence of gene–gene and gene–environment interactions, among others. Two recent meta-analyses of CAPN10 family based and population based studies have reported modest pooled odds ratios for the allele UCSNP-44 C (OR∼1.2)25 and the UCSNP-43 G/G homozygote (OR∼1.2).26 This means that very large sample sizes are required to detect the effect of these variants on type 2 diabetes risk. In the present study, we have not observed any significant association of CAPN10 alleles or haplotypes with type 2 diabetes, fasting insulin, or BMI (table 6). Additionally, no significant effect was detected when testing the G2T/G3C haplotype pair (data not shown).
Although in our sample of Hispanic Americans the relative frequency of one of the haplotypes hypothesised to increase type 2 diabetes risk (112 or G2T) is higher than in populations of European ancestry (approximately 17 v 3–8%, respectively), we have not observed a significant effect of CAPN10 polymorphisms on type 2 diabetes or fasting insulin levels.
PPARG is a member of the nuclear hormone receptor subfamily of transcription factors, and has an important role in insulin action and fat metabolism.27 A recent meta-analysis28 including more than 3000 individuals of European ancestry indicated that the common Pro12 allele slightly increases type 2 diabetes risk. However, in a Native American population, the Oji-Cree from Western Ontario,29 the Ala12 allele was associated with type 2 diabetes in women. In the San Luis Valley sample association was detected with C161T, a common silent polymorphism,30,31 but not with the Pro12Ala locus. The analysis at the haplotype level did not show any significant effect.
The ABCC8 (SUR1) and KCNJ11 genes encode the two subunits that constitute the ATP sensitive potassium channel of the pancreatic beta cells. This channel is the target of the sulfonylurea class of drugs. The two genes are located 4.5 kb apart on chromosome 11, and have been widely studied in relation to risk of type 2 diabetes. We analysed three common polymorphisms in the ABCC8 gene and one site in the KCNJ11 gene in the San Luis Valley sample. The variants in the ABCC8 gene are a C/T mutation in the splice acceptor site of exon 16, which has been associated with effects on insulin secretion and type 2 diabetes,32–35 a silent G/A mutation in exon 31 that has been associated with hyperinsulinaemia and type 2 diabetes,36,37 and a non-synonymous G/T substitution in exon 33.38 The G/A mutation in codon 23 (E23K) of the KCNJ11 gene has also been associated with type 2 diabetes in numerous studies, including two recent meta-analyses in European populations.39–42 Three of these SNPs were associated with type 2 diabetes in the San Luis Valley sample (ABCC8 exon 31 allele A, ABCC8 exon 33 allele T, and KCNJ11 E23K allele G; see table 6). In fact, these three markers are in strong linkage disequilibrium (LD), with D′ values between 94 and 100%, while the LD between these sites and the marker located upstream in exon 16 is very low (D′ between 9 and 19%). Interestingly, the ABCC8 exon 31-A allele has been previously reported to be associated with hyperinsulinaemia in non-diabetic Mexican Americans.36 Ancestry specific haplotype frequencies were estimated using ADMIXMAP, which combines the information from unadmixed “parental” samples and the SLV sample (table 8). The three common haplotypes in the compound ABCC8/KCNJ11 locus accounted for an estimated 98% of haplotypes in the San Luis Valley population.
Haplotypes bearing A-T-G at the three loci listed above (Ex31-Ex33-E23K) were positively associated with type 2 diabetes, and haplotypes bearing G-G-A at the three loci listed above were inversely associated with type 2 diabetes. The haplotype increasing type 2 diabetes risk (A-T-G) is more frequent in Native American populations than in European populations (50 v 28%). One problem with testing for haplotype effects is that without a strong prior hypothesis about which haplotypes are associated with disease risk, detection of effects relies on summary tests over all haplotypes, which have multiple degrees of freedom and thus low statistical power. It is important to note that the KCNJ11 E23K allele associated with type 2 diabetes in the San Luis Valley population (allele G) is not the same allele that has been reported to be associated with type 2 diabetes in populations of European ancestry (allele A).39–42 In fact, the odds ratio associated with the presence of the E23K G allele in San Luis Valley (OR∼1.40, CI: 1.05 to 1.85) does not overlap with the values reported in Europeans, where a recent meta-analysis of case control data indicated a pooled odds ratio associated with the presence of the E23K A allele of 1.23 (CI: 1.12 to 1.36).40 Because the E23K A allele occurs predominantly on the same haplotype (G-G-A) in both Europeans and Native Americans, opposite directions of association with diabetes mellitus in these two populations cannot be explained by occurrence on different haplotypes. These contradictory results could be due in part to differences with respect to the ethnic background in the samples, but additional studies in Native American and Hispanic populations are needed to confirm this point.
The C825T polymorphism of the gene encoding the G protein beta-3 subunit (GNB3) has been repeatedly associated with obesity, hypertension, and type 2 diabetes.43–46 This gene plays a key role in intracellular signalling and the 825T mutation creates a splice variant that is functional and is associated with enhanced G protein activation.46 We observed no association with type 2 diabetes, but in non-diabetic individuals the T allele was associated with higher fasting insulin concentrations.
We have thus detected in this Hispanic American population an association of type 2 diabetes with markers located on the ABCC8 and KCNJ11 genes, closely linked on chromosome 11. We have also described evidence of association of the GNB3 C825T polymorphism with fasting insulin in the non-diabetic sample. We did not observe an association of four CAPN10 markers with type 2 diabetes. In the score tests used in this study, the observed information for the log odds ratio associated with a common allele or haplotype is typically about 30, equivalent to a standard error of 0.18. We can thus estimate that our study had adequate (90%) statistical power to detect at 5% significance a log odds ratio for type 2 diabetes of about 0.6 (OR∼1.8) associated with one extra copy of any common allele or haplotype. Therefore, our study would have detected effects of the magnitude described in the Mexican American sample in which the original association of CAPN10 and type 2 diabetes was reported (OR>2). However, as pointed out by Song et al,26 neither this study nor, for that matter, most of the previous CAPN10 studies can individually achieve enough statistical power to detect the modest effects that have been described in recent meta-analyses of markers such as the UCSNP-44 C allele (OR∼1.19),25 the UCSNP-43 G/G genotype (OR∼1.19),26 or the PPARG Pro12 allele (OR∼1.25).28
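The arithmetic behind this power statement can be reproduced with a normal approximation: the standard error is the reciprocal square root of the observed information, and the detectable log odds ratio is the sum of the standard normal quantiles for the significance level and the power, multiplied by that standard error. The sketch below assumes a two-sided test at the 5% level; the function name is illustrative and this is not code from the original analysis.

```python
import math
from statistics import NormalDist


def detectable_log_odds_ratio(information, alpha=0.05, power=0.90):
    """Smallest log odds ratio detectable with the given power.

    Assumes the estimate is approximately normal with variance 1/information
    and that the test is two-sided at level alpha.
    """
    se = 1.0 / math.sqrt(information)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 1.28 for 90% power
    return se, (z_alpha + z_power) * se


se, log_or = detectable_log_odds_ratio(information=30)
print(f"standard error   = {se:.2f}")
print(f"log odds ratio   = {log_or:.2f}")
print(f"odds ratio       = {math.exp(log_or):.2f}")
```

With an information of 30 this gives a standard error of about 0.18 and a detectable odds ratio of about 1.8, in line with the figures quoted in the text. Applying the same relationship to an odds ratio near 1.2 shows why much larger information, and hence a much larger sample, would be needed to detect the effects reported in the meta-analyses.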
We have also demonstrated the ability to control for confounding by population stratification when studying genetic associations within a recently admixed population, using a panel of markers informative for ancestry and computationally intensive bayesian methods for statistical analysis. This makes it possible to study genetic associations in stratified populations using ordinary case control and cross sectional designs, rather than family based designs that require parents or sibs of affected individuals to be collected.10 We note also that the San Luis Valley population, with about 34% average Native American admixture, is an ideal setting in which to apply a novel approach that exploits admixture to localise genes underlying ethnic differences in risk of type 2 diabetes. However, this will require a much larger panel of markers informative for Native American versus European ancestry.
We thank the San Luis Valley Study participants for their help.
This work was supported in part by grants from NIH/NIDDK (DK53958) and NIH/NHGRI (HG02154) to MDS. The development of the ADMIXMAP program was supported by NIH grant MH60343 to PMM.
Conflict of interest: none declared.