Background The Mexican population and others with Amerindian heritage exhibit a substantial predisposition to dyslipidemias and coronary heart disease. Yet, these populations remain underinvestigated by genomic studies, and to date, no genome-wide association (GWA) studies have been reported for lipids in these rapidly expanding populations.
Methods and findings We performed a two-stage GWA study for hypertriglyceridemia and low high-density lipoprotein cholesterol (HDL-C) in Mexicans (n=4361), and identified a novel Mexican-specific genome-wide significant locus for serum triglycerides (TGs) near the Niemann–Pick type C1 protein gene (p=2.43×10−08). Furthermore, three European loci for TGs (APOA5, GCKR and LPL), and four loci for HDL-C (ABCA1, CETP, LIPC and LOC55908) reached genome-wide significance in Mexicans. We used cross-ethnic mapping to narrow three European TG GWA loci, APOA5, MLXIPL, and CILP2 that were wide and contained multiple candidate variants in the European scan. At the APOA5 locus, this reduced the most likely susceptibility variants to one, rs964184. Importantly, our functional analysis demonstrated a direct link between rs964184 and postprandial serum apoAV protein levels, supporting rs964184 as the causative variant underlying the European and Mexican GWA signal. Overall, 52 of the 100 reported associations from European lipid GWA meta-analysis generalised to Mexicans. However, in 82 of the 100 European GWA loci, a different variant other than the European lead/best-proxy variant had the strongest regional evidence of association in Mexicans.
Conclusions This first Mexican GWA study of lipids identified a novel GWA locus for high TG levels; used the interpopulation heterogeneity to significantly restrict three previously known European GWA signals, and surveyed whether the European lipid GWA SNPs extend to the Mexican population.
- Lipid disorders
- Complex traits
- Genetic epidemiology
Mexicans are an admixed population of European, Native American and a small percentage of African (1–3%) ancestries.1 Mexican national surveys have consistently confirmed that Mexicans have an alarmingly high prevalence of multiple types of dyslipidemias.2 ,3 Based on the most recent survey,2 31.5% of Mexicans have hypertriglyceridemia (HTG) (defined as serum triglycerides (TGs) >150 mg/dl), 43.6% hypercholesterolemia (total cholesterol (TC) >200 mg/dl), and 60.5% low high-density lipoprotein cholesterol level (HDL-C) (HDL-C<40 mg/dl), respectively. Elevated levels of serum TGs and low serum HDL-C are well established risk factors of coronary heart disease (CHD (MIM 607339)) independent of other lipoproteins.4 A previous study based on 72 200 subjects also underlines the significant lifelong CHD risk of TG-increasing variants.5 ,6
A meta-analysis of genome-wide association (GWA) studies comprising 100 000 individuals of European origin identified 95 loci for serum lipids, of which 24 were implicated for TGs and 38 for HDL-C.7 Taken together, the identified variants explain 10–12% of TG and HDL-C variances.7 These results cannot, however, be directly extended to other populations due to interpopulation differences in genetic architecture. Amerindian populations have been under-represented in GWA studies, although they have a high susceptibility to metabolic diseases and are among the ethnic groups with rapid population growth. Therefore, it is critical to investigate diverse populations, such as Mexicans and other groups with Amerindian heritage, in order to determine which variants and genes are shared across populations. Furthermore, GWA studies in diverse populations may reveal novel genes and variants that are either population-specific or due to differences in allele frequencies, patterns of linkage disequilibrium (LD), disease prevalence and gene–environment interactions.8
The GWA meta-analysis for serum lipids in European origin subjects7 demonstrated well one of the major challenges in GWA studies: the identified GWA loci tend to remain wide due to extended LD, making it very difficult to pinpoint the actual underlying variant(s) or even gene(s). Thus, conversion of the European original GWA signals to functional knowledge about the underlying mechanisms has been a slow process, further hampered by the small effect sizes of the common GWA variants and the fact that these variants are often intronic or intergenic. We recently demonstrated that taking advantage of the interpopulation heterogeneity using Mexicans in cross-ethnicity mapping can help narrow the GWA loci and assist in dissecting the functional susceptibility variants underlying the European original GWA signals.9
Importantly, despite the high prevalence of unfavourable lipid levels in Mexicans, there has been no genomic study for lipids in this population thus far. Here, we report the first Mexican GWA study for lipids in Mexican HTG cases and controls.
We performed a two-stage GWA study to identify variants for HTG in Mexicans. Positive signals of stage 1 were genotyped in stage 2, and a combined analysis of the two stages was performed to identify genome-wide significant variants. Clinical characteristics of the stages 1 and 2 study samples are shown in online supplementary table S1. After quality-control procedures, 2240 samples, 563 599 genotyped single-nucleotide polymorphisms (SNPs), and 769 042 SNPs imputed based on the HapMapIII Mexican–American (MEX) sample were available for stage 1 analysis. This case-control study sample was ascertained based on the serum TG levels (see Methods section). Thus, we performed association analysis for the binary TG-affection status using logistic regression and adjusted for population admixture using ancestry estimates from principal component analysis (PCA) as a covariate (see the online supplementary methods).
For stage 2, we selected all non-redundant (LD r2≤0.3) genotyped SNPs that provided the strongest evidence of association (p≤2.5×10−3) in stage 1 as well as imputed SNPs with p≤8.0×10−04 (see online supplementary table S2 and figure S1 A,B). The same SNPs were selected after adjustment for gender (see online supplementary figure S1C), and they also remained significant with correction for body mass index (BMI) (p≤0.01). Conditional association analyses at the top 12 genotyped loci did not reveal additional independent SNPs with p≤2.5×10−3. To validate our selection strategy and assure that all significant loci are captured by the two-stage GWA design, we compared the phenotypic variance explained on the disease liability scale by all stage 1 SNPs to the variance explained by the SNPs selected for stage 2 by breaking down the genetic relationship matrix (GRM) to the selected SNPs and the rest of the genome (see the online supplementary methods).10 The proportion of variation in the disease liability captured by all the GWA SNPs is 64% (±16), whereas the selected SNPs explain 63% (±3) of the variation, and the remaining GRM component explains practically none of the variation in the disease liability (3×10−06 ± 0.12), indicating that virtually all contributing loci were selected for stage 2.
We examined whether adjusting for locus-specific ancestry proportions in the regression analysis in addition to the global ancestry correction would reveal SNPs that were missed by the traditional GWA analysis (see online supplementary figure S2). We identified only 40 additional SNPs that became significant at the stage 2 selection threshold (p values ranging from 1.2×10−04 to 2.5×10−03) when including local ancestry estimates. However, these SNPs were not selected for follow-up in stage 2, as dense marker sets are needed for local ancestry assessment. Overall, in stage 2, we successfully tested 2121 additional subjects and 1235 SNPs for association using logistic regression with individual ancestry estimates as a covariate. A joint analysis of the stages 1 and 2 data (n=4361 subjects) was performed using a fixed-effects meta-analysis11 (figure 1 and see online supplementary table S3).
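To illustrate how two stages can be combined, the sketch below pools per-stage log odds ratios with inverse-variance weights, one common form of fixed-effects meta-analysis. Whether the cited method uses this weighting or a sample-size-based scheme is not stated here, so the weighting is an assumption, and the per-stage odds ratios and standard errors are hypothetical values, not study results.

```python
import math

def fixed_effects_meta(estimates):
    """Pool (log_odds_ratio, standard_error) pairs with inverse-variance weights."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p value from the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_two_sided

# Hypothetical per-stage estimates (assumed for illustration only)
stage1 = (math.log(0.80), 0.06)
stage2 = (math.log(0.76), 0.06)

beta, se, z, p = fixed_effects_meta([stage1, stage2])
odds_ratio = math.exp(beta)
print(f"pooled OR={odds_ratio:.3f}, SE={se:.4f}, z={z:.2f}, p={p:.2e}")
```

With equal standard errors the pooled log odds ratio is simply the mean of the two stage estimates, and the pooled standard error shrinks by a factor of √2.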
A novel locus for elevated TG levels identified in Mexicans
The combined analysis of stages 1 and 2 revealed a novel TG locus, rs9949617 at chr18q11, that reached the genome-wide significance (OR=0.78 (CI95 0.69 to 0.86), p=2.43×10−08). Importantly, a SNP in LD with rs9949617 (rs4800467, r2=0.9), also approached the genome-wide significance level, thus supporting this locus (figure 2A), and both SNPs (rs9949617 and rs4800467) were directly genotyped. The minor allele frequency (MAF) of rs9949617 is substantially lower in the HapMapIII CEU (Utah residents with ancestry from northern and western Europe) sample (19%) than in the Mexican controls (40%). Interethnic differences in allele frequencies can lead to power differences and apparent population-specific findings. Furthermore, we also found a significant difference in LD between the CEU and MEX populations of HapMapIII at this locus (PvarLD=2.6×10−06) (see the online supplementary methods).12 The observed variation in LD is at the top 1 percentile of the genome-wide distribution based on the comparisons of all autosomal chromosomes (see online supplementary figure S3).
To investigate whether any of the regional genes (±500 kb) (figure 2A) influence serum lipids, we tested them for differential expression in 70 adipose tissue samples from Mexican hyperlipidemic cases and controls. We observed a significant association after correcting for multiple tested genes (n=5) between the expression of the Niemann–Pick type C1 (NPC1) gene and hyperlipidemia case-control status (p unadjusted=6.36×10−03, see online supplementary table S4). Furthermore, the expression of NPC1 was also correlated with serum TGs in adipose tissue from the Hybrid Mouse Diversity Panel (HMDP)13 (p=2.03×10−04). No other regional gene probed by the arrays was significantly differentially expressed after correction for multiple testing.
We further genotyped three novel signals (rs2215964, rs805743 and rs4951964) that did not reach the genome-wide significance threshold in the combined analysis of stages 1 and 2, but were strongly associated with TGs (p≤5×10−05) (see online supplementary table S3) in 1712 additional Mexican subjects. The combined p value of rs805743 improved slightly (p=6.6×10−06), but none of the three SNPs surpassed the genome-wide significance level with the increased total sample size of 6073 Mexican samples.
Cross-ethnic mapping of European GWA loci for TGs
The combined analysis of stages 1 and 2 revealed genome-wide significant signals for HTG at the previously reported European GWA loci,7 APOA5, GCKR and LPL, as well as smaller effects (p<5×10−05) at the ANGPTL3, TIMD4-HAVCR1, MLXIPL and CILP2 regions. The lead SNPs of these loci were all directly genotyped (table 1). To narrow these European GWA loci using cross-ethnic mapping, we first assessed whether the Mexican associations were in LD with the lead European TG signals. We considered a locus independent if the LD between the lead Mexican and European SNPs was low (r2≤0.3) in CEU and no association was reported in Europeans7 at p<1×10−06 for the Mexican-lead SNP, and shared if the LD was high in CEU (r2≥0.5).
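The classification rule described above can be written as a small function. This is a minimal sketch: the function name, argument names and the "unclassified" label for loci that meet neither criterion are assumptions made for illustration.

```python
def classify_locus(r2_ceu, european_p_for_mexican_lead):
    """Classify a locus from the CEU LD between the Mexican and European lead SNPs
    and the reported European p value of the Mexican lead SNP."""
    if r2_ceu >= 0.5:
        return "shared"
    if r2_ceu <= 0.3 and european_p_for_mexican_lead >= 1e-6:
        return "independent"
    # Intermediate LD, or low LD with a reported European association
    return "unclassified"

# Hypothetical inputs (assumed values)
print(classify_locus(0.9, 1e-20))   # high LD in CEU
print(classify_locus(0.1, 0.07))    # low LD, no European association
print(classify_locus(0.4, 0.5))     # intermediate LD
```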
We observed an independent Mexican signal at the TIMD4-HAVCR1 locus. The predominant Mexican SNP rs2036402 (OR=1.23 (CI95 1.14 to 1.32), p=3.41×10−06) was neither associated with TGs in the Europeans (p=0.07) nor in LD with the European-lead SNP rs1363232 (table 1, see online supplementary figure S4A). Accordingly, rs2036402 remained significant when including the genotype counts of rs1363232 as a covariate in a conditional analysis (p=3.79×10−05). Differences in allele frequencies (table 1), and LD patterns (PvarLD=9.2×10−07) may account for this apparent independent signal. Importantly, rs2036402 is in strong LD with a missense variant rs12522248 in HAVCR1 (r2=0.9), suggesting HAVCR1 as the underlying gene at this uncharacterised GWA locus.
The associations at the APOA5, GCKR, LPL, ANGPTL3, MLXIPL and CILP2 loci were shared between Mexicans and Europeans (table 1). However, the functional gene or variant(s) have not been established yet, except for GCKR and LPL.14 ,15 Therefore, we investigated the remaining four loci for variation in LD patterns across Mexicans and Europeans to narrow down the candidate regions. Using the dense 1000 Genomes data,16 and genotyping imputation in the stage 1 Mexican sample (see the online supplementary methods), we evaluated the number of SNPs highly correlated (r2≥0.5) with both the European index SNP in the CEU sample and the Mexican index SNP in the Mexican controls versus the total number of SNPs correlated with the European index SNP in CEU (figure 3). We found three loci (APOA5, MLXIPL and CILP2) in which the cross-ethnic LD comparisons suggest a smaller region and a reduced number of SNPs underlying the shared signal (figure 3, and see online supplementary figure S4).
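The cross-ethnic comparison above amounts to intersecting two sets of SNPs: those in high LD with the European index SNP in CEU, and those in high LD with the Mexican index SNP in the Mexican controls. The sketch below assumes precomputed r2 values stored in dictionaries; the function name, the placeholder SNP names (snpA to snpC) and all r2 values are hypothetical.

```python
def narrow_candidates(r2_with_eur_index_ceu, r2_with_mex_index_mex, threshold=0.5):
    """Return SNPs highly correlated with both index SNPs, plus the two counts
    compared in the text (shared versus European-only candidate set)."""
    eur_set = {snp for snp, r2 in r2_with_eur_index_ceu.items() if r2 >= threshold}
    mex_set = {snp for snp, r2 in r2_with_mex_index_mex.items() if r2 >= threshold}
    shared = eur_set & mex_set
    return shared, len(shared), len(eur_set)

# Hypothetical r2 values; each index SNP has r2=1.0 with itself
r2_ceu = {"rs964184": 1.0, "snpA": 0.8, "snpB": 0.6, "snpC": 0.2}
r2_mex = {"rs964184": 1.0, "snpA": 0.1, "snpB": 0.3, "snpC": 0.05}

shared, n_shared, n_eur = narrow_candidates(r2_ceu, r2_mex)
print(f"{n_shared} of {n_eur} European candidates remain: {sorted(shared)}")
```

A locus is narrowed when the shared count is clearly smaller than the European count, as in this toy input where only the index SNP survives.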
The SNP rs964184, at the APOA5 region, obtained the strongest evidence for HTG in Mexicans (OR=1.8 (CI95 1.68 to 1.86), p=5.5×10−35). rs964184 was also the lead SNP for TGs in the European GWA meta-analysis,7 and showed genome-wide significant associations for CHD and other related traits in European GWA studies.17 ,18 We found that in Europeans, 26 other SNPs are in high LD (r2≥0.5) with rs964184, whereas in Mexicans, no other SNP is in high LD with rs964184, suggesting that this variant is the more plausible susceptibility variant (figure 3). The non-synonymous APOA5 SNP rs3135506 (S19W) that was suggested as the putative functional variant19 had substantially lower evidence than rs964184 in the imputed stage 1 sample (p=3.24×10−07 vs p=6.31×10−19, respectively, figure 2B). Furthermore, including the genotype counts of rs964184 as a covariate in a multivariate analysis completely abolished the association with rs3135506 (p=0.7), as well as with other SNPs, suggesting that in Mexicans, the SNP rs964184 accounts for the strong signals near APOA5.
To functionally investigate the effect of rs964184 on serum apoAV protein levels, we performed an oral fat-tolerance test in 41 Mexicans with serum apoAV protein levels measured at fasting, and at 3, 4, 6 and 8 h postprandially (see the online supplementary methods). We observed differences in serum apoAV protein levels between the rs964184 genotype groups (C/C=14, C/G=16 and G/G=11) at 3–6 h postprandially, with small to no difference at fasting or 8 h. During the response time (3–6 h), apoAV levels in the group with the rs964184 TG-increasing allele (G) were lower than in the common allele carriers (C), in agreement with the known inverse relationship between apoAV and TG levels.20 We quantified this response by calculating the area under the incremental curve (AUIC) and observed a significant association between the apoAV AUIC levels and rs964184 genotypes using an additive genetic model (p=0.009, and non-parametric p=0.02) (figure 4). These results demonstrate for the first time a direct relationship between the well established TG and CHD susceptibility variant, rs964184, and the APOA5 gene, as well as suggest plausible mechanism of action for this TG and CHD GWA variant.
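As an illustration of the AUIC calculation, the sketch below subtracts the fasting value from each measurement and integrates the increments with the trapezoidal rule over the sampling times used in the test (0, 3, 4, 6 and 8 h). The exact AUIC definition used in the study is given in the supplementary methods and may differ (for example, in how values below baseline are handled); the apoAV values here are hypothetical and in arbitrary units.

```python
def incremental_auc(times_h, values):
    """Area under the incremental curve: trapezoidal rule on (value - fasting value)."""
    baseline = values[0]
    increments = [v - baseline for v in values]
    area = 0.0
    for i in range(1, len(times_h)):
        width = times_h[i] - times_h[i - 1]
        area += width * (increments[i] + increments[i - 1]) / 2.0
    return area

# Hypothetical apoAV measurements for one subject (arbitrary units)
times = [0, 3, 4, 6, 8]
apoav = [100.0, 130.0, 140.0, 120.0, 100.0]

auic = incremental_auc(times, apoav)
print(auic)  # 160.0
```

The per-subject AUIC values could then be regressed on the number of G alleles (0, 1 or 2) to mirror the additive genetic model described above.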
Association analysis with HDL-C levels
As a secondary analysis, we tested the stage 1 genotyped SNPs for association with HDL-C levels using linear regression with adjustments for gender and the TG case-control status to avoid sampling bias. For stage 2, we selected all non-redundant (r2≤0.3) SNPs that provided the strongest evidence of association (p≤5×10−4) for HDL-C (see online supplementary table S2 and figure S1D). Furthermore, these SNPs remained significant (p≤0.01) using association analysis with secondary phenotypes regression models (SPREG), which further accounts for the non-random sampling of the cases and controls by incorporating the known disease prevalence (see Methods section). The correlation between the test statistics from SPREG and linear regression was r=0.99, suggesting that the SNP selection for stage 2 is very well controlled for the HTG affection status. We performed a joint analysis of both stages using a fixed-effects meta-analysis (figure 1 and see online supplementary table S5). Four previously known HDL-C loci from the European GWA study, ABCA1, CETP, hepatic lipase gene (LIPC) and LOC55908, surpassed genome-wide significance in Mexicans. All these GWA signals were directly genotyped. We assessed whether these associations were shared with the European signals (table 1). At ABCA1, we identified independent associations (table 1) at two non-redundant (r2=0.06) SNPs, rs4149310 and rs928254, that were associated with HDL-C in Mexicans but not in Europeans.7 Accordingly, these SNPs remained genome-wide significant in a conditional analysis that included the genotype counts of rs2575876, a proxy (r2=0.9) of the lead European SNP, as a covariate (p<2×10−10). The associations at the loci CETP, LIPC and LOC55908 were shared between Mexicans and Europeans (table 1 and figure 3). Among these loci, the underlying gene(s) at the LOC55908-DOCK6 region have not been established yet.
In Mexicans, SNP rs2278426 reached genome-wide significance (p=3.44×10−09, β=−0.14±0.02, see online supplementary figure S4D). This SNP is a missense variant (R59W) in LOC55908, a hepatocellular carcinoma-associated protein.
Generalisation of European lipid risk variants to the Mexican population
Thus far, in GWA studies of European descent cohorts, 102 SNPs have been implicated for lipids (TG, HDL-C, TC and LDL-C).7 However, interpopulation heterogeneity could limit the applicability of these SNPs for therapies and risk models across populations. We therefore performed a meta-analysis of the stages 1 and 2 samples for the reported European GWA SNPs (n=41) or their proxies (n=59). The average r2 of the proxies with the lead European SNP in CEU was 0.96 (see online supplementary table S6). Two European GWA SNPs rs13238203 near TYW1B and rs4420638 near apolipoprotein E (APOE) could not be genotyped or tagged as these SNPs failed the primer design, and are not in LD (r2<0.3) with any other SNP in the HapMapIII CEU dataset. We examined whether the SNPs are associated in Mexicans in the same direction and lipid trait as in Europeans (one-sided significance threshold of 0.05) using the dichotomous TG status and continuous HDL-C and TC levels while adjusting for the TG affection status. We found evidence of generalisation to Mexicans at 52 of the SNPs (table 2). Online supplementary table S6 provides the detailed results for each SNP by the lipid trait. Overall, of the 100 tested SNPs, 83 had an effect in the same direction as reported in the Europeans7 (binomial p=1.31×10−11). The number of observed significant associations is not significantly different from the expected number (51) based on the cumulative power of all SNPs (table 2), suggesting that overall the European risk variants generalise to the Mexicans. However, several loci (Lipoprotein(a) (LPA), LRP4 and PCSK9) did not generalise although we had sufficient power to detect associations of the effect sizes observed in Europeans (≥80%) (see online supplementary table S6).
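The direction-of-effect comparison can be checked with an exact binomial (sign) test against a null probability of 0.5. The sketch below is an illustration, not the authors' code; it assumes a two-sided test obtained by doubling the smaller tail, which yields a value of about 1.31×10−11 for 83 concordant SNPs out of 100, in line with the p value quoted above. The source does not state which variant of the test was used.

```python
from math import comb

def binomial_sign_test(successes, n):
    """Exact binomial test against p=0.5; returns (one-sided, two-sided) p values."""
    upper_tail = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n
    lower_tail = sum(comb(n, k) for k in range(0, successes + 1)) / 2 ** n
    p_one_sided = min(upper_tail, lower_tail)
    # Two-sided p value by doubling the smaller tail (an assumed convention)
    p_two_sided = min(1.0, 2.0 * p_one_sided)
    return p_one_sided, p_two_sided

p_one, p_two = binomial_sign_test(83, 100)
print(f"one-sided p={p_one:.3g}, two-sided p={p_two:.3g}")
```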
Besides power, interpopulation heterogeneity is another explanation for non-generalisation. In non-European populations, other SNPs may represent better proxies or population-specific variants. Accordingly, we observed that in 82 of the 100 loci, a SNP other than the European lead/best-proxy SNP (r2<0.3) obtained the strongest evidence of association in the region (±500 kb) in the Mexican stage 1 GWA (see online supplementary table S6), indicating that risk assessment models should be based on high-resolution data from each population in order to capture the distinct proxies and causal variants.
We performed a GWA study to search for variants conferring the high susceptibility of Mexicans to HTG and low HDL-C. We identified a novel Mexican-specific locus for high TGs, rs9949617 near the NPC1 gene, that surpassed the genome-wide significance level. In addition, three European TG GWA loci (APOA5, GCKR and LPL) and four European HDL-C loci (ABCA1, CETP, LIPC and LOC55908) reached genome-wide significance in Mexicans. Furthermore, we found evidence of generalisation to the Mexican population for ∼50% of the European GWA variants for TGs and HDL-C. By using cross-ethnic LD comparisons we were able to refine three loci, APOA5, CILP2 and MLXIPL.7 Most striking is the APOA5 locus that was restricted from 26 SNPs to the lead SNP rs964184, for which we demonstrate specific association with postprandial serum apoAV protein levels. The LD analysis also suggests independent Mexican variants in two European GWA loci (TIMD4 and ABCA1).
The novel TG locus near NPC1 was not significantly associated with TGs in Europeans.7 Possible explanations for the population specificity are the low MAF of rs9949617 in Europeans compared with Mexicans (19% vs 40%), which reduces the power to detect the signal in Europeans, and the difference in regional LD, which is in the top one percentile of the genome-wide distribution based on the comparisons of all autosomal chromosomes. Expression of NPC1 was significantly correlated with TG levels both in human and mouse adipose tissue. In the mouse adipose tissue, expression levels were also significantly correlated with HDL-C, unesterified cholesterol, and non-HDL-C (data not shown). NPC1 is a transmembrane protein containing a sterol-sensing domain that participates in cholesterol trafficking from the late endosome/lysosome to the plasma membrane.21 Rare mutations in NPC1 cause an autosomal recessive disorder, NPC disease (MIM 257220), characterised by accumulation of unesterified cholesterol and other lipids in late endosomes and lysosomes.21 NPC patients often have low levels of HDL-C, LDL-C and TC, while their TG levels tend to be increased.22 A non-synonymous variant in NPC1 was associated with early onset and morbid adult obesity in a GWA study of European descent.23 The association with plasma TG levels goes beyond the NPC1-obesity association, since the signal of association persisted after the confounding effect of BMI was controlled for in our statistical analyses. The NPC1 gene is regulated through the sterol regulatory element-binding protein (SREBP) pathway,22 suggesting that the NPC1 protein plays a central role in maintaining lipid homeostasis.
Evidence derived from animal models shows that NPC1 haploinsufficiency results in weight gain, adipocyte hypertrophy, hepatic steatosis, impaired fasting glucose, glucose intolerance, hyperinsulinemia, hyperleptinemia and HTG.24 Since NPC1-deficient cells fail to deliver LDL-derived free cholesterol to mitochondria and endoplasmic reticulum (ER), they have impaired synthesis of endogenous liver X receptor (LXR) ligands, and LXR target genes are downregulated. These changes may stimulate de novo lipogenesis and result in TG synthesis. Alternatively, decreased intracellular-free cholesterol concentrations in NPC1-deficient cells could lead to SREBP activation and increased de novo lipogenesis.25 However, extensive targeted resequencing is warranted to identify the full spectrum of causative variants in the NPC1 region.
ApoAV is emerging as a potent modulator of serum TG levels.26 The SNP rs964184 located near the APOA1/C3/A4/A5 gene cluster, 11 kb upstream of the APOA5 gene, has been associated with both TGs and CHD in large European GWA meta-analyses.7 ,17 In Mexicans, it was the lead signal for TGs with a significantly higher MAF, and thus attributable risk, than in Europeans (30% vs 12%). However, no direct link between rs964184 and APOA5 has been established as yet. Furthermore, rs964184 showed no effect on the gene expression levels of the APOA1/C3/A4/A5 cluster genes or other nearby genes (ie, cis-expression quantitative trait loci) in large human tissue samples from liver and fat.7 However, several studies have demonstrated postprandial TG levels as a risk factor for CHD.27 ,28 Our data show that postprandially, the TG-increasing allele (G) group had a significantly lower serum apoAV response than the common allele carriers (C), demonstrating for the first time a direct link between rs964184 and serum apoAV protein levels. Given these functional data and the fact that rs964184 is not in LD with other regional variants in Mexicans, we speculate a regulatory mechanism of action for this variant responsive to the postprandial state. Several nuclear receptors involved in energy metabolism are known to regulate APOA5 expression (eg, FXR and PPARa),26 and recently an orphan nuclear receptor Nur77 has also been shown to bind and regulate the human APOA5 promoter region,29 suggesting that there might be other regulatory factors and elements of APOA5 that have not yet been identified. Furthermore, the molecular mechanisms by which apoAV regulates plasma TG in vivo are still debated. It has been suggested that apoAV (1) enhances the catabolism of TG-rich lipoproteins by LPL or (2) inhibits the rate of production of VLDL.20 Our data support the first mechanism, stimulation of LPL-mediated removal of TG-rich lipoproteins.
Taken together, the regional cross-ethnic LD comparison and functional data pinpoint rs964184 as the causative variant underlying the GWA signal as well as suggest postprandial transcriptional regulation as a plausible molecular mechanism for this TG and CHD-GWA variant. This potential mechanism of action warrants investigation in future functional studies.
The unique LD architecture of Mexicans enabled us to significantly reduce the associated region of three European GWA loci for TGs: APOA5, CILP2 and MLXIPL.7 In the CILP2 region that has also shown genome-wide significant associations for CHD,7 the strongest signal in Mexicans was obtained with rs2228603, a missense variant in the NCAN gene. Importantly, in a recent European GWA for fatty liver disease, this SNP was genome-wide significant for hepatic steatosis with the same T risk allele as in Mexicans.30 Fatty liver disease is characterised by increased accumulation of fat, especially TGs, in the liver cells, and is associated with increased TGs.31 Thus, this independent finding further supports the non-synonymous variant rs2228603 in NCAN (P91S) as the functional variant underlying the wide GWA signal.
Similarly, at the LOC55908-DOCK6 locus for HDL-C, we observed a genome-wide significant signal with rs2278426, whereas the lead European SNP rs737337 was slightly less significant in Mexicans. The MAFs of rs2278426 and rs737337 are high in Mexicans (30% and 32%), and low in Europeans (4% and 7%), making it more difficult to reliably impute and compare their evidence of association in Europeans. As rs2278426 is a missense variant (R59W), predicted to be damaging32 in LOC55908, the association data in Mexicans also revealed the putative functional variant and gene underlying this uncharacterised GWA signal. However, it should be noted that only a limited number of loci associated with HDL-C have also been shown to be associated with CHD,33 ,34 challenging the concept that raising of plasma HDL-C would uniformly translate into reductions in risk of myocardial infarction. However, as HDLs are a heterogeneous group of particles, it is likely that some HDL particles and subspecies may predict the underlying genotype-phenotype correlation and CHD risk better than the conventional serum HDL-C levels.
We observed generalisation from Europeans to Mexicans at 52 of the 100 tested SNPs. Taking into account the power, most SNPs generalised to Mexicans. However, it should be noted that the analysed lipid traits were somewhat different between the studies (eg, dichotomous vs continuous), which could potentially influence the generalisation results. Furthermore, since the European lead variants typically represent tag-SNPs rather than the actual functional variants, another possible explanation for non-generalisation is differences in the surrounding LD. Accordingly, we detected more regional differences in LD patterns between the CEU and MEX populations of HapMapIII at the non-generalised than at the generalised loci (52% vs 32%). Furthermore, we detected considerable allelic heterogeneity, because at 82 of the 100 loci, we identified a different, more strongly associated SNP than the lead European SNP. These data suggest that high-resolution genomic data from each population are needed for accurate cardiovascular risk assessment.
We observed population-specific variants within the known risk loci TIMD4 and ABCA1. The functionally important missense variant rs9282541 in ABCA1 (R230C) indeed seems to be exclusive to Amerindian-derived populations such as Mexicans.35 However, the frequencies of the independent SNPs rs2036402 near TIMD4 and rs4149310 near ABCA1 are substantially lower in the HapMapIII CEU panel than in Mexicans. These interpopulation differences in allele frequency can lead to association signals at different SNPs in different populations which do not necessarily mean that the causal variant is also population specific. Furthermore, as the current GWA platforms were designed based on European populations, they are not ideal for comprehensive assessment of population-specific variants. Thus, it is crucial to further investigate these associations, as more GWA studies, custom-made arrays and sequencing data become available in Mexicans.
Thus far, most GWA studies for lipids have been performed in European-origin cohorts. The current study is the first Mexican GWA study for lipids, and other populations with Amerindian heritage likewise lack lipid GWA studies. When compared with the European GWA consortium, the study sample of this GWA is small. Therefore, we employed a two-stage GWA design, which has been shown to reduce the cost of genotyping while maintaining the overall power of the study.36 We further demonstrate that the phenotypic variance explained by all the stage 1 SNPs versus the SNPs selected for stage 2 is virtually the same. It is also worth noting that unlike the study samples of the European GWA consortia7 that are predominantly from unascertained population-based cohorts or from case-control studies ascertained for a non-lipid trait (eg, type 2 diabetes), the Mexican GWA study sample was specifically ascertained for HTG. Such a disorder-oriented case-control design has been shown to have superior power over non-ascertained study samples in genetic association studies.37
To conclude, this Mexican GWA study has several novel findings. First, it identified a novel Mexican-specific locus for high TGs near the NPC1 gene. Second, using cross-ethnic mapping, we refined three GWA regions (APOA5, MLXIPL and CILP2) that in the European scan contained multiple candidates due to extended LD. In the APOA5 region, the LD restriction resulted in a single variant for which we demonstrated a direct effect on postprandial apoAV protein levels. Third, we observed that although 52% of the European lipid variants generalised to Mexicans, in 82% of the European loci, a variant differing from the European lead signal had the strongest regional evidence of association in Mexicans. These data demonstrate the importance of using genetic heterogeneity in GWA studies.
Materials and methods
Additional methods in the online supplementary material describe (1) SNP imputation; (2) estimation of the variance explained; (3) LD analysis; (4) gene-expression analysis; (5) oral fat tolerance test and (6) ancestry analyses.
A total of 6073 participants were included in the study. All participants were recruited at the Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City. The study design was approved by the ethics committees of the INCMNSZ and University of California, Los Angeles, and all subjects provided written informed consent. A total of 4400 Mexican hypertriglyceridemic cases and normotriglyceridemic controls were included in stages 1 and 2 (see online supplementary table S1 for clinical characteristics). The inclusion criteria were fasting serum TGs >2.3 mmol/l (200 mg/dl) for the cases and <1.7 mmol/l (150 mg/dl) for the controls.38 Exclusion criteria were type 2 diabetes or morbid obesity (BMI>40 kg/m2), TGs >6.8 mmol/l (600 mg/dl) for the cases, and the use of lipid-lowering drugs for the controls. Measurements of fasting lipid levels were performed with commercially available standardised methods.39 We also investigated 1712 additional Mexican subjects (1056 HTG cases and controls and 656 family members from 77 Mexican dyslipidemic families40) for the follow-up of the novel SNPs that did not pass the genome-wide significance level but provided p values ≤5×10−05 in the combined analysis of stages 1 and 2.
Genotyping and quality controls
The stage 1 genotyping of 592 394 bi-allelic SNPs was performed using the Illumina Human 610 BeadChip (Illumina) at the Southern California Genotyping Consortium. The following quality control (QC) inclusion criteria were applied using the PLINK41 software: subject and SNP genotyping success rate ≥95%; minor allele frequency ≥1%; no departure from Hardy–Weinberg equilibrium (HWE) (p value ≥1×10−6); subject heterozygosity rate within 4 SDs of the mean; and identity-by-descent (IBD) proportion <0.25 (excluding duplicates, first-degree and second-degree relatives). Heterozygosity rates of the X-chromosome were used to verify the reported gender. Gender inconsistencies were clarified and corrected, and we excluded samples with ambiguous sex (inbreeding coefficient between 0.3 and 0.8). After the QC steps, 563 599 SNPs and 2240 Mexican individuals were available for the analysis of stage 1.
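The SNP-level filters listed above can be illustrated with a minimal sketch. It is not the PLINK implementation: the function names are hypothetical, the input is assumed to be per-SNP genotype counts, and a 1-df chi-square goodness-of-fit test stands in for whatever HWE test PLINK applies. Only the thresholds (call rate ≥95%, minor allele frequency ≥1%, HWE p value ≥1×10−6) come from the text.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Approximate HWE p value from genotype counts (1-df chi-square test).

    Assumption: a chi-square test is used here for simplicity; it is a
    stand-in, not necessarily the test applied by PLINK in the study.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-6):
    """Return True if a SNP meets the call-rate, MAF and HWE thresholds."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p
```

For example, a SNP with genotype counts 360/480/160 and no missing calls is in exact HWE and passes, whereas the same SNP with 100 missing calls fails on call rate.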
For stage 2 genotyping, we selected 1326 SNPs as described in online supplementary table S2. Genotyping of the 1326 stage 2 SNPs was performed using the Illumina Goldengate custom panel at the Southern California Genotyping Consortium. IBD analysis was performed to identify duplicated samples (proportion IBD=1). SNPs and samples with low genotyping call rate (<95%), and SNPs with HWE p value<1×10−6 were excluded, resulting in 1235 SNPs and 2121 samples available for analysis. Genotyping of the follow-up SNPs rs2215964, rs805743 and rs4951964 was performed using the TaqMan genotyping platform (Applied Biosystems). These SNPs had ≥95% genotype call rate and were in HWE (p>0.05) in the normotriglyceridemic controls, as well as in the unrelated family members.
We used a two-stage GWA design. All GWA SNPs passing QC were tested for association in stage 1. Positive signals with p≤2.5×10−3 for HTG, and p≤5.0×10−4 for HDL-C were genotyped in the stage 2 samples, and a combined analysis of the two stages was performed to identify genome-wide significant variants.36 These significance thresholds were used, as higher p values are unlikely to reach the genome-wide significance level in the combined stage 1 and 2 analyses given the sample size. Furthermore, a more stringent selection criterion (p≤8.0×10−04) was applied in the association analysis of HTG with imputed SNPs because of the imputation uncertainty. We employed multivariate logistic regression using an additive genetic model to analyse the HTG affection status, as the study case-control samples were specifically ascertained based on the TG levels, and multivariate linear regression with sex as a covariate to assess the effects of the SNPs on continuous log transformed HDL-C levels. Continuous TC levels were also analysed to examine the association of 48 European-GWA SNPs for TC and LDL-C levels.7 Subjects with trait levels more than four SDs from the mean, or on lipid-lowering therapy at the time of the blood drawing, were excluded from the quantitative analyses. TC and HDL-C levels were also adjusted for the HTG affection status in order to avoid possible sampling bias. Association analyses of the genotyped SNPs were performed using the PLINK41 software, and the imputed SNPs were analysed using the expectation-maximisation option in SNPTEST V.2.1.142 software in order to incorporate the imputation uncertainties in the regression models described above. Association analyses of the secondary phenotype HDL-C were also performed using SPREG,43 which further accounts for the non-random sampling of the cases and controls by incorporating the known disease prevalence.
An HTG prevalence of 25% was used in the SPREG regression models, because in all stage 1 cases, serum TG levels were above the 75th age-sex specific Mexican percentile.44 To search for additional independent SNPs (r2<0.3) in the stage 1 analysis, we performed association analyses for the top 12 genotyped loci (p<2×10−05) in stage 1 by including the minor allele counts (0–2) of the lead SNP as a covariate in the regression analyses of the surrounding SNPs (±500 kb) (ie, conditional analysis). Conditional analyses were also performed in the APOA5, TIMD4 and ABCA1 gene regions by including the allele counts of the SNPs rs964184, rs1363232 and rs2575876, respectively. To account for population admixture, the first principal component was included in the stage 1 analyses. Individual ancestry proportions estimated by the STRUCTURE45 method with 67 evenly distributed ancestry informative markers were used in stage 2, as high-density whole-genome data are necessary for PCA. The ancestry analyses by STRUCTURE and PCA46 were compatible and are described in detail in the online supplementary methods. Genomic control was applied to the stage 1 association statistics. The genomic inflation factors and the quantile-quantile (QQ) plots are presented in online supplementary table S7 and online supplementary figure S5, respectively. Association results of stages 1 and 2 were combined as implemented in METAL11 using a fixed-effects meta-analysis approach where the weights are proportional to the square root of the number of individuals examined, and the estimates of the β effect were combined by METAL using inverse variance weights.
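The two combination schemes named above can be sketched as follows. This is an illustration of the standard fixed-effects formulas, not METAL itself: the function names are hypothetical, and the inputs are assumed to be per-stage Z scores with sample sizes, or per-stage β estimates with standard errors, all aligned to the same effect allele.

```python
import math

def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine per-stage Z scores with weights proportional to sqrt(N)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def inverse_variance_beta(betas, standard_errors):
    """Combine per-stage effect estimates with inverse variance weights."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se

def two_sided_p(z):
    """Two-sided p value for a standard normal Z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With two stages of equal size and the same Z score of 3, the combined Z is 3×√2, which shows how concordant evidence across stages strengthens the combined signal.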
In the additional follow-up family and case-control study samples, the ‘egscore’ function from the R library GenABEL47 was used to test for association between the three follow-up SNPs and HTG while adjusting for familial relationships using a kinship matrix.
To adjust for locus-specific ancestry in the stage 1 HTG GWA analysis, we included as covariates in the logistic regression model both the local ancestry estimates, coded as the proportion of allele copies inherited from the Amerindian ancestry (0, 0.5 or 1), and the global ancestry estimates from PCA, using the R software package (V.2.15.0). Locus-specific ancestry was estimated by the LAMP V.2.548 programme as described in the online supplementary methods.
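A logistic regression of HTG status on an additive genotype count with ancestry covariates can be sketched as below. This is a minimal illustration fitted by Newton-Raphson, not the PLINK, SNPTEST or R code used in the study. The function names and the design-row layout (intercept, genotype count 0–2, local ancestry 0/0.5/1, first principal component) are assumptions made for the example.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def gradient_and_hessian(rows, y, beta):
    """Score vector and observed information of the logistic log-likelihood."""
    k = len(beta)
    grad = [0.0] * k
    hess = [[0.0] * k for _ in range(k)]
    for x, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (yi - p) * x[i]
            for j in range(k):
                hess[i][j] += w * x[i] * x[j]
    return grad, hess

def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Return (coefficients, standard errors) for a logistic regression.

    Each row is a design vector, e.g. [1, genotype, local_ancestry, pc1]
    (hypothetical layout); y holds 0 for controls and 1 for cases.
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad, hess = gradient_and_hessian(rows, y, beta)
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    # Wald standard errors from the inverse information at the estimate.
    _, hess = gradient_and_hessian(rows, y, beta)
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(hess, unit)[i]))
    return beta, se
```

The second coefficient is the per-allele log odds ratio under the additive model. Covariates are added simply by extending each design row.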
Acknowledgements We thank the Mexican individuals who participated in this study. We also thank Cindy Montes and Rosario Rodríguez-Guillén for laboratory technical assistance.
Contributors Conceived and designed the experiments: DWV, CAAS, RMC, JSS, TTL, PP. Performed the experiments: DWV, CAAS, EN, OAC, LLMH, LGM, MLOS, AJL, NM, MRT, LR, TTL, PP. Analysed the data: DWV, CAAS, EN, KAD, ICB, PMVLR, NM, MRT, LR, RMC, JSS, TTL, PP. Contributed reagents/materials/analysis tools: DWV, CAAS, ICB, OAC, LLMH, LGM, MLOS, PMVLR, AJL, NM, MRT, LR, RMC, JSS, TTL, PP. Read and helped write the paper: DWV, CAAS, EN, KAD, ICB, OAC, LLMH, LGM, MLOS, PMVLR, AJL, NM, MRT, LR, RMC, JSS, TTL, PP. Main writers of the paper: DWV, PP.
Funding This research was supported by the NIH grants HL-095056 and HL-28481. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Competing interests None.
Ethics approval The study design was approved by the ethics committees of the Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City and University of California, Los Angeles (UCLA).
Provenance and peer review Not commissioned; externally peer reviewed.