Background Drug absorption, distribution, metabolism and excretion (ADME) contribute to the high heterogeneity of drug responses in humans. However, the same standard for drug dosage has been applied to all populations in China although genetic differences in ADME genes are expected to exist in different ethnic groups. In particular, the ethnic minorities in northwestern China with substantial ancestry contribution from Western Eurasian people might violate such a single unified standard.
Methods In this study, we used Affymetrix SNP Array 6.0 to investigate the genetic diversity of 282 ADME genes in five northwestern Chinese minority populations, namely, Tajik, Uyghur, Kazakh, Kirgiz and Hui, and attempted to identify the highly differential SNPs and haplotypes and further explore their clinical implications.
Results We found that the genetic diversity of many ADME genes in the five minority groups was substantially different from that in the Han Chinese population. For instance, we identified 10 functional SNPs with substantial allele frequency differences, 14 functional SNPs with highly different heterozygous states and eight genes with significant haplotype differences between these admixed minority populations and the Han Chinese population. We further confirmed that these differences mainly resulted from European gene flow, which increased the genetic diversity in the admixed populations.
Conclusions These results suggest that the ADME genes vary substantially among different Chinese ethnic groups. We suggest it could cause potential clinical risk if the same dosage of substances (eg, antitumour drugs) is used without considering population stratification.
- Molecular genetics
Efficacy and safety are among the most important considerations in the discovery and application of drugs. Both are associated with drug metabolism and transportation. To exert their therapeutic effects, drugs in the body must be absorbed, distributed, transported and bound to targets. The toxicity and safety of drugs are determined by their timely metabolism into water-soluble substances and excretion.1,2 The diversity of the drug absorption, distribution, metabolism and excretion (ADME) genes, which encode drug-metabolising enzymes and transporters, is one of the important factors affecting drug efficacy and safety.
Genetic studies have revealed that different populations have different genetic structures because of their complex demographic histories.3 Genetic differences in ADME genes are also expected to exist in different ethnic groups.4 For instance, CYP3A4 and CYP3A5, which metabolise an estimated 50% of the currently used drugs, exhibit significant genetic differences between Africans and non-Africans,5 and CYP1A2, which mainly metabolises antidepressants, exhibits substantial genetic differences between Europeans and other populations.6 Variants of the NAT2 gene, which is involved in the detoxification of a large number of chemicals, that are associated with slow metabolism were found to be more prevalent in East Asians than in other populations.7 We previously investigated the genetic diversity of ADME genes on a worldwide scale;8 ancestral information was also found to influence the genetic variation of ADME genes in global populations,9 and population-specific alleles of ADME genes were found to be associated with gene expression level.10 These results suggest that population differences in ADME genes could affect clinical decision-making in different populations. However, most developing countries follow the US FDA/European Medicines Agency guidelines and apply the same therapeutic drug dosages across populations, without considering that variation in ADME genes could pose a clinical risk.
The clinical risk of drug response heterogeneity is also present in countries with multiple ethnic groups because of population stratification.11 For instance, the Han Chinese comprise about 92% of the total population in China. Population sizes of the other 55 minority groups vary from thousands to millions; some of these ethnic groups differ substantially from the Han Chinese in terms of morphological and genetic characteristics. For example, the Uyghur (UIG) population, the largest minority group in northwestern China, differs significantly from the Han Chinese in terms of facial features and has approximately 55% European genetic components.12,13 Similar to UIG, the ethnic groups Tajik (TJK), Kirgiz (KGZ), Kazakh (KZK) and Hui (HUI) are also highly distinguishable from both current Europeans and East Asians due to their multiple ancient genetic sources.14,15 The differences in genetic structure between admixed populations and their ancestral source populations (ie, Europeans and East Asians) might also affect the heterogeneity of drug responses.16
In reality, most people from minority groups are reluctant to participate in clinical tests involving pharmacodynamic and pharmacokinetic studies. Hence, the national dosage standard of China has been mostly determined based on clinical studies in the Han Chinese population. This procedure ignores the genetic differences between the minority groups and the Han Chinese. Therefore, it is important and urgent to identify the types of medicines that could elicit different drug responses in the minority groups and the Han Chinese. The genetic differentiation of ADME genes between these populations and the Han Chinese should be considered a fundamental factor. However, systematic pharmacogenetic studies on these populations are very limited. The present study is the first effort to investigate the diversity patterns of ADME genes between the five northwestern Chinese populations (TJK, UIG, KZK, KGZ and HUI) and Han Chinese in Beijing (CHB). The clinical implications of these genetic variations for drug safety and efficacy are also discussed.
Methods and materials
Genetic variation data
DNA samples of 47 TJK, 42 UIG, 45 KZK and 46 KGZ were collected from the Xinjiang Uyghur Autonomous Region in China, whereas DNA samples of 30 HUI were collected from the Ningxia Hui Autonomous Region in China. All procedures were followed in accordance with the ethical standards of the ethics committee of Fudan University and the Helsinki Declaration of 1975, as revised in 2000.
The genotyping data of five HapMap III17 population samples (181 Yoruba in Ibadan (YRI), 183 Utah residents with Northern and Western European ancestry (CEU), 91 Toscani in Italia (TSI), 89 CHB and 91 Japanese in Tokyo (JPT)) were included in this study. The full sequencing data of 181 CEU and 89 CHB samples obtained from The 1000 Genomes Project18 were also included.
Genotyping and data filtering
Genotyping using the Affymetrix Genome-Wide Human SNP Array 6.0 was performed according to the ‘48 Sample Protocol’ (Affymetrix, Genome-Wide Human SNP Nsp/Sty 6.0 User Guide, Rev. 3, 2008, P/N 702504). The raw intensity data were analysed with Birdsuite.19 After excluding the individuals with a genotyping call rate below 90%, SNPs with missing data in >10% of individuals and SNPs that failed the Hardy–Weinberg equilibrium test (p<0.0001) within each population, we acquired 893 949 autosomal SNPs for further studies.
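The filtering steps above can be illustrated with a minimal Python sketch. The genotype encoding (0/1/2 allele counts, with -1 for a failed call), the function names and the order in which the filters are applied are assumptions made for illustration. The source does not state which Hardy–Weinberg test was used, so a simple chi-square approximation with one degree of freedom stands in for it here.

```python
import math

MISSING = -1  # assumed code for a failed genotype call


def missing_fraction(calls):
    """Fraction of calls in a list that are missing."""
    return sum(c == MISSING for c in calls) / len(calls)


def qc_filter(genotypes, min_call_rate=0.90, max_snp_missing=0.10):
    """genotypes: one list of 0/1/2 allele counts per individual.
    Returns (indices of kept individuals, indices of kept SNPs)."""
    # Step 1: drop individuals whose call rate is below the threshold.
    keep_ind = [i for i, row in enumerate(genotypes)
                if 1.0 - missing_fraction(row) >= min_call_rate]
    if not keep_ind:
        return [], []
    # Step 2: drop SNPs missing in too many of the remaining individuals.
    keep_snp = []
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in keep_ind]
        if missing_fraction(column) <= max_snp_missing:
            keep_snp.append(j)
    return keep_ind, keep_snp


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p value for Hardy-Weinberg equilibrium.
    An approximation chosen for illustration, not the study's stated test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

Under these assumptions, a SNP would be removed from a population when `hwe_chi2_p` returns a value below 0.0001 for that population's genotype counts.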
Phasing and merging datasets
The genotyping data were further phased with fastPHASE20 and merged, using PLINK,21 with the haplotype data of the five populations (CEU, TSI, CHB, JPT and YRI) from the HapMap III dataset; non-overlapping SNPs were removed. The merged dataset consisted of haplotypes for 833 658 SNPs in 845 individuals from 10 populations.
The ADME gene lists were obtained from the PharmaADME database (http://www.pharmaadme.org/). After excluding the genes located on sex chromosomes and those with fewer than three polymorphic sites in our data, 32 core ADME genes and 250 extended ADME genes were retained. According to the PharmaADME database, ADME core genes are defined as the most important genes in drug metabolism, while ADME extended genes also play roles in drug metabolism but are less essential than the core genes. Gene coordinate information was obtained from the RefSeq database,22 and 10 kb upstream and downstream of each gene were included.
Functional annotations of SNPs and haplotypes
The functional effects of each SNP from each ADME gene were obtained with the variant effect prediction tools from the Ensembl database.23 Here, the functional SNPs were defined as non-synonymous variations or splice sites annotated in Ensembl, or variations affecting gene expression annotated in the RegulomeDB dataset,24 and the associated SNPs were defined as those reported to be directly associated with drug response/dose as collected and annotated in the PharmGKB database.25 We therefore classified the SNPs into four categories: (1) FASS: functional SNPs with a reported association; (2) FNAS: functional SNPs without a reported association yet; (3) NFASS: SNPs with a reported association but no precise function identified yet; and (4) NFNAS: SNPs that are currently neither functional nor reported to be associated.
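The four-way classification can be written as a small decision rule. The sketch below is illustrative only: the consequence terms are standard Sequence Ontology labels of the kind reported by Ensembl, but the exact set used in the study, the boolean flags for RegulomeDB and PharmGKB, and the function names are all assumptions.

```python
# Assumed set of consequence terms treated as "functional" (hypothetical choice).
FUNCTIONAL_CONSEQUENCES = {
    "missense_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def is_functional(consequences, affects_expression):
    """A SNP counts as functional if it carries a non-synonymous or splice-site
    consequence, or if it is annotated as affecting gene expression."""
    return bool(FUNCTIONAL_CONSEQUENCES & set(consequences)) or affects_expression


def classify_snp(consequences, affects_expression, has_reported_association):
    """Return one of the four categories defined in the text."""
    functional = is_functional(consequences, affects_expression)
    if functional and has_reported_association:
        return "FASS"
    if functional:
        return "FNAS"
    if has_reported_association:
        return "NFASS"
    return "NFNAS"
```

For example, an intronic SNP with no expression annotation but a reported drug-response association would be labelled NFASS under this rule.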
Structure and local ancestry inference of admixed populations
STRUCTURE (V.2.3)26 was used to infer the population structure. Here, each STRUCTURE analysis was run with 100 000 burn-ins and 100 000 iterations. The local ancestry inference of each admixed population was estimated with HAPMIX.27 The prior ancestry proportions of five admixed populations were obtained from the STRUCTURE results, and other parameters were set as default. We also obtained the local ancestral components of each gene based on the HAPMIX's results.
Imputing allele frequencies of admixed populations
The expected allele frequencies in the five admixed populations could be imputed from their putative ancestral source populations, that is, CEU and CHB, as follows:
$$f_{C_{NW}} = \rho_{Eur}\, f_{CEU} + (1 - \rho_{Eur})\, f_{CHB}$$
where fCEU and fCHB denote the derived allele frequency of each locus in CEU and CHB, respectively, ρEur represents the contribution of the European ancestry at each locus, and CNW represents one of the five admixed populations.
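As a worked illustration, the imputation can be sketched as a weighted average of the two source frequencies, assuming the standard linear admixture model. The function name is hypothetical. The example values are the rs2235047 frequencies and the genome-wide European proportions reported in the Results; the genome-wide proportions are used here only as a stand-in for the per-locus ancestry estimates, which is an assumption.

```python
def expected_allele_frequency(f_ceu, f_chb, rho_eur):
    """Expected derived-allele frequency in an admixed population, given the
    source frequencies in CEU and CHB and the European ancestry proportion."""
    if not 0.0 <= rho_eur <= 1.0:
        raise ValueError("rho_eur must lie between 0 and 1")
    return rho_eur * f_ceu + (1.0 - rho_eur) * f_chb


# rs2235047 derived-allele frequencies reported in the Results.
F_CEU, F_CHB = 0.008, 0.500

# Genome-wide European proportions (stand-ins for locus-specific estimates).
expected_hui = expected_allele_frequency(F_CEU, F_CHB, 0.091)
expected_tjk = expected_allele_frequency(F_CEU, F_CHB, 0.769)
```

With these inputs the expected frequency falls between the two source frequencies, closer to CHB for HUI and closer to CEU for TJK, which matches the direction of the observed values.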
Statistical and population genetic analyses
Principal component analysis (PCA) was performed at the individual level using EIGENSOFT V.3.0.28 Analysis of molecular variance (AMOVA) and FST calculations were performed with Arlequin 3.5.29 Particularly, the FST of SNPs were calculated following Weir and Cockerham,30 while the FST of genes were the average F-statistic over loci. The significance of a gene's FST was measured with permutation test by randomly shuffling individuals among populations. The empirical p values were also calculated for each gene's FST and each SNP's FST by comparing the FST values of 500 randomly selected genes and all autosomal SNPs, respectively.
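The empirical p value described above can be sketched as the fraction of background FST values that are at least as large as the observed one. This is a minimal illustration with a hypothetical function name; whether the study used this exact definition (for example, with or without a pseudocount) is not stated in the text and is assumed here.

```python
def empirical_p_value(observed, background):
    """Fraction of background statistics greater than or equal to the
    observed statistic (for example, a gene's FST against random genes)."""
    if not background:
        raise ValueError("background must not be empty")
    n_as_extreme = sum(value >= observed for value in background)
    return n_as_extreme / len(background)
```

Under this definition, a gene whose FST exceeds the top 1% threshold of the 500 random genes has an empirical p value below 0.01.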
The populations studied in this work had different sample sizes. Hence, we randomly sampled 30 individuals from each population and calculated the expected heterozygosity (He) on all loci of 282 ADME genes for each population. For each ADME core gene, the significance of haplotype diversity (Hd) among populations was assessed with permutation test by randomly resampling individuals among populations.31
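The subsampling step can be sketched as follows. The code assumes biallelic SNPs coded as 0/1/2 allele counts with no missing calls, and uses the simple estimator He = 2p(1 − p) averaged over loci; the source does not say whether a small-sample correction was applied, so its absence here is an assumption, as are the function names.

```python
import random


def expected_heterozygosity(genotypes):
    """Mean expected heterozygosity over loci.
    genotypes: one list of 0/1/2 allele counts per individual."""
    n_individuals = len(genotypes)
    n_loci = len(genotypes[0])
    total = 0.0
    for j in range(n_loci):
        # Allele frequency at locus j from diploid allele counts.
        p = sum(row[j] for row in genotypes) / (2 * n_individuals)
        total += 2 * p * (1 - p)
    return total / n_loci


def subsampled_he(genotypes, n=30, seed=0):
    """Draw n individuals without replacement, then compute He, so that
    populations with different sample sizes are compared on equal footing."""
    rng = random.Random(seed)
    sample = rng.sample(genotypes, n)
    return expected_heterozygosity(sample)
```

Fixing the subsample size at 30 matches the smallest population sampled in this study (HUI).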
Selection signal detection
The integrated haplotype score (iHS) and composite likelihood ratio (CLR) tests were used to detect signals of recent positive selection. The unstandardised iHS scores were calculated with iHS.32 The CLR scores were calculated using SweepFinder.33 We calculated the iHS and CLR scores for each SNP across all autosomes and defined top SNPs as those with scores in the highest 1%. A gene was considered a selection candidate if at least three top SNPs fell within the gene region, including 10 kb upstream and downstream.
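The candidate-gene rule can be sketched in Python as below. The sketch takes per-SNP scores as given (it does not compute iHS or CLR), handles a single chromosome, and uses hypothetical function names; the tie-handling at the 1% cut-off is an assumption.

```python
def top_snp_flags(scores, top_fraction=0.01):
    """Flag SNPs whose score is among the highest top_fraction of all scores."""
    n_top = max(1, int(len(scores) * top_fraction))
    threshold = sorted(scores, reverse=True)[n_top - 1]
    return [s >= threshold for s in scores]


def candidate_genes(positions, is_top, genes, flank=10_000, min_hits=3):
    """genes: mapping of gene name -> (start, end) on one chromosome.
    Returns the genes with at least min_hits top SNPs inside the gene
    region extended by flank base pairs on each side."""
    hits = {}
    for name, (start, end) in genes.items():
        n = sum(
            1
            for pos, top in zip(positions, is_top)
            if top and start - flank <= pos <= end + flank
        )
        if n >= min_hits:
            hits[name] = n
    return hits
```

In practice the flags would be computed separately for each statistic and each population before genes are scanned.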
Population differences in ADME genes
PCA was used to depict the differences in ADME genes among two European groups (CEU and TSI), two East Asian groups (CHB and JPT) and five minority groups in northwestern China (HUI, KGZ, KZK, UIG and TJK), as shown in figure 1. The principal components (PC) plot based on data from the ADME gene regions showed a pattern very similar to that of the whole genome data (see online supplementary figure S1). This result indicates the presence of clear population stratification among the three groups (Europeans, East Asians and the five northwestern Chinese populations). PC1 (accounting for 47.4% of the total genetic variance) clearly showed that the five minority populations were different from Europeans and East Asians. Specifically, these populations were intermediate between Europeans and East Asians. PC2 (accounting for 5% of the total genetic variance) also separated the five minority populations from both Europeans and East Asians. The relative positions of the five minority populations in the PC plot reflect their genetic relationships; as shown in figure 1, HUI is very close to CHB and JPT; TJK is closer to CEU and TSI than to East Asians; and KZK, KGZ and UIG are very close to each other and intermediate between the other populations.
FST values were calculated for each ADME gene to further quantify the differences in ADME genes among populations. The results for the 32 ADME core genes are listed in table 1. Except for CYP2B6, GSTM1 and TPMT, the other 29 ADME core genes had significant FST values (permutation p value <0.01), suggesting that most of the ADME core genes differ substantially among the studied populations. Although the p values of most ADME genes and of the 500 randomly selected genes were all very small (see online supplementary figure S2A), the distribution of FST values of the ADME core genes was shifted towards higher values relative to the random genes (see online supplementary figure S2B). Notably, the FST values of CYP1A1 and CYP1A2 exceed the top 1% threshold (0.112), and those of CYP3A4, CYP3A5 and NAT2 exceed the top 5% threshold (0.101).
AMOVA was used to further investigate the genetic variation of ADME core genes among populations, as shown in table 1 and online supplementary table S1. Compared with 500 randomly selected genes from the whole genome, CYP1A1, CYP1A2, CYP3A4, CYP3A5 and NAT2 exhibited variation among the three groups (Europeans, East Asians and northwestern Chinese) above the top 1% threshold (7.34%), whereas SLC15A2 exhibited variation between populations within groups above the top 1% threshold (4.97%). The AMOVA results confirm that some ADME genes vary significantly among populations compared with randomly selected genes.
Identification of highly differentiated functional/associated SNPs in ADME genes
Pharmacogenetic studies focus on the genetic diversity patterns of functional SNPs, that is, the variants that serve important functions in drug response heterogeneity. In the present study, we used public annotation databases, including PharmGKB, RegulomeDB and Ensembl, to classify all SNPs into four categories as described in the Methods and materials section.
From our dataset, we found 70 functional/associated (FASS, FNAS and NFASS) SNPs in the ADME core genes and 543 functional/associated SNPs in the ADME extended genes. Among those SNPs, four highly differentiated functional/associated SNPs (one FASS and three NFASS SNPs) were identified in the ADME core genes, and another six (one FASS, two FNAS and three NFASS SNPs) were identified in the ADME extended genes, as shown in table 2. These SNPs with high FST values might lead to heterogeneity in drug response. For instance, compared with the ancestral allele (A), the derived allele (C) of the rs2235047 SNP (NFASS) in the ABCB1 gene is more strongly correlated with an increased likelihood of cardiotoxicity upon exposure to anthracyclines.34 The derived allele (also the risk allele, C) of this SNP has its highest frequency in CHB (0.500) and its lowest frequency in CEU (0.008). Accordingly, in the five northwestern Chinese groups, which have different ancestry contributions from eastern (CHB) and western (CEU) populations, the allele frequency of rs2235047 ranges from 0.383 in HUI to 0.075 in TJK. These results suggest that the phenotypes related to the metabolism of ABCB1 substrates, such as anthracyclines, are distributed differently among these populations.
Of the 10 functional/associated SNPs with substantial frequency differences (table 2), the two FASS SNPs (rs1208 and rs1056838) could be directly considered as markers for the heterogeneity of drug response. The six NFASS SNPs may tag undetected functional SNPs, although they could also be regarded as potential biomarkers in their own right because of the reported associations. Furthermore, the two FNAS SNPs might be informative for future clinical verification since they could affect the protein structures of the related enzymes.
Functional/associated SNPs with different heterozygous states among populations
Apart from the SNPs with extremely different frequencies among populations, variants with highly different heterozygous states are also important in clinical studies. As shown in table 3, 14 functional/associated SNPs (five FASS, four FNAS and five NFASS SNPs) had low heterozygosity in the Han Chinese population (He<5%) but high heterozygosity in the other populations (He>15%). For instance, with respect to the intronic mutation rs7089580 (NFASS) in CYP2C9 (11809A>T), patients with the TA or TT genotype require higher doses of warfarin than patients with the AA genotype.35 At this variant, the expected heterozygosity in CHB is 4.4%, whereas in both UIG and TJK it is ≥19.0%. The results of FST screening indicate that the allele frequency differentiation of these SNPs among the Chinese ethnic groups is not significant compared with the entire genome (FST=0.061). However, the large difference in heterozygosity at rs7089580 might indicate more complex drug responses in the Chinese minority groups than in the Han Chinese, which could lead to unpredicted clinical risk for the drugs listed in table 3.
The 14 functional/associated SNPs with extremely different heterozygous states listed in table 3 could affect response heterogeneity of a broad range of drugs. Examples of drugs that could be affected include the antitumour drugs bisantrene and capecitabine, the antituberculosis drug isoniazid, the anticoagulant drug warfarin, and the hypercholesterolaemia treatment drugs fluvastatin and fenofibrate. Therefore, different dosages should be considered for these drugs and their related substrates due to population stratification.
Identification of highly differentiated functional/associated haplotypes in ADME genes
In pharmacogenetic studies, the clinical phenotypes of drug metabolism are more likely to be determined by haplotypes composed of functional/associated variants than by single independent SNPs. Table 4 shows the diversity of clinical haplotypes of 18 genes with at least three functional/associated SNPs in our genotyping data among the five minority populations and their ancestral source populations (CHB and CEU). Of these 18 genes, eight showed significantly different haplotype diversity patterns among populations (permutation p value <0.01). For instance, figure 2 shows the NAT2 haplotypes encompassing rs1799929 (NFASS), rs1799930 (FASS) and rs1208 (FASS): CHB was dominated by the haplotype CGA (78%), whereas the five minority groups showed broader haplotype distributions. We could not directly infer the detailed metabolic phenotype of NAT2 from only three SNPs. Nevertheless, rs1799929 (481C>T), rs1799930 (590G>A) and rs1208 (803G>A) are reported signature alleles for the NAT2*11, NAT2*6 and NAT2*12 haplogroups, which are all associated with either the slow or the rapid acetylator phenotype. Therefore, the different haplotype distributions between the Han Chinese and the minority groups suggest that they also differ substantially in metabolic phenotypes for NAT2 substrates.
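To make the notion of haplotype diversity concrete, the sketch below computes Nei's haplotype diversity from a list of phased haplotype strings. The text does not define the Hd estimator used, so this common formula is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared
    haplotype frequencies), for a sample of n phased haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)
```

A sample dominated by a single haplotype, as CHB is by CGA at NAT2, yields a low value, whereas a sample spread over several haplotypes yields a value closer to 1.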
Inference of ancestral origins of alleles in ADME genes
The above analysis clearly indicates that substantial differences in the ADME genes exist between Han Chinese and the five minority groups. We further determined the factors that may have caused these differences. In our previous study,12 we reported that UIG is an admixed population with both European and East Asian ancestries. Figure 3A shows the genetic structure analysis of the five populations compared with Africans (YRI), Europeans (CEU and TSI), and East Asians (CHB and JPT). Results showed that the genetic components of the five minority groups were admixed with different proportions of Europeans and East Asians. In detail, the genetic contribution of Europeans was 76.9% to TJK (SD=0.049), 53.5% to UIG (SD=0.053), 38.4% to KZK (SD=0.050), 35.6% to KGZ (SD=0.059) and 9.1% to HUI (SD=0.035).
We used HAPMIX to investigate the local ancestral genetic origins of the five admixed populations (see online supplementary figure S3). Figure 3B shows the sum of European ancestry contributions to the 32 ADME core genes across the five admixed populations. Overall, CYP1A2 shows the lowest European contribution (192.1% in total), while CYP2D6 shows the highest (249.6% in total). The red line in figure 3B represents the average value (204.7%) based on whole genome data; in total, 26 of the 32 ADME core genes show a higher European contribution than this average.
Imputation of functional/associated SNPs from sequence data using admixture analysis
Minority ethnic groups have rarely been sampled for deep sequencing to explore the polymorphic state of functional variants. We explored the possibility to estimate the frequencies of functional SNPs in the admixed populations from their putative ancestral source populations, which have been sequenced and with data available in public database such as The 1000 Genomes Project.
Figure 4 shows an acceptable correlation between the observed and expected frequencies of the functional SNPs (r=0.93, p<10⁻¹⁵) in the five admixed populations. Therefore, we could roughly infer the frequencies of the functional SNPs for the admixed populations to identify the high-risk alleles that might cause drug response heterogeneity among populations.
Given the significant correlation between the actual and predicted frequencies in the admixed populations, we searched the full sequencing dataset of The 1000 Genomes Project and found 17 and 46 additional functional/associated SNPs in the ADME core and extended genes, respectively, that are highly differentiated between CHB and CEU (table 5 and see online supplementary table S2). These SNPs were again enriched in functional variants associated with heterogeneous responses to caffeine and to antitumour, immunosuppressant and antidepressant drugs. Although these functional/associated SNPs were not present in our dataset, they are expected to show large frequency differences between CHB and the five minority populations because of European gene flow. Thus, the influence of these variants on drug safety and efficacy among populations must also be considered.
Natural selection on ADME genes in different populations
In the admixed populations, most of the allele frequencies inferred from their ancestral populations were generally fit to their actual frequencies. However, some SNPs still obviously deviated from the expected allele frequency (figure 4). This deviation may be attributed to natural selection. In the present study, we used two methods to detect natural selection signals among the five minority groups and their putative ancestral source populations.
The natural selection signals were enriched in ADME genes (figure 5). For instance, five phase I ADME core genes (CYP2B6, CYP2E1, CYP3A4, CYP3A5 and DPYD) and four core transporter genes (ABCB1, SLC15A1, SLC22A1 and SLCO1B3) showed significant selection signals in at least two populations. For genes with selection signals in the two ancestral source populations, such as most of the genes mentioned above, the signals were consistently present in the admixed populations. In particular, CYP3A4 and CYP3A5 exhibited prevalent selection signals in CEU and most of the admixed populations but not in CHB. By contrast, SLC22A1 and SLCO1B3 had selection signals only in the admixed populations and not in either CHB or CEU. These results indicate that the adaptation of ADME genes could have occurred before or after the admixture of ancestral European and Asian populations.
In this work, we studied five minority groups in northwestern China that are admixed populations with both eastern and western Eurasian ancestries. Generally, admixture increases genetic diversity of genomes and genes, including intermediate allele frequencies, heterozygous allele states and haplotype combinations. Consequently, the distribution of different metabolic phenotypes in the admixed populations is expected to be broader than that in non-admixed populations. Thus, more clinical patient samples are needed to determine appropriate drug dosages in those admixed populations. However, people of minority groups usually refuse to participate in clinical studies. Our study should advance the understanding of the genetic basis of the drug response heterogeneity among populations in China, and have significant implications for evaluating potential heterogeneity of drug responses within multiple ethnic groups of Chinese. It is noteworthy that there could be further subpopulation structure within each ethnic group studied here. For example, considerable genetic differences between south and north Han Chinese were reported previously.36 ,37 Those substructures could have unpredictable but substantial influences on evaluation of drug response and great implications for future pharmacogenomic studies. Therefore, we suggest further extensive investigations should be conducted.
It is noteworthy that genetic variants are only one of the factors affecting drug responses, and the explicit consequences of most genetic variants are not yet fully understood. Thus, the phenotypic consequences of population differentiation in ADME genes should be carefully validated in future studies. On the other hand, although the role of ethnicity in pharmacogenomic studies is still debatable, drug dose requirements do differ among ethnic populations.38 Therefore, individual genotyping/sequencing is necessary in future pharmacogenomic studies, because higher heterogeneity of drug responses is also expected in admixed populations and any oversimplified ethnic medicine standards might be inappropriate.
We systematically investigated the influence of admixture on the diversity of ADME genes. The results of this study could be used as a reference in future pharmacogenetic studies on admixed populations. TJK, UIG, KZK, KGZ and HUI exhibit different ancestral European components ranging from high to low. The populations with intermediate admixture proportions exhibit the highest genetic diversity, both across the entire genome and at the ADME genes. That is, UIG, KZK and KGZ (average European components of 53.5%, 38.4% and 35.6%, respectively) have substantially higher genetic diversity than the other populations, as shown in online supplementary figure S4. These results suggest that the ancestral sources of admixed populations are equivalent. All of them exhibit high diversity in their genetic architecture. Conversely, our genotype dataset shows that Europeans exhibit slightly higher genetic diversity than East Asians. Thus, the European components are deemed to contribute more to the increase in genetic diversity than the East Asian components. For instance, TJK (76.9% European component) exhibits substantially higher genetic diversity than HUI (9.1% European component) (see online supplementary figure S4). However, although this is unlikely to affect our main results and conclusions, the estimated contribution of the ancestral genetic components could be misleading because of possible ascertainment bias in the genetic variants assayed by the genotyping approach. Therefore, full sequencing is warranted in future studies.
Aside from demographic histories, such as admixture events, natural selection is another important factor that affects the diversity of genes. Enriched signals of positive selection were observed in the ADME core genes, which are consistent with our previous results.8 Interestingly, we found that some ADME genes (CYP3A4 and CYP3A5) showed positively selective signals before population admixture, while some other genes such as SLC22A1 and SLCO1B3 exhibited the selection signals after the admixture events.
Our results show that the northwestern Chinese populations exhibited substantial differences at some ADME genes compared with the Han Chinese. We suggest population differences should be carefully considered before related drugs are introduced to the Chinese market.
Acknowledgements We thank all volunteers for participating in this study. SX also gratefully acknowledges the support of the National Program for Top-notch Young Innovative Talents of The ‘Ten-Thousand-Talents’ Project and the support of K.C. Wong Education Foundation, Hong Kong.
Contributors SX conceived the study. SX and JL designed the study. SL, LJ, XP, WY, MS and DM collected samples. JL, HL, XY and DL performed the analyses. SX and JL interpreted the data and wrote the paper. All authors have read and approved the final version of the manuscript. SX is Max-Planck Independent Research Group Leader. SX and LJ are members of CAS Youth Innovation Promotion Association.
Funding This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB13040100), by the National Science Foundation of China (NSFC) grants (91331204, 31370505, 31171218, 81160249, 30901238 and 30971577), by the Knowledge Innovation Program of Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (2013KIP108), and by Natural Science Foundation of Xinjiang Uyghur Autonomous Region (2013211A016). Funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests None.
Patient consent Obtained.
Ethics approval The ethics committee of Fudan University.
Provenance and peer review Not commissioned; externally peer reviewed.