Background The high prevalence of tobacco use in some developing nations, including Bangladesh, poses several public health challenges for these populations. Smoking behaviour is determined by genetic and environmental factors; however, the genetic determinants of smoking behaviour have not been previously examined in a Bangladeshi or South Asian population. We performed a genome-wide association study (GWAS) of tobacco smoking behaviour among a population-based sample of 5354 (2035 ever smokers and 3319 never smokers) men and women in Bangladesh.
Methods Genome-wide association analyses were conducted for smoking initiation (ever vs never smokers), smoking quantity (cigarettes per day), age of smoking initiation, and smoking cessation (former vs current smokers). Sex-stratified associations were performed for smoking initiation.
Results We observed associations for smoking initiation in the SLC39A11 region at 17q21.31 (rs2567519, p=1.33×10−7) among men and in the SLCO3A1 region at 15q26 (rs12912184, p=9.32×10−8) among women.
Conclusions These findings suggest possible underlying mechanisms related to solute carrier transporter genes, which transport neurotransmitters, nutrients, heavy metals and other substrates into cells, for smoking initiation in a South Asian population in a sex-specific pattern. Genetic markers could have potential translational implications for the prevention or treatment of tobacco use and addiction in South Asian populations and warrant further exploration.
Tobacco smoke is a known human carcinogen and has been associated in epidemiologic studies with an increased risk of several cancers.1 While the prevalence of tobacco smoking has begun to decline in many developed nations, it continues to rise in developing nations, including Bangladesh and other South Asian nations.2 There are several public health challenges related to the high prevalence of tobacco use in these populations. Tobacco smokers represent a sizable portion of the population at increased risk of future morbidity and mortality. Tobacco consumption in Bangladesh also places a sizable burden on the health of the population through household expenditures on tobacco as opposed to other basic resources such as food, housing, health or education.2 ,3 Therefore, tobacco control is a public health priority in Bangladesh, and it could be better supported by an understanding of the various determinants of smoking behaviour in this population.
Smoking behaviour is a complex phenotype determined by a combination of environmental and genetic determinants.4 However, an understanding of the genetic basis of smoking behaviour remains limited, especially for the populations of developing countries.5 To date, several genome-wide association studies (GWAS) of smoking behaviour have been conducted,6–14 with several chromosomal regions observed to be associated with various phenotypes of smoking behaviour. However, these association studies have been conducted primarily in populations of European ancestry in developed nations, with only a few exceptions.12–14 No previous study has comprehensively assessed genetic determinants of smoking behaviour in a Bangladeshi or other South Asian population, although South Asians comprise nearly a quarter of the world's population.
In this analysis, we conducted a GWAS to evaluate genetic determinants of smoking behaviour in a Bangladeshi population. We evaluated genome-wide associations of single nucleotide polymorphisms (SNP) with smoking initiation (ever vs never smokers), smoking quantity (cigarettes per day), age of smoking initiation, and smoking cessation (former vs current smokers).
Subjects and methods
Individuals included in this study were enrolled in two population-based longitudinal studies: the Health Effects of Arsenic Longitudinal Study (HEALS) or the Bangladesh Vitamin E and Selenium Trial (BEST). HEALS, described previously in detail,15 is a cohort study established to investigate health outcomes associated with chronic arsenic exposure from groundwater in a population sample of adults in Araihazar, Bangladesh. Eligibility criteria for participation included being married (to minimise loss to follow-up), aged between 18 years and 75 years, and resident in the study area for at least 5 years. A total of 20 033 (11 746 during 2000–2002 and 8287 during 2006–2008) men and women were enrolled into the HEALS cohort. Trained study physicians, blinded to participants’ exposure to arsenic, conducted in-person interviews and clinical evaluations, and collected urine and blood samples from participants in their homes using structured protocols. BEST is a 2×2 factorial randomised chemoprevention trial evaluating the long-term effects of vitamin E and selenium supplementation on non-melanoma skin cancer risk.16 BEST participants are residents of Araihazar (the same geographic area as HEALS participants), Matlab and surrounding areas. Eligibility criteria included being aged between 25 years and 65 years, permanent residence in the study area, manifest arsenical skin lesions, and no prior cancer history. During 2006–2009, a total of 7000 individuals were enrolled into the study. BEST uses many of the same study protocols as HEALS, including recruitment, interview data (including smoking and covariate data) and biospecimen collection and processing.
There were 2035 ever smokers and 3319 never smokers eligible for these analyses from the combined HEALS and BEST cohorts, with available GWAS data. The study protocols were approved by the relevant institutional review boards in the USA (The University of Chicago and Columbia University) and Bangladesh (Bangladesh Medical Research Council and ICDDR,B). Informed consent was obtained from all participants prior to the baseline interview of the original studies.
For the purposes of these analyses, we considered four smoking phenotypes: smoking initiation (ever vs never smokers), smoking quantity (cigarettes per day), age of smoking initiation, and smoking cessation (former vs current smokers). Self-reported smoking status was ascertained through the baseline interview administered by a trained study physician using a structured questionnaire.15 ,17 See online supplementary information for additional details regarding the smoking phenotype questionnaire ascertainment. Individuals reported smoking status as never, former, or current cigarette smoker at baseline. In these analyses, we considered two binary phenotypes: smoking initiation (ever vs never smoker) and smoking cessation (current vs former smoker). Since the prevalence of smoking among Bangladeshi women is low due to cultural norms,18 analyses for smoking initiation were conducted separately for men and women. We also considered two continuous phenotypes: self-reported age of smoking initiation and average number of cigarettes per day. These variables were evaluated as continuous phenotypes among ever smokers. The distributions of the smoking behaviour phenotypes, stratified by sex, are shown in table 1.
For BEST and HEALS (2006–2008 cohort), DNA extraction was carried out from whole blood using the QIAamp 96 DNA Blood Kit from Qiagen (Valencia, USA). For HEALS (2000–2002 cohort), DNA was extracted from clot blood using the Flexigene DNA kit from Qiagen (Valencia, USA). Any DNA sample with a concentration <40 ng/μL, and/or a 260/280 ratio outside the range of 1.6–2.1 (measured by Nanodrop 1000), and/or fragmented DNA <2 kb (assessed by smearing in Agilent BioAnalyzer) was excluded.
Genotyping was performed using the Illumina HumanCytoSNP-12 BeadChip using 250 ng DNA according to the manufacturer's protocol. There were 5499 DNA samples genotyped. We excluded samples with very poor call rates (<97%; n=12); individuals with gender mismatches (n=79); and duplicate samples (n=54). No individuals had outlying autosomal heterozygosity or inbreeding values. This quality control (QC) resulted in 5354 individuals with high-quality genotype data included in these analyses.
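The sample-level exclusions above can be tallied in a few lines. This is only a bookkeeping sketch: it assumes the three exclusion groups do not overlap, which the reported totals imply but the text does not state explicitly, and the names `EXCLUSIONS` and `samples_after_qc` are illustrative.

```python
from typing import Dict

# Counts as reported in the text above.
GENOTYPED_SAMPLES = 5499

# Assumption: the exclusion groups are mutually exclusive.
EXCLUSIONS: Dict[str, int] = {
    "call rate < 97%": 12,
    "gender mismatch": 79,
    "duplicate sample": 54,
}


def samples_after_qc(total: int, exclusions: Dict[str, int]) -> int:
    """Return the number of samples left after subtracting every exclusion."""
    return total - sum(exclusions.values())


remaining = samples_after_qc(GENOTYPED_SAMPLES, EXCLUSIONS)
print(remaining)
```

The result should match the 5354 individuals carried forward into the analyses.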
Among 299 140 genotyped SNPs, we implemented the following QC exclusion criteria for SNPs using PLINK19: (1) SNPs without rs numbers; (2) SNP call rate <95%; (3) monomorphic SNPs; (4) Hardy–Weinberg p < 1×10−10. This resulted in 257 768 SNPs. Imputation was performed using MaCH on the basis of the HapMap 3 Gujarati Indians in Houston (GIH) population (Build 36). We also implemented the following QC exclusion criteria for SNPs postimputation: (1) minor allele frequency (MAF) <0.01 and (2) SNP imputation score <0.3. Genotyped and imputed SNPs were included in these analyses, which yielded 1 211 988 SNPs after QC procedures.
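The SNP-level criteria above can be expressed as simple predicates. The sketch below is not the PLINK or MaCH workflow used in the study; the record layout (`GenotypedSnp`), the field names, and the example identifiers are hypothetical, and only the thresholds are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenotypedSnp:
    # Hypothetical record layout; field names are illustrative only.
    snp_id: str       # an rs number or some other identifier
    call_rate: float  # fraction of samples with a genotype call
    maf: float        # minor allele frequency in the sample
    hwe_p: float      # Hardy-Weinberg equilibrium test p value


def exclusion_reason(snp: GenotypedSnp) -> Optional[str]:
    """Return the first pre-imputation QC criterion the SNP fails, else None."""
    if not snp.snp_id.startswith("rs"):
        return "no rs number"
    if snp.call_rate < 0.95:
        return "call rate < 95%"
    if snp.maf == 0.0:
        return "monomorphic"
    if snp.hwe_p < 1e-10:
        return "Hardy-Weinberg p < 1e-10"
    return None


def passes_post_imputation_qc(maf: float, imputation_score: float) -> bool:
    """Keep imputed SNPs with MAF >= 0.01 and imputation score >= 0.3."""
    return maf >= 0.01 and imputation_score >= 0.3


# Made-up records, for illustration only (not study data).
kept = GenotypedSnp("rsEXAMPLE1", call_rate=0.99, maf=0.20, hwe_p=0.40)
dropped = GenotypedSnp("rsEXAMPLE2", call_rate=0.90, maf=0.20, hwe_p=0.40)
print(exclusion_reason(kept), exclusion_reason(dropped))
```

Returning the reason rather than a bare boolean makes it easy to count how many SNPs each criterion removes.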
RNA was extracted from mononuclear cells preserved in RLT buffer, stored at −80°C, using RNeasy Micro Kit from Qiagen (Valencia, USA). The concentration and quality of RNA were checked on Nanodrop 1000. cRNA synthesis was done from 250 ng of RNA using Illumina TotalPrep 96 RNA Amplification kit. Gene expression was measured using the Illumina HumanHT-12-v4 BeadChip using 750 ng of cRNA according to the manufacturer's protocol. The chip contains a total of 47 231 probes covering 31 335 genes. We restricted our analyses to specific probes for expression quantitative trait loci (eQTL) analyses, which yielded 31 583 probes. Quantile-normalised expression values were log2 transformed and adjusted for batch variability using ComBat software.20 Gene expression data were available for 1799 individuals (808 women and 991 men) included in these analyses.
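The normalisation order described above (quantile normalisation, then log2 transformation) can be illustrated with a toy example. This is a minimal pure-Python sketch, not the software used in the study: the function names are hypothetical, ties are broken by probe order as a simplification, and the ComBat batch adjustment is not reproduced.

```python
import math
from typing import List


def quantile_normalise(samples: List[List[float]]) -> List[List[float]]:
    """Give every sample the same distribution of values.

    samples[i][j] is the intensity of probe j in sample i. Each value is
    replaced by the across-sample mean of the values that share its rank.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalised = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda j: s[j])
        out = [0.0] * n_probes
        for rank, j in enumerate(order):
            out[j] = reference[rank]
        normalised.append(out)
    return normalised


def log2_transform(samples: List[List[float]]) -> List[List[float]]:
    """Apply log2 to every (strictly positive) expression value."""
    return [[math.log2(v) for v in s] for s in samples]


# Toy intensities for two samples and three probes (not study data).
toy = [[2.0, 8.0, 4.0], [16.0, 4.0, 64.0]]
normalised = quantile_normalise(toy)
logged = log2_transform(normalised)
print(normalised)
```

After normalisation both toy samples contain the same set of values, differing only in which probe holds which value.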
Population structure due to relatedness and population stratification was previously examined in this study sample and has been described21; we found very little evidence of population stratification in this sample. Efficient Mixed-Model Association eXpedited (EMMAX)22 software was used to assess associations for smoking phenotypes using genotyped and imputed SNP data. A linear mixed-model regression incorporating the estimated relatedness matrix among individuals for cryptic relatedness (rather than principal components) was used for each SNP, adjusting for sex, age, age×age, and genotyping batch in overall analyses, with SNPs on the X chromosome coded as (0, 1) to indicate the number of minor alleles for men. Sex-specific analyses were adjusted for age, age×age, and genotyping batch. Continuous smoking phenotypes (age of smoking initiation and cigarettes per day) were log transformed to approximate a normal distribution. We considered SNPs to be genome-wide significant if p<5×10−8.23 Regional association plots were generated using LocusZoom.24 The Versatile Gene-based Association Study (VEGAS) approach was used to conduct gene-based tests, by summing the association signal from all the SNPs within a gene and correcting the sum for linkage disequilibrium using HapMap2 CHB+JPT to generate a test χ2 statistic.25 The eQTL analyses were conducted to evaluate associations between the top variant genotypes with gene expression levels genome-wide. Additive linear models for each gene expression probe, stratified by sex and adjusting for age and smoking status, were run using the Matrix eQTL package implemented in R software.26
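As a small illustration of the significance criterion, the sketch below applies the p<5×10−8 threshold to the three lead p values reported in this article. The helper name is hypothetical, and the Bonferroni rationale shown (0.05 divided by one million tests) is the conventional justification for this threshold, not something stated in the text.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # threshold stated in the text


def is_genome_wide_significant(p_value: float,
                               alpha: float = GENOME_WIDE_ALPHA) -> bool:
    """True when the p value falls strictly below the genome-wide threshold."""
    return p_value < alpha


# Conventional rationale (an assumption here, not from the text):
# a Bonferroni correction of 0.05 for roughly one million independent tests.
bonferroni_alpha = 0.05 / 1_000_000

# Lead p values for smoking initiation as reported in this article.
lead_p_values = {
    "rs2567519 (men)": 1.33e-7,
    "rs12912184 (women)": 9.32e-8,
    "rs4240023 (women)": 7.79e-8,
}

flags = {snp: is_genome_wide_significant(p) for snp, p in lead_p_values.items()}
print(flags)
```

None of the three lead variants passes the threshold, which is why the associations are described as suggestive.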
The characteristics of the study sample are shown in table 1. The prevalence of smoking was much higher in men (ever smokers, 70.9%) as compared with women (ever smokers, 7.2%). The average number of cigarettes smoked per day among ever smokers was 11.5±8.3 overall, with men skewed toward a larger quantity smoked compared with women. The average age of smoking initiation among ever smokers was 19.3±7.2 years, with men slightly skewed toward a younger age of smoking initiation compared with women.
Genome-wide association analyses for smoking initiation (ever vs never) were conducted separately for men and women since the prevalence of smoking was substantially different by sex. This was primarily to address the concern that there was a lower prevalence of smoking among Bangladeshi women due to cultural norms that potentially could mask a genetic effect. The genome-wide association analysis for smoking initiation in 1837 men ever smokers and 754 men never smokers showed associations of multiple variants with suggestive genome-wide significance (table 2 and figure 1A), with the strongest signal for rs2567519 (p=1.33×10−7). Several SNPs on chromosome 17q21.31 were in close proximity, and figure 2 provides a regional association plot for the top SNPs in the SLC39A11 gene. A gene-based association test for SLC39A11 based on 316 genotyped or imputed SNPs in the gene yielded a p value=3.3×10−4. The overall MAF of rs2567519 did not statistically differ between men and women (MAF=0.41 vs 0.42). The genome-wide association analysis for smoking initiation in 198 women ever smokers and 2565 women never smokers also showed associations of multiple variants with suggestive genome-wide significance (table 2 and figure 1B), with the strongest signals in Xp11.21 (rs4240023, p=7.79×10−8) and 15q26 (rs12912184, p=9.32×10−8) in the region between the SLCO3A1 and ST8SIA2 genes. Several SNPs on chromosome 15q26 were in close proximity, and figure 3A provides a regional association plot for the SNPs. A gene-based association test for SLCO3A1 based on 281 genotyped or imputed SNPs in the gene yielded a p value=0.01; whereas, the gene-based association test for ST8SIA2 based on 147 genotyped or imputed SNPs in the gene yielded a p value=0.67. The overall MAF of rs12912184 did not statistically differ between women and men (MAF=0.34 vs 0.35). The regional association plot for the signal on the X chromosome is shown in figure 3B. 
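The sex-specific counts above can be cross-checked against the overall sample size and the reported smoking prevalences. This is a simple consistency sketch; the dictionary layout and function name are illustrative, and the counts are those given in the text.

```python
# Ever and never smoker counts by sex, as reported in the text.
COUNTS = {
    "men": {"ever": 1837, "never": 754},
    "women": {"ever": 198, "never": 2565},
}


def ever_smoker_prevalence(ever: int, never: int) -> float:
    """Percentage of ever smokers among ever plus never smokers."""
    return 100.0 * ever / (ever + never)


prevalence = {
    sex: round(ever_smoker_prevalence(c["ever"], c["never"]), 1)
    for sex, c in COUNTS.items()
}
total_ever = sum(c["ever"] for c in COUNTS.values())
total_never = sum(c["never"] for c in COUNTS.values())

print(prevalence, total_ever, total_never, total_ever + total_never)
```

The totals agree with the 2035 ever smokers and 3319 never smokers in the full sample of 5354, and the prevalences agree with the 70.9% and 7.2% reported for men and women.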
See online supplementary information tables S1 and S2 for a summary of the top 1000 variants in relation to smoking initiation by sex. Analyses were also conducted considering betel quid chewing as part of a broader tobacco use phenotype (ever vs never); however, results were not appreciably different from those observed for tobacco smoking and are not presented here.
Analyses for smoking quantity, age of smoking initiation, and smoking cessation were conducted among ever smokers, with men and women combined (see online supplementary information, tables S3-S5). No clear genetic signals were observed for these phenotypes based on examination of the distribution of p values in the quantile-quantile (QQ) plots (see online supplementary information, figure S1A-C). In an effort to replicate findings previously reported in the 15q25.1 region associated with CHRNA5/3 for smoking quantity, we show the regional association plot of our results for this phenotype in online supplementary information figure S2A. The strongest signal in the region was observed for rs41280048 (p=0.0145) upstream of CHRNA5/3 in PSMA4. However, when the study sample was restricted to male ever smokers who smoke at least 10 cigarettes a day (n=688), shown in online supplementary information figure S2B, the strongest signal in the region was observed for rs938682 (p=0.0234) in CHRNA3.
The top association signals for smoking initiation were followed up in functional analyses of gene expression using an overlapping subset of 1799 individuals. Genome-wide eQTL results are summarised in online supplementary information figure S3 for rs12912184 among women and online supplementary information figure S4 for rs2567519 among men. Neither SNP appeared to strongly regulate mRNA levels.
In this study, we conducted genome-wide association analyses of approximately 1.2 million SNPs on four smoking behaviour phenotypes in an adult Bangladeshi population. We observed evidence of gender-specific associations for transporter genes SLC39A11 (intronic variant rs2567519, p=1.33×10−7) and SLCO3A1 (intergenic variant rs12912184, p=9.32×10−8) with cigarette smoking initiation. These genes have been previously implicated with smoking initiation in a European ancestry population.8
SLC39A11 (solute carrier family 39 (metal ion transporter), member 11), also known as ZIP11 (Zrt-like and Irt-like protein 11), encodes a protein belonging to the ZIP transporter family and promotes zinc transport from the extracellular fluid or from intracellular vesicles into the cytoplasm.27 Variants in SLC39A11 have been previously implicated with visceral and subcutaneous fat in women28 and amyotrophic lateral sclerosis.29 In a previous GWAS of smoking initiation in a European ancestry sample, SLC39A11 SNP rs17780310 was associated with smoking initiation (p=2.4×10−3), and in pathway analyses was selected to be a key transporter gene associated with smoking initiation, although associated with a permutation test=0.279 after accounting for gene size.8 Interestingly, cadmium, a constituent of tobacco smoke, has been shown to interact with zinc in relation to smoking-related outcomes.30–34
SLCO3A1 (solute carrier organic anion transporter family, member 3A1), encodes a protein belonging to the organic anion transporting polypeptide family, and is an uptake transporter that mediates sodium-independent uptake of a broad range of endogenous substances into the cell.35 SLCO3A1 has been associated with nicotine dependence36 and with blood pressure through gene×smoking interaction.37 Furthermore, in a previous GWAS of smoking initiation in a European ancestry sample, SLCO3A1 SNP rs2677911 was associated with smoking initiation (p=1.2×10−3), and in pathway analyses was selected to be a key transporter gene associated with smoking initiation, associated with a permutation test=0.066 after accounting for gene size.8
Based on previous studies, the effect of 15q25.1 on smoking quantity is very small, accounting for 0.20–5% of the phenotypic variance of the cigarettes-per-day phenotype.13 ,38–41 Therefore, we may not have been able to detect a signal in this region due to limited power and/or due to the fact that economic factors may be more important determinants of smoking quantity for our study population. Furthermore, a weak signal in CHRNA3 was observed only when analyses were restricted to men ever smokers who smoked at least 10 cigarettes per day. Our interpretation of this is that smoking quantity in this population is quite heterogeneous, and a genetic effect may only be apparent among heavier smokers.
We acknowledge limitations of our analyses, including limited statistical power. While we identified interesting genes related to smoking initiation, no loci reached genome-wide statistical significance. We had 80% power to detect effect sizes of 1.58 for smoking initiation among men, 2.08 for smoking initiation among women, 1.73 for smoking cessation, and a β of 2.04 for smoking quantity and 1.77 for age of smoking initiation assuming a mean allele frequency of 0.2 and α of 5×10−8. It is possible that additional genetic effects could be revealed with a larger sample size. While self-reported smoking status has not been validated against measures such as cotinine, we have observed dose-response associations of smoking status, smoking quantity and age of initiation in relation to mortality in this study population.42 Therefore, we consider our assessment of smoking phenotypes in this population to be adequate, based on observed associations with smoking-related endpoints. A strength of this study was that performing gender-specific analyses for smoking initiation enabled us to uncover new loci in genes previously implicated in smoking initiation that may be related to smoking initiation in a South Asian population; these loci should be explored in future association and functional studies.
In summary, findings from this GWAS among Bangladeshi adults suggest a role for SLC39A11 and SLCO3A1 in smoking initiation in a gender-specific manner. Future studies are needed to replicate and unravel the biological mechanisms that may underlie these associations. Insights into these pathways may provide new targets for smoking cessation therapies to reduce the public health burden associated with tobacco smoking.
Contributors All authors contributed significantly to this work through idea generation, protocol writing, data collection and data analyses, and approved the final version of the manuscript.
Funding This work was supported by the National Institutes of Health (grant numbers P42 ES010349 and R01 CA107431).
Competing interests None.
Patient consent Obtained.
Ethics approval University of Chicago and Bangladesh Medical Research Council.
Provenance and peer review Not commissioned; externally peer reviewed.