Original article
A genome wide association study of genetic loci that influence tumour biomarkers cancer antigen 19-9, carcinoembryonic antigen and α fetoprotein and their associations with cancer risk
  1. Meian He1,2,
  2. Chen Wu3,
  3. Jianfeng Xu2,4,5,6,
  4. Huan Guo1,
  5. Handong Yang7,
  6. Xiaomin Zhang1,8,
  7. Jielin Sun9,
  8. Dianke Yu3,
  9. Li Zhou1,
  10. Tao Peng10,11,
  11. Yunfeng He1,
  12. Yong Gao2,6,
  13. Jing Yuan1,
  14. Qifei Deng1,
  15. Xiayun Dai1,
  16. Aihua Tan2,
  17. Yingying Feng1,
  18. Haiying Zhang12,
  19. Xinwen Min7,
  20. Xiaobo Yang12,
  21. Jiang Zhu7,
  22. Kan Zhai3,
  23. Jiang Chang3,
  24. Xue Qin13,
  25. Wen Tan3,
  26. Yanling Hu14,
  27. Mingjian Lang7,
  28. Sha Tao14,
  29. Yuanfeng Li15,
  30. Yi Li7,
  31. Junjie Feng9,
  32. Dongfeng Li7,
  33. Seong-Tae Kim9,
  34. Shijun Zhang2,
  35. Hongxing Zhang15,
  36. S Lilly Zheng9,
  37. Lixuan Gui1,
  38. Youjie Wang1,
  39. Sheng Wei1,
  40. Feng Wang1,
  41. Weimin Fang1,
  42. Yuan Liang1,
  43. Yun Zhai15,
  44. Weihong Chen1,
  45. Xiaoping Miao1,
  46. Gangqiao Zhou15,
  47. Frank B Hu8,
  48. Dongxin Lin3,
  49. Zengnan Mo2,
  50. Tangchun Wu1
  1. MOE Key Lab of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
  2. Institute of Urology and Nephrology, First Affiliated Hospital and Centre for Genomic and Personalised Medicine, Guangxi Medical University, Nanning, Guangxi, China
  3. State Key Laboratory of Molecular Oncology, Cancer Institute and Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  4. Fudan University Institute of Urology, Huashan Hospital, and Fudan-VARI Centre for Genetic Epidemiology, School of Life Sciences, Fudan University, Shanghai, China
  5. Center for Genetic Epidemiology, Van Andel Research Institute, Grand Rapids, Michigan, USA
  6. Fudan-VARI Centre for Genetic Epidemiology, School of Life Sciences, Fudan University, Shanghai, China
  7. Department of Cardiology, Dongfeng Central Hospital, Dongfeng Motor Corporation and Hubei University of Medicine, Shiyan, Hubei, China
  8. Department of Nutrition and Epidemiology, Harvard School of Public Health, Boston, Massachusetts, USA
  9. Center for Cancer Genomics, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA
  10. Department of Hepatobiliary Surgery, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
  11. Laboratory of Genomic Diversity, National Cancer Institute, NIH, Frederick, Maryland, USA
  12. Department of Occupational Health and Environmental Health, School of Public Health, Guangxi Medical University, Nanning, Guangxi, China
  13. Department of Clinical Laboratory, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
  14. Medical Scientific Research Centre, Guangxi Medical University, Nanning, Guangxi, China
  15. State Key Laboratory of Proteomics, Beijing Proteome Research Centre, Beijing Institute of Radiation Medicine, Beijing, China

Correspondence to Dr T Wu, School of Public Health, Huazhong University of Science and Technology, 13 Hangkong Road, Wuhan, Hubei 430030, China; wut{at}mails.tjmu.edu.cn

Abstract

Objective Tumour biomarkers are used as indicators for cancer screening and as predictors for therapeutic responses and prognoses in cancer patients. We aimed to identify genetic loci that influence concentrations of cancer antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA) and α fetoprotein (AFP), and investigated the associations of the significant single nucleotide polymorphisms (SNPs) with risks of oesophageal squamous cell (OSCC), pancreatic and hepatocellular cancers.

Design We carried out a genome wide association study on plasma CA19-9, CEA and AFP concentrations in 3451 healthy Han Chinese and validated the results in 10 326 individuals. Significant SNPs were further investigated in three case control studies (2031 OSCC cases and 2044 controls; 981 pancreatic cancer cases and 1991 controls; and 348 hepatocellular cancer cases and 359 controls).

Results The analyses showed association peaks on three genetic loci for CA19-9 (FUT6-FUT3 at 19p13.3, FUT2-CA11 at 19q13.3 and B3GNT3 at 19p13.1; p=1.16×10−13–3.30×10−290); four for CEA (ABO at 9q34.2, FUT6 at 19p13.3, FUT2 at 19q13.3 and FAM3B at 21q22.3; p=3.33×10−22–5.81×10−209); and two for AFP (AFP at 4q11-q13 and HISPPD2A at 15q15.3; p=3.27×10−18 and 1.28×10−14). These explained 17.14% of the variations in CA19-9, 8.95% in CEA and 0.57% in AFP concentrations. Significant ABO variants were also associated with risk of OSCC and pancreatic cancers, and AFP variants with risk of hepatocellular cancer (p<0.05).

Conclusions This study identified several loci associated with CA19-9, CEA and AFP concentrations. The ABO variants were associated with risk of OSCC and pancreatic cancers and AFP variants with risk of hepatocellular cancer.

  • Cancer Genetics
  • Cancer Susceptibility

Significance of this study

What is already known on this subject?

  • Cancer antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA) and α fetoprotein (AFP) are used as indicators for cancer screening and as predictors for therapeutic responses and prognoses in cancer patients.

  • Plasma concentrations of CA19-9, CEA and AFP vary substantially in healthy individuals, as well as in cancer patients, suggesting the influence of both genetic and environmental factors.

What are the new findings?

  • This study identified three genetic loci associated with CA19-9, four loci with CEA and two loci for AFP concentrations.

  • The ABO variants were associated with risk of oesophageal squamous cell cancer and pancreatic cancers and AFP variants with the risk of hepatocellular cancer.

How might it impact on clinical practice in the foreseeable future?

  • Our findings provide new insight into genetic determinants of CA19-9, CEA and AFP in the population.

  • Our findings may enable early screening for cancer risk in the general population.

Introduction

Tumour biomarkers are used as indicators for cancer screening and as predictors of therapeutic responses and prognoses in cancer patients.1–5 Cancer antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA) and α fetoprotein (AFP) are the most commonly used tumour markers, with high sensitivity and specificity for the diagnosis and prognosis of several cancers.6 Plasma concentrations of CA19-9, CEA and AFP vary substantially in healthy individuals, as well as in cancer patients, suggesting the influence of both genetic and environmental factors.7–9

While genome wide association studies (GWAS) have identified many loci associated with cancers, none has investigated genetic determinants of plasma concentrations of CA19-9, CEA or AFP. These biomarkers increase dramatically in individuals with cancer compared with those who are cancer free. It is possible that the loci influencing tumour biomarker levels may be related to the risk of cancer. Identification of genetic factors for tumour biomarkers may aid in early detection of cancer risk, improve prediction of cancer prognoses and enable personalised treatment.

To identify the genetic loci that influence CA19-9, CEA and AFP tumour biomarkers, we carried out a two stage GWAS. The discovery stage included 3451 subjects; the validation stage 10 326 subjects. The associations between significant single nucleotide polymorphisms (SNPs) and the risks of oesophageal squamous cell cancer (OSCC), pancreatic cancer and hepatocellular cancer (HCC) were further investigated in three case control studies. The study design is summarised in figure 1.

Figure 1

Summary of the study design. AFP, α fetoprotein; CA19-9, cancer antigen 19-9; CEA, carcinoembryonic antigen; SNP, single nucleotide polymorphism.

Methods

Subjects

In the present study, we performed a GWAS of two studies in the Chinese Han population, including the Dongfeng-Tongji cohort study (DFTJ cohort) and the Guangxi Fangchenggang Area Male Health and Examination Survey (FAMHES). All of the subjects were recruited at health check-ups and had no diagnosed cancer or other chronic diseases, including cardiovascular disease, diabetes, stroke or chronic hepatitis.

The DFTJ cohort study was launched between September 2008 and June 2010. A total of 31 000 retirees at Dongfeng Motor Corporation, Hubei, China, were invited. Approximately 87% (n=27 009) of the invited participants agreed to take part, completed the baseline questionnaire and physical check-up examination, and provided blood samples.10 We conducted the GWAS in 1461 individuals who did not have diagnosed cancer or other chronic diseases.

FAMHES was initiated in 2009 in Fangchenggang City, Guangxi, China.11 Briefly, participants enrolled in FAMHES had no prior history of cardiovascular disease, cancer or other major chronic diseases. Every participant in the study completed a physical examination in the Medical Centre of Fangchenggang First People's Hospital from September 2009 to December 2009, and provided a blood sample at the same time. Initially, 4364 healthy men were asked to attend this study and 4303 individuals (98.6%) consented, with an age span of 17–88 years. For this study, we included only those aged 20–69 years who reported Han ethnicity (n=2012).

A total of 8830 healthy subjects were included in the first validation set; all were selected from the DFTJ cohort. Another 1496 subjects (996 Han and 500 Zhuang Chinese) from FAMHES were used for the second validation set. The demographic and clinical information of all subjects in the present study is summarised in the online supplementary table S1. All of the participants provided informed consent and the ethics committees of Tongji Medical College and Guangxi Medical University approved this project.

The OSCC case control,12 pancreatic cancer case control13 and HCC case control14 studies have been described in detail elsewhere (including 2031 OSCC cases and 2044 controls; 981 pancreatic cancer cases and 1991 controls; and 348 chronic hepatitis B virus carriers with HCC and 359 chronic hepatitis B virus carriers without HCC).

Phenotype measurement

In the DFTJ cohort, we measured plasma levels of CA19-9, CEA and AFP with the Architect Ci8200 automatic analyser (Abbott Laboratories, Abbott Park, Illinois, USA) using Abbott Diagnostics reagents according to the manufacturer's instructions. The intra-assay coefficients of variation were 5.32%, 4.38% and 4.6% for CA19-9, CEA and AFP, respectively. In the FAMHES study, we measured serum CA19-9, CEA and AFP concentrations with electrochemiluminescence immunoassay on the COBAS 6000 system E601 (Elecsys module) immunoassay analyser (Roche Diagnostics, GmbH, Mannheim, Germany), with the same batch of reagents used according to the manufacturer's instructions. The intra-assay coefficients of variation were 3.4%, 4.8% and 2.7% for CA19-9, CEA and AFP, respectively.

SNP selection and genotyping

We did the GWAS scan on the DFTJ cohort using Affymetrix Genome-Wide Human SNP Array 6.0 chips and follow-up genotyping with the iPLEX system (Sequenom) and/or the TaqMan assay (Applied Biosystems). In total, we genotyped 906 703 SNPs among 1461 subjects. In stringent QC filtering, SNPs with minor allele frequency (MAF)<0.01, Hardy–Weinberg equilibrium p<0.0001 or SNP call rate <95% were excluded. Individuals with call rates <95% were also excluded from further analysis. In total, we retained 1452 subjects with 658 288 autosomal SNPs for the statistical analyses, with an overall call rate of 99.68%.

For the FAMHES GWAS, we used the Illumina Omni-Express platform to conduct the GWAS scan, and the iPLEX system (Sequenom) and/or the TaqMan assay (Applied Biosystems) for follow-up genotyping. QC procedures were first applied to 2012 individuals. A total of 1999 individuals passed the call rate of 95% and were included in the final statistical analysis. We then applied the following QC criteria to filter SNPs: p<0.001 for the Hardy–Weinberg equilibrium test, MAF<0.01 and genotype call rate <95%. In total, we retained 709 211 SNPs.
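The SNP-level QC filters described for both cohorts can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the Hardy–Weinberg p value is computed with a 1-df chi-square test as an assumption, since the text does not say which test was used. The default thresholds follow the DFTJ criteria; the FAMHES criteria would pass min_hwe_p=1e-3.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p value for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used here for brevity; the
    article does not state which HWE test was applied.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(n_aa, n_ab, n_bb, n_missing,
                  min_maf=0.01, min_hwe_p=1e-4, min_call_rate=0.95):
    """Return True if a SNP survives the MAF, HWE and call-rate filters."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0:
        return False
    call_rate = called / total
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (maf >= min_maf
            and call_rate >= min_call_rate
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)
```

For example, passes_snp_qc(360, 480, 160, 0) keeps a common SNP whose genotype counts match Hardy–Weinberg proportions, whereas a SNP with 10% missing genotypes is dropped by the call-rate filter.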

SNPs selected for validation were based on the following criteria: (1) p≤5.0×10−8 for all GWAS samples; (2) SNP with the lowest p value were selected when multiple SNPs showed a strong linkage disequilibrium (r2≥0.8); and (3) MAF≥0.05. Based on these criteria, a total of 17 SNPs for the three tumour biomarkers were selected for further validation. The primers and probes are available on request. Genotyping in the validation stage was performed according to the manufacturer's iPLEX Application Guide (Sequenom) and TaqMan assay Guide (Applied Biosystems). All genotyping reactions were performed in 384 well plates. Each plate included a duplicate for four subjects selected at random, as well as 6–9 negative controls in which water was substituted for DNA. The average genotyping concordance rate was 99.8%.
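The three selection criteria above can be read as a filter followed by greedy pruning on linkage disequilibrium. The sketch below is one possible reading under stated assumptions: the function name, the input layout and the order in which the criteria are applied (MAF filter before LD pruning) are illustrative choices, not details given in the text.

```python
def select_for_validation(snps, ld_r2, p_max=5.0e-8, maf_min=0.05, r2_min=0.8):
    """Pick lead SNPs for follow-up genotyping.

    snps  : dict mapping SNP name -> (p_value, maf)
    ld_r2 : dict mapping frozenset({snp_a, snp_b}) -> r2; pairs that are
            absent are treated as not in LD (assumption)
    """
    # Criteria (1) and (3): genome-wide significance and common variants.
    candidates = [name for name, (p, maf) in snps.items()
                  if p <= p_max and maf >= maf_min]
    # Criterion (2): walk from the smallest p value and skip any SNP in
    # strong LD with one that has already been kept.
    candidates.sort(key=lambda name: snps[name][0])
    selected = []
    for name in candidates:
        in_ld = any(ld_r2.get(frozenset((name, kept)), 0.0) >= r2_min
                    for kept in selected)
        if not in_ld:
            selected.append(name)
    return selected
```

With hypothetical inputs in which snpA (p=1e-20) and snpB (p=1e-12) have r2=0.95, only snpA is retained from that pair.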

Statistical analysis

We evaluated population structure by principal component analysis (PCA) using the software package EIGENSTRAT 3.0,15 and generated a quantile–quantile plot using R 2.11.1 (http://cran.r-project.org/). We used the additive model by linear regression analysis to perform the GWAS, and PLINK 1.06 (http://pngu.mgh.harvard.edu/~purcell/plink/) for statistical analysis.16 The Manhattan plot of −log10 P was generated with Haploview (V.4.1).17 In the DFTJ cohort GWAS, we used MACH V.1.0 software (http://www.sph.umich.edu/csg/abecasis/mach/) to impute untyped SNPs using the linkage disequilibrium information from the HapMap (http://www.hapmap.org/index.html) phase II database (CHB+JPT as a reference set, 2007-08_rel22, released 2007-03-02). The IMPUTE program was used to infer the ungenotyped SNPs in the FAMHES GWAS.18 Imputed SNPs with high genotype information content (proper info>0.5 for IMPUTE and Rsq>0.3 for MACH) were kept for the further association analysis.
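As an illustration of the additive model, the sketch below regresses a trait on the allele count (0, 1 or 2) of a single SNP. It is a simplified, unadjusted version: the study used PLINK and adjusted for covariates, whereas this hypothetical function fits only an intercept and a slope and uses a normal approximation for the p value.

```python
import math


def additive_snp_regression(dosages, trait):
    """Least-squares slope of trait on allele count, with SE and p value.

    dosages : allele counts (0, 1 or 2) per individual
    trait   : trait values per individual (for example, log-transformed)
    Assumption: no covariates, and a two-sided p value from the normal
    approximation rather than the t distribution.
    """
    n = len(trait)
    if n < 3 or n != len(dosages):
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, trait))
    beta = sxy / sxx  # change in trait per additional allele
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, se, 0.0
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p
```

The slope is the per-allele effect, which is the quantity reported as the effect size under an additive model.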

We used ProbABEL software to conduct the association studies with imputation data.19 The DFTJ cohort GWAS data from 1452 subjects and the FAMHES GWAS data from 1999 subjects were used to conduct a fixed effects meta-analysis with an inverse variance weighted method using METAL software.20 We used the Cochran Q test to assess heterogeneity.21 The chromosome region was plotted using SNAP.22 Prior to the analysis, the three tumour biomarkers were natural log transformed to normalise the distribution.
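The inverse variance weighted fixed effects estimate and the Cochran Q statistic can be sketched in a few lines. The function names are hypothetical, and the Q p value helper covers only the two-cohort case (1 degree of freedom), which matches the two GWAS data sets combined here.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse variance weighted fixed effects meta-analysis.

    Returns the pooled effect, its standard error, a two-sided p value
    (normal approximation) and the Cochran Q statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


def cochran_q_pvalue_two_studies(q):
    """p value for Q with 1 degree of freedom (exactly two studies)."""
    return math.erfc(math.sqrt(q / 2.0))
```

As a consistency check, cochran_q_pvalue_two_studies(7.49) is close to 0.006, in line with the heterogeneity result reported for rs2251844 in the Results.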

For the association analyses of the GWAS scan and validation samples, we adjusted for age, sex, body mass index, smoking and drinking. For the GWAS analysis, the top two eigenvectors were also included as covariates in the linear regression analysis. We conducted a forward selection linear regression analysis with plasma CA19-9, CEA and AFP concentrations to further investigate the independence of top SNPs. We forced age, sex, body mass index, smoking and drinking into the regression model. SNPs with p<0.05 entered the final model. The proportion of the variation in CA19-9, CEA and AFP levels explained by the SNPs was measured by r2, which is the difference of the model sum of squares between models with and without the SNPs of interest divided by the corrected total sum of squares of the full model. We used both the simple and weighted count method to calculate the genetic risk score. These analyses were also performed using SAS V.9.1.3 (SAS Institute, Cary, North Carolina, USA).
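The r2 definition and the two genetic risk scores can be written out as below. This is a sketch under assumptions: the text does not specify how alleles were oriented or how weights were scaled, so the code assumes allele counts refer to the trait-increasing allele and that the weighted score uses the per-allele regression coefficients directly. All function names are hypothetical.

```python
def variance_explained(ss_model_full, ss_model_reduced, ss_total_corrected):
    """r2 attributable to the SNPs of interest.

    Difference in model sum of squares between the models with and
    without the SNPs, divided by the corrected total sum of squares.
    """
    return (ss_model_full - ss_model_reduced) / ss_total_corrected


def simple_grs(allele_counts):
    """Simple count score: total number of trait-increasing alleles."""
    return sum(allele_counts)


def weighted_grs(allele_counts, betas):
    """Weighted score: allele counts weighted by per-allele effect sizes."""
    if len(allele_counts) != len(betas):
        raise ValueError("one effect size is needed per SNP")
    return sum(b * c for b, c in zip(betas, allele_counts))
```

For an individual carrying 2, 1, 0 and 1 trait-increasing alleles at four SNPs, the simple score is 4, while the weighted score depends on the effect sizes supplied.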

Results

The general characteristics and biochemical trait levels of all participants are shown in online supplementary table S1. Distributions of CA19-9, CEA and AFP levels are presented in online supplementary figure S1. Individual variations in plasma levels of the three biomarkers were broad, with 2.43%, 5.96% and 1.27% of participants exceeding the reported normal ranges for CA19-9 (37 U/ml), CEA (5 ng/ml) and AFP (25 ng/ml), respectively.

Principal component analysis showed minimal evidence for stratification in our study population (see online supplementary figure S2). The quantile–quantile plot revealed a good match between the distributions of the observed p values and those expected by chance (see online supplementary figure S3). The small genomic control inflation factor (λ) between 1.001 and 1.032 indicated a low possibility of false positive associations from population stratification. We derived p values in a scatterplot (figure 2) from the additive model in linear regression analyses. Except for the SNP of rs2251844 at the HISPPD2A locus (Q=7.49, p=0.006), we detected no heterogeneity at the other loci (p>0.05).
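The genomic control inflation factor λ is commonly computed as the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The sketch below assumes that convention and starts from p values; the function name is hypothetical and the article does not describe its exact calculation.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(p_values):
    """Lambda from two-sided p values (each must be > 0 and <= 1).

    Each p value is converted back to a 1-df chi-square statistic via
    the standard normal quantile, then the median is compared with the
    median expected when no SNP is associated.
    """
    std_normal = NormalDist()
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Uniformly distributed p values give λ close to 1; values well above 1 would suggest inflation from stratification or other confounding.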

Figure 2

Manhattan plots for QTL analyses. The analysed trait name is at the top of each graph. The horizontal axis shows the chromosomal positions; the vertical axis the −log10 p values from the test of association by linear regression analysis. The red horizontal line shows the p value of 5.0×10−8.

Results of the genome wide association study

Cancer antigen 19-9

The α (1,3) fucosyltransferase (FUT6)–fucosyltransferase 3 (FUT3) gene cluster at 19p13.3 showed a genome wide association with plasma CA19-9 concentrations (table 1, figure 2, and see online supplementary table S2). The SNP rs3760775 was the strongest signal in this locus (p=1.07×10−172). Two other SNPs (rs17271883 and rs3760776) in the same gene were also significantly associated with plasma CA19-9 levels (p=4.17×10−60 and 2.69×10−36, respectively). In addition, the fucosyltransferase 2 (FUT2)-CA11 locus at 19q13.3 was significantly associated with CA19-9 levels (figures 2, 3). SNP rs1047781 (combined p=7.83×10−176; table 1) in the exon of the FUT2 gene is a non-synonymous variant, and the change from the A to T allele led to a substitution from isoleucine to phenylalanine.

Table 1

Genome wide association analyses for tumour biomarkers cancer antigen 19-9, carcinoembryonic antigen and α fetoprotein

Figure 3

Regional plots of associated loci with cancer antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA) and α fetoprotein (AFP). The horizontal axis shows the chromosomal positions in the NCBI build 36 genome sequences. Red diamonds at each locus indicate the strongest signals in the genome wide scan. Each circle represents a single nucleotide polymorphism (SNP). The colour of the circle indicates the correlation between that SNP and the strongest signal at the locus. The blue line shows the recombination rates given by the 1000 genome pilot. The bottom part of the figure shows the genes at each locus. The yellow bar highlights the locus associated with the CA19-9, CEA or AFP levels at the genome wide association significance.

The SNP rs11880333 in the intron of CA11 was also associated with CA19-9 levels, with a combined p=1.56×10−27. The SNP rs265548 near the B3GNT3 gene at 19p13.1 reached a genome wide significance threshold (p=3.57×10−10), and was validated in replication sets 1 and 2.

We performed a forward selection linear regression analysis of the top SNPs using a combination of the two validation data sets. Four SNPs (rs17271883, rs3760775, rs1047781 and rs265548) were independently associated with CA19-9 levels and jointly explained 17.14% of the total variation in plasma CA19-9 levels (see online supplementary table S5). The association of these SNPs with CA19-9 concentrations was cumulative. The additive effect size was 2.25 U/ml of CA19-9 per one allele increment in the simple count genetic risk score (p for trend=7.08×10−194; see online supplementary figure S4). The additive effect size was 1.54 U/ml of CA19-9 using the weighted genetic risk score (p for trend=2.48×10−117; data not shown).

Carcinoembryonic antigen

We identified four loci associated with plasma CEA concentrations (figures 2, 3). The synonymous variant of rs8176749 in the ABO gene had the most significant p value (p=3.02×10−35) (table 1). The SNP rs8176720 in the exon of the ABO gene at 9q34.2 was also consistently associated with CEA concentrations (p=7.57×10−17). The GWAS discovery stage showed that the loci of FUT6 and FUT2 associated with CA19-9 levels were also associated with CEA concentrations; the p values for rs3760775 in FUT6 and rs1047781 in FUT2 were 9.10×10−21 and 1.25×10−68, respectively. However, the direction of the association for rs3760775 differed between the two biomarkers: the T allele of rs3760775 was associated with decreased CA19-9 levels but with increased CEA levels (table 1). FAM3B at 21q22.3 (rs441810, p=1.76×10−11) was the fourth locus associated with CEA levels (table 1, figure 2).

These top SNPs jointly explained 8.95% of the total variation in plasma CEA levels (see online supplementary table S6). The additive effect size was 0.19 ng/ml of CEA per one allele increment in the genetic risk score (p for trend=2.43×10−128; see online supplementary figure S4). We obtained similar results by using the weighted genetic risk score (β=0.20; p for trend=3.86×10−167; data not shown).

Alpha fetoprotein

Three loci were associated with AFP at a genome wide significance level. In the GWAS discovery stage, the p values for their strongest SNPs were 7.69×10−9 for rs12506899 in AFP, 5.09×10−8 for rs12579373 in CHST11 and 3.35×10−8 for rs2251844 in the HISPPD2A gene (table 1, figure 2). We validated the AFP and HISPPD2A loci, but the SNP rs12579373 in CHST11 failed to replicate in either validation data set (table 1).

Two SNPs (rs12506899 and rs2251844) explained approximately 0.57% of the total variation in plasma AFP levels (see online supplementary table S7). The additive effect size was 0.20 ng/ml of AFP per one allele increment in the genetic risk score (p for trend=1.67×10−17; see online supplementary figure S4). We obtained similar results by using the weighted genetic risk score (β=0.20; p for trend=1.61×10−17; data not shown). The full list of the SNPs associated with AFP levels at p<10−7 is shown in online supplementary table S4.

Associations between top SNPs in tumour biomarkers and different types of cancer risk

All of the SNPs in the ABO loci had nominally significant associations with OSCC risk (p<0.05, table 2). Among them, the SNP rs8176749 was in high linkage disequilibrium with rs8176672 (r2=0.90, D′=0.95). Conditional association analysis showed that the associations for rs8176749 and rs8176672 were not significant after mutual adjustment, suggesting that they are unlikely to be independent signals. Other top SNPs in ABO were not significant after adjustment for rs8176749 or rs8176672, indicating that other SNPs were not independent susceptibility markers for OSCC risk. The SNP of rs8176720 in the ABO gene also had a nominally significant association with pancreatic cancer risk (p=0.014, table 2).

Table 2

Associations between the top single nucleotide polymorphisms of cancer antigen 19-9, carcinoembryonic antigen and α fetoprotein, and risks of oesophageal squamous cell, pancreatic and hepatocellular cancers

To examine whether the ABO blood group was associated with the risk of OSCC and pancreatic cancer, we deduced the ABO blood group using two SNPs (rs8176746 and rs505922).23 The results indicated that the ABO blood group was significantly associated with OSCC risk. Compared with the O blood group, those with the A blood group had a significantly decreased risk of OSCC (OR=0.73, 95% CI 0.62 to 0.86; p<0.001); those with the B blood group had a marginally increased OSCC risk (OR=1.17, 95% CI 0.99 to 1.37; p=0.06). However, we did not find a significant association of ABO blood group with pancreatic cancer risk (see online supplementary table S8).
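The reported odds ratios and confidence intervals can be checked against their p values with a short calculation. The sketch assumes the intervals are Wald-type 95% intervals that are symmetric on the log odds scale, which is a common convention but is not stated in the text; the function name is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the z statistic and two-sided p value from an OR and its CI.

    Assumption: the interval is exp(log(OR) +/- z_crit * SE), so the
    standard error is the log-scale interval width over 2 * z_crit.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Under that assumption, the A blood group estimate (OR=0.73, 95% CI 0.62 to 0.86) gives a p value below 0.001, and the B blood group estimate (OR=1.17, 95% CI 0.99 to 1.37) gives a p value close to 0.06, both consistent with the values reported above.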

For the top SNPs in the AFP gene, there were significant associations between the SNP rs12506899 and the risk of HCC. Participants who carried the G allele had a decreased risk of HCC (table 2). SNP rs2251844 was not associated with HCC risk.

Discussion

CA19-9 is a carbohydrate tumour associated antigen that is frequently upregulated in pancreatic cancer.24 The sialylated Lea carbohydrate structure is the carbohydrate antigenic epitope of CA19-9.25 Prior studies reported that variants in the Lewis (Le) gene (also known as FUT3) and the secretor (Se) gene (also known as FUT2) were associated with serum CA19-9 levels,26 ,27 and that the Le and Se enzymes participated in the synthesis of CA19-9. We also found an association between the B3GNT3 locus and CA19-9 levels. B3GNT3 is located at 19p13.1 and encodes a member of the β-1,3-N-acetylglucosaminyltransferase family that is involved in the biosynthesis of the backbone structure of dimeric sialyl Lewis a.28 Since sialyl Lewis a is the carbohydrate antigenic epitope of CA19-9,25 this might be the mechanism underlying the associations between variants in the B3GNT3 locus and CA19-9 concentrations.

CEA is a tumour associated antigen first reported by Gold and Freedman in 1965.29 Evaluation of CEA levels has become a useful complement to methods of detecting and staging tumours, following-up for recurrence or metastases, determining response to therapy and assessing prognosis in cancer patients.30 ,31 We found significant associations between ABO genotypes and plasma CEA levels. Our study also indicated that the genetically inferred ABO blood group was significantly associated with plasma CEA levels (see online supplementary table S9). Previous studies show that CEA and the blood A and B antigens share the same glycoprotein carrier molecules.32 ,33 This might explain the association between the ABO locus and CEA concentrations. FUT2 encodes an α (1,2) fucosyltransferase that catalyses the addition of a fucose to precursors to form the H antigen. The SNP rs1047781 in the FUT2 gene is a non-synonymous variant; the change from the A to T allele leads to a substitution from isoleucine to phenylalanine. A functional study has found that this substitution inactivates the fucosyltransferase.34

The third CEA locus is at 19p13.3, where multiple SNPs in a ∼21 kb region (between 5 781 302 and 5 802 801) had a significant association with CEA levels (p<5.0×10−8) in our GWAS discovery stage (figure 3). The most significant SNP (rs3760775, p=9.10×10−21) was between FUT3 and FUT6; both encode fucosyltransferases that catalyse the formation of Lewis antigens that share the same glycoprotein carrier molecule with CEA.32 Elevated FUT3/FUT6 concentrations might play an important role in the progression of adenocarcinoma.35 The fourth CEA level associated locus is at 21q22.3, which maps to the FAM3B gene. FAM3B is involved in glucose homeostasis and insulin regulation.36 The biological relationship between the FAM3B gene and CEA levels remains unclear.

AFP is a single chain glycoprotein with about 70 000 Da of molecular weight. Although barely detectable in healthy individuals,37 it increases in HCC.38 ,39 We found that the AFP locus had ‘cis’ effects on AFP levels; however, the biological relationship between the HISPPD2A loci and AFP levels remains to be investigated.

Tumour biomarkers are used as indicators for cancer screening and predictors for therapeutic response and prognoses. In the present study, we found that the AFP locus was not only associated with AFP levels but also with HCC risk. The effect direction was consistent. Studies indicate that AFP is not only a biomarker of HCC, but plays a direct role in the progression of HCC by inducing dysfunction and apoptosis of APCs and promoting the escape of HCC cells from immunological control and surveillance.1 ,2

Similarly, the ABO locus was not only associated with plasma CEA concentrations but also with the risk of OSCC and pancreatic cancer. This finding is consistent with those from other studies.40–45 To our knowledge, the present study was the first report of significant associations between SNPs in ABO gene loci and the OSCC risk. CEA and the blood A and B antigens share the same glycoprotein carrier molecules.32 ,33 The blood group was associated with the host immunological and inflammatory state in several reports,46 and might be related to carcinogenesis by involvement in the mediation of intercellular adhesion and membrane signalling.40 ,47

ABO antigens are secreted into the gastrointestinal tract by a functional FUT2 enzyme.48 Approximately 20% of individuals in different populations have no ABO antigen in their gastrointestinal secretions due to homozygous loss-of-function mutations in the FUT2 gene,49 and these ‘non-secretors’ have decreased susceptibility to multiple pathogens, including Helicobacter pylori,50 which has been linked to gastric adenocarcinoma.51 The current study found a consistent effect direction between the ABO locus–CEA levels and ABO locus–OSCC and pancreatic cancer risk, suggesting that CEA might not only be a tumour biomarker but also may be directly involved in the development of cancers.

In summary, the present GWAS of a large number of Chinese participants identified several novel loci associated with plasma concentrations of tumour biomarkers CA19-9, CEA and AFP. Our findings provide new insights into genetic determinants of these biomarkers in the population. They may enable early screening for cancer risk in the general population. Further studies are warranted to examine the potential structural and/or functional significance of these associations.

Acknowledgments

The authors would like to thank many colleagues for their hard work on the collection of epidemiological data, phenotypic characterisation of the clinical samples, and genotyping and analysis of the genome wide association data. The authors would also like to thank those who agreed to participate in the present study.

Footnotes

  • MH, CW, JX, HG, DL, ZM and TW contributed equally to this study.

  • Contributors MH, CW, JX, HG, DXL, ZM, FBH and TW: study concept and design. LZ, TP, YH, YG, JY, QD, XD, DFL, AT, YF, HYZ, XWM, XY, JZ, KZ, JC, XQ, WT, YLH, ML, ST, YFL, YL, JF, DFL, S-TK, SJZ, HXZ, SLZ, LG, YW, SW, FW, WF, YL, YZ, WC and XPM: acquisition of the data. MH, CW, JX, HG, LZ, SW, TP, YFH, YLH, LG and YFL: analysis and interpretation of the data. CW, JX, HY, XZ, JS, DY, JY, WC, GZ, DXL, ZM and TW: contributed reagents/materials/analysis tools. MH, CW, JX, XG and ZM: drafting of the manuscript. DXL, FBH and TW: critical revision of the manuscript for important intellectual content. TW, ZM and MH: obtained funding.

  • Funding This work was supported by the National Basic Research Program (2011CB503800) and the Programme of Introducing Talents of Discipline to Universities to TW; the General Program of the National Natural Science Foundation of China (30945204, 30360124, 30260110, 81222038); the Guangxi Provincial Department of Finance and Education (2009GJCJ150); intramural funding from the Fudan-VARI Centre for Genetic Epidemiology and Fudan University Institute of Urology to ZM; and the Program for New Century Excellent Talents in University and the General Program of the National Natural Science Foundation of China (81172751) to MH. The funders had no role in the study design, data collection, data analysis or interpretation of the data.

  • Competing interests None.

  • Ethics approval This study was conducted with the approval of the ethics committees of Tongji Medical College and Guangxi Medical University.

  • Provenance and peer review Not commissioned; externally peer reviewed.
