Background Inflammation of the tonsils is a normal response to infection, but some individuals experience recurrent, severe tonsillitis and massive hypertrophy of the tonsils in which case surgical removal of the tonsils may be considered.
Objective To identify common genetic variants associated with tonsillectomy.
Methods We used tonsillectomy information from Danish health registers and carried out a genome-wide association study comprising 1464 patients and 12 019 controls of Northwestern European ancestry, with replication in an independent sample set of 1575 patients and 1367 controls.
Results The variant rs2412971, intronic in HORMAD2 at chromosome 22q12.2, was robustly associated with tonsillectomy (OR=1.22; p=1.48×10–9) and is highly correlated with SNPs previously found to be associated with IgA nephropathy, Crohn's disease (CD) and early onset inflammatory bowel disease (IBD). The risk allele for tonsillectomy corresponded to increased risk of IgA nephropathy and decreased risk of CD and IBD. We further performed lookup analyses of the top SNP for outcomes related to tonsillectomy in the combined discovery and replication sample and found that rs2412971 was associated with acute tonsillitis (OR=1.19; p=7.82×10–4), chronic disease of the tonsils (OR=1.19; p=2.32×10–6) and appendectomy (OR=1.18; p=1.13×10–3).
Conclusions We identified and replicated a genetic association at 22q12.2 with tonsillectomy. Further functional investigation is required to illuminate whether the molecular mechanisms underlying the genetic association involve general lymphoid hyper-reaction throughout the mucosa-associated lymphoid tissue system.
- Mucosa-associated lymphoid tissue
- Genome-wide association study
- IgA nephropathy
- Inflammatory bowel disease
Statistics from Altmetric.com
- Mucosa-associated lymphoid tissue
- Genome-wide association study
- IgA nephropathy
- Inflammatory bowel disease
The tonsils are masses of lymphoid tissue located at the entrance of the respiratory and digestive tracts. Anatomically, their crypt structure vastly increases the epithelial surface area available for contact with environmental agents, which together with their strategic location make tonsils the first part of the immune system to respond to antigens.1 The tonsils are most prominent in childhood between 3 and 10 years of age at which time their immunological activity is greatest.2 While inflammation of the tonsils is a normal response to infection, some individuals experience recurrent, severe tonsillitis and massive hypertrophy of the tonsils, and surgical removal of the tonsils may be considered. Tonsillectomy remains one of the most common surgical procedures in children and young adults, although incidence rates show a large degree of variability between countries and over time in a country.3–6 In a nationwide cohort study in Denmark, the cumulative risk of tonsillectomy during the first 20 years of life was 6.0%–7.7% in boys and 7.9%–9.2% in girls with incidence peaks around 4 years of age and 16–17 years of age in both sexes.4 The leading indications for tonsillectomy in children are recurrent throat infections and airway obstruction symptoms, including sleep apnoea.7 ,8 Twin studies have indicated a substantial genetic predisposition for recurrent tonsillitis.9 To identify genetic variants underlying an exaggerated tonsillar immune response, we used tonsillectomy information from Danish health registers and carried out genome-wide association study (GWAS) analyses.
We identified patients with tonsillectomy from two sources: the Danish National Patient Register and the Danish Health Service Register. The National Patient Register includes physician-assigned diagnosis and surgical procedure codes for all patients admitted to Danish hospitals since 1977 and for all emergency and outpatient hospital contacts since 1995.10 Eligible cases were defined as patients who had a tonsillectomy according to (a) the Danish Classification of Surgical Procedures and Therapies codes up to December 1995 (code 2614) or (b) the Nordic Classification of Surgical Procedures codes from January 1996 (codes EMB10, EMB20). We further extracted information from the Danish Health Service Register covering the years 1990–2013 and defined an additional set of eligible cases as individuals who had undergone tonsillectomy at otolaryngology offices in Denmark. Eligible controls for the study were individuals without records of tonsillectomy, acute tonsillitis (International Statistical Classification of Diseases, Eighth Revision (ICD-8) codes 034.00, 463.00, 463.01, 463.02, 463.08, 463.09; ICD-10 code J03) or chronic tonsil disease (ICD-8 codes 500.00, 500.02, 500.08, 500.09; ICD-10 codes J35.0, J35.1, J35.3, J35.8, J35.9). We linked this phenotypic information with GWAS data from various Illumina arrays generated in a range of other research projects at Statens Serum Institut. Before analysis, we further excluded individuals who were not of Northwestern European ancestry, or who were identified as first-degree relatives to other individuals in the study using information from the Danish Civil Registration System. Cases and controls for the replication stage had to fulfil the same criteria and were identified among participants in the Danish National Birth Cohort (DNBC).11
Sampling and genotyping
All study samples were drawn from the Danish Newborn Screening Biobank and the DNBC Biobank, both of which are part of the Danish National Biobank. GWAS genotyping of the discovery stage samples was conducted using the following Illumina Bead Arrays: Human660W-Quad (n=3405), Human610-Quad (n=3776), HumanOmniExpressExome-8 (n=3963), HumanOmniExpress-12 (n=1281) or HumanOmni1-Quad (n=1058). Based on the discovery stage results, we selected loci with p<5×10–7 for the replication stage. At each locus, we selected the SNP with the lowest discovery stage p value. For further validation, an additional SNP was selected for replication genotyping at each locus with the requirement that it was also genotyped in the discovery stage data. Genotyping of the selected SNPs in the replication stage sample was performed using competitive allele-specific PCR chemistry (LGC Genomics).
Data cleaning and imputation
Data cleaning, imputation and association analyses were carried out in parallel in two separate discovery cohorts. The first cohort included individuals with GWAS data from the Human660W-Quad or Human610-Quad array, whereas the second cohort included individuals with GWAS data from a HumanOmni array. We chose this strategy due to the limited overlap of SNPs between the Human610/660 arrays and the HumanOmni arrays. In each discovery cohort, we removed samples that (i) had a genotype missing rate >4%, (ii) had an autosomal heterozygosity rate more than three SDs from the mean, (iii) had discrepancy between reported sex and the sex inferred from genotyping or (iv) were more than six SDs away from the mean for one or more of the first five principal components in a principal component analysis. Next, we excluded markers with a missing rate >2%, a minor allele frequency (MAF) <1% or p<1×10–6 in a test of Hardy Weinberg equilibrium. Furthermore, we excluded markers that showed differences in missingness or allele frequencies between arrays within each discovery cohort, or differences in allele frequencies between males and females. For discovery cohort 1 (Human610/660 arrays), we lifted over genomic coordinates from the National Center for Biotechnology Information (NCBI) build 36 to NCBI build 37, whereas the data in discovery cohort 2 were already based on NCBI 37. Finally, we aligned all genotypes to the forward strand and excluded SNPs that did not match known variant positions in the 1000 Genomes project reference data. After these steps, discovery cohort 1 included 944 cases and 6237 controls with genotypes for 502 119 SNPs and discovery cohort 2 included 520 cases and 5782 controls with genotypes for 540 064 SNPs. We used these data for imputation by a two-step procedure. In each cohort, we first estimated haplotypes for our samples using SHAPEIT.12 We then imputed the missing alleles for additional SNPs onto these phased haplotypes based on reference data from the integrated phase I release of the 1000 Genomes Project13 using the IMPUTE214 software package.
We performed genome-wide association scans in each discovery cohort separately by logistic regression under an additive genetic model using SNPTEST.15 For the non-pseudoautosomal region of chromosome X, we treated hemizygous males as equivalent to homozygous females with the assumption that most loci in females are subject to X chromosome inactivation.16 We restricted further analyses to imputed SNP or indels that had a MAF >1% and SNPTEST info >0.8 in both discovery cohorts. After this filtering step, 7 855 440 imputed markers overlapped between the two cohorts. Among these variants, 3982 had the same chromosomal position and alleles, but different rs identifiers (IDs) in the two imputed datasets. We used dbSNP's RsMergeArch table (ftp://ftp.ncbi.nlm.nih.gov/snp/database/organism_data/human_9606/RsMergeArch.bcp.gz) to confirm the identity of the variants and merge different rs IDs into one. In discovery cohort 1, there was a slight inflation of test statistics (λ=1.025) and results were therefore adjusted using genomic control.17 We carried out combined analysis of the two discovery cohorts using fixed-effects inverse variance weighted meta-analysis as implemented in METAL.18 Heterogeneity between studies was estimated using the I2 statistic.19 Combined analysis of the discovery and replication stage data was also conducted by the inverse variance method. Top SNPs with p<0.05 in the replication stage and p<5×10–8 in the combined analysis of discovery and replication stage data were considered to indicate robust evidence of association with tonsillectomy. To assess the possible influence of cryptic relatedness, we took the following steps. First, we estimated pairwise identity by descent (IBD) based on the GWAS data for all possible pairs of individuals. We then excluded one individual from each pair with an estimated IBD proportion >0.1875, which corresponds to the point halfway between second-degree and third-degree relatives and is a commonly used threshold.20 Finally, in the pruned dataset, we repeated the association analyses for the top SNPs.
To investigate the functional characteristics of our findings, we annotated all 145 variants yielding a GWAS p value <1×10–4 at the 22q12.2 locus with ANNOVAR,21 a tool that retrieves variant-specific and region-specific functional annotations from several databases. We retrieved expression quantitative trait locus (eQTL) information for these 145 variants from the Genotype-Tissue Expression (GTEx)22 and Genetic European Variation in Health and Disease (GEUVADIS)23 project databases (accessed 5 April 2016). We further downloaded all reported variants in the National Human Genome Research Institute GWAS catalogue24 (accessed 1 April 2016) associated with a trait or disease at p<5×10–8 and searched for SNPs in linkage disequilibrium (LD) (r2>0.2) with the top SNP at the 22q12.2 locus.
We performed a GWAS of tonsillectomy based on 1464 cases and 12 019 controls from Denmark in the discovery stage, and 1575 cases and 1367 controls from Denmark in the replication stage. Our study design is illustrated in online supplementary figure S1 and table S1 shows sample characteristics of the participants. After imputation using reference data from the integrated phase I release of the 1000 Genomes Project, we analysed 7 855 440 variants for association with tonsillectomy in the discovery stage. The genomic inflation factors were 1.025 and 0.995, in the two discovery cohorts, respectively, indicating little evidence of population stratification. SNPs at two loci showed association at p<5×10–7 (figure 1). At each of these loci, we selected the most significant SNP for genotyping in the replication sample. The locus on chromosome 22q12.2 was replicated, showing nominal significance in the replication stage and reaching genome-wide significance in the combined analysis (table 1). Genotyping of an additional SNP at each locus yielded essentially the same results (see online supplementary table S2) and a sensitivity analysis requiring all pairs of individuals in the study to have an estimated IBD proportion <0.1875 also did not change the results (see online supplementary table S3) indicating little influence of cryptic relatedness.
The most significantly associated SNP at the 22q12.2 locus, rs2412971 (p=1.48×10–9; OR=1.22) is intronic in HORMAD2 and lies in a region of strong LD extending several hundred kilobases covering the genes CABP7, ZMAT5, UQCR10, ASCC2 and MTMR3 (figure 2A). We found no evidence for multiple independent signals at the locus in analyses conditioning on rs2412971 (figure 2B). To explore the possible functional impact of the associations, we searched for functional predictions for all 145 genotyped and imputed variants (SNPs and indels) at the locus with p<1×10–4. None was coding, but almost all variants were reported as gene-level or exon-level cis-eQTLs in the GTEx and GEUVADIS consortia for the many nearby genes in a range of different tissues, including lymphoblastoid cell lines (see online supplementary table S4). Furthermore, several variants at the locus were previously identified in GWAS of a number of other conditions (table 2). The top SNP, rs2412971, and the variant rs12537 were reported to associate with IgA nephropathy in studies based on individuals of Chinese and European ancestry.25 ,26 For both SNPs, the risk alleles for IgA nephropathy corresponded to increased risk of tonsillectomy in our data (table 2). Furthermore, the variants rs2412970 and rs2412973 were reported to associate with inflammatory bowel disease (IBD; ulcerative colitis and Crohn's disease (CD) combined),27 ,28 and rs713875 was associated with CD.29 For these SNPs, the risk alleles for IBD and CD corresponded to decreased risk of tonsillectomy in our data (table 2).
To explore the role of other variants associated with IgA nephropathy, CD or IBD, we looked up the top SNPs for these conditions in the GWAS catalogue and examined their effect on tonsillectomy, but we found no evidence of general effects of known variants for these diseases on tonsillectomy (see online supplementary figure S2).
In the final stages of preparing this article, a GWAS was published including questionnaire-based data on tonsillectomy as an outcome.30 The authors reported rs201112509 at chromosome 22q12.2 to be associated at an allelic OR of 1.12 (95% CI 1.10 to 1.14). In our material, this SNP had a discovery stage p value of 4.0×10–5 and was correlated with our top SNP, rs2412971, with r2=0.56 (see online supplementary table S4). Conditioning on rs201112509 only modestly attenuated the association signal for rs2412971 (discovery p=2.85×10–5; OR=1.23), whereas little residual signal was seen for rs201112509 (discovery p=0.29; OR=1.06) when conditioning on rs2412971.
To assess the impact of rs2412971 on outcomes related to tonsillectomy, we retrieved information for the discovery and replication stage samples about acute tonsillitis, chronic disease of the tonsils and appendectomy. All three outcomes were associated at a corrected p value threshold of 0.05/3=0.0167 with the lowest p for chronic disease of the tonsils (see online supplementary table S5).
Given the differences in tonsillectomy incidence between boys and girls and the two peaks in incidence in childhood and adolescence, respectively, we conducted sex-stratified and age-stratified analyses for the top SNP, rs2412971. We found no statistically significant heterogeneity of effect estimates between boys and girls, and although ORs were somewhat higher in children below 10 years of age compared with those older than 10 years, the difference was not statistically significant (see online supplementary table S6).
Using surgery codes for tonsillectomy from Danish health registers in a sample with existing GWAS data, we identified a genome-wide significant locus for tonsillectomy on chromosome 22q12.2 and replicated the finding in independent sample material. The SNP with the lowest p value, rs2412971, is intronic in HORMAD2, which is located in a region characterised by extended LD covering several other genes to the centromeric side. To the telomeric side of HORMAD2, the two nearest neighbouring genes, LIF and OSM, encode cytokines (leukaemia inhibitory factor and oncostatin M, respectively) that are members of the IL-6 family and play important roles in immunoregulation and inflammation.31 The region is further characterised by a large number of eQTL associations for many genes in many different tissues. Carefully designed studies are needed to identify the causal variant at the locus and to illuminate the biological mechanisms underlying our genetic association results. One potentially revealing approach would be to measure cytokine levels and gene expression in tonsillar tissue from patients having had tonsillectomy for different indications and to compare those data with genetic fine-mapping data, for example, from targeted sequencing of the locus.
A deeper knowledge of the mechanisms underlying the association signal for tonsillectomy might also further our understanding of other diseases. It is notable that the rs2412971-G allele, which confers risk of tonsillectomy in our data, has also been associated with increased risk of IgA nephropathy in Chinese and European ancestry individuals.25 ,26 The main pathological feature of IgA nephropathy is the deposition of IgA-containing immune complexes in the kidney glomerular mesangium.32 A common clinical presentation of the disease in children and young adults is visible haematuria during mucosal infection of the upper respiratory tract.33 Although the pathogenic mechanisms are incompletely understood, increasing evidence supports a role for an abnormal IgA response in the mucosa-associated lymphoid tissue (MALT) system.33 The MALT system includes the tonsils, and a number of studies have focused on tonsillectomy as a possible treatment modality for IgA nephropathy, although the evidence is conflicting.34–37 Rather than focusing on tonsillectomy per se, a better understanding of the shared molecular mechanisms behind IgA nephropathy and an exaggerated tonsillar immune response leading to tonsillectomy might eventually lead to new procedures for prevention or therapy.
It is also noteworthy that rs2412971 is strongly correlated with SNPs that have been reported in GWAS of IBD (ulcerative colitis and CD combined)27 ,28 and of CD alone.29 For these SNPs, the alleles that are protective for tonsillectomy in our data confer increased risk of IBD and CD. A number of epidemiological studies have studied the possible association between tonsillectomy and IBD, but sample sizes have been limited and no clear results have emerged.38–41 In contrast, many case-control and prospective population cohort studies have linked appendectomy to a significantly reduced risk of ulcerative colitis.42 ,43 It has also been established that the decreased risk is due to the underlying appendicitis rather than any protective effect of the surgical removal of the vermiform appendix.44 A recent large Danish cohort study delineated the basis of this association and found that individuals with a first-degree relative diagnosed with appendicitis before age 20 years were at reduced risk of ulcerative colitis.45 The authors concluded that their findings were not compatible with a direct mucosal impact of appendicitis on inflammation of the large bowel, but rather suggested that yet unidentified genetic or environmental factors confer increased risk of appendicitis while being protective against ulcerative colitis.45 Genetic variation at the 22q12.2 may represent one such factor, since the top tonsillectomy SNP, rs2412971, was also associated with appendectomy (see online supplementary table S5). Further studies including, for example, a large number of appendectomy cases, subclassified according to the underlying morbidity, are needed to establish whether variants at the 22q12.2 locus are associated with an exaggerated immune response throughout the MALT system and how this response could influence the risk of ulcerative colitis.
Strengths of our study include the use of confirmed surgery patients from nationwide registers, thereby minimising the risk of recall bias and the validation of the 22q12.2 association in an independent replication cohort. One limitation of our study is that it was restricted to individuals of Danish descent. Further studies are therefore needed to examine effects of variants at the locus in populations of different ancestry. Also, the stratified analyses had modest sample sizes for tonsillectomy cases among boys and children below 10 years of age, and a larger study would be needed to better address the possibility of interactions with sex and/or age.
In conclusion, we have demonstrated that genetic variants in HORMAD2 are robustly associated with tonsillectomy. The top variant was also associated with acute tonsillitis, chronic disease of the tonsils and appendectomy. Further studies are needed to identify the causal variation at the locus and to elucidate the molecular mechanisms underlying our association results. Such knowledge will enhance our understanding of the regulation of the mucosal immune system and may also provide important new insights into the pathogenesis of IgA nephropathy and IBD.
We are grateful to all study participants. We would also like to thank everyone involved in data collection and biological material handling. In particular, we thank Lisbeth Carstensen for assistance during the preparation of the study.
Contributors BF, PB and MM conceived and designed the project. PB and BF planned and performed register data acquisition and phenotypic characterisation. XL, BF and FG carried out the statistical genetics and bioinformatics analyses. BF, PB, XL, HH, FG and MM discussed and interpreted results. EAN contributed samples. DMH performed sampling and genotyping. BF wrote the first draft of the manuscript. All authors contributed to the final manuscript.
Funding This research has been conducted using the Danish National Biobank resource. The Danish National Biobank is supported by the Novo Nordisk Foundation. The study was partly based on samples from the Danish National Birth Cohort, which was established with the support of a major grant from the Danish National Research Foundation with additional support from the Danish Pharmacists' Fund, the Egmont Foundation, the March of Dimes Birth Defects Foundation, the Augustinus Foundation and the Health Fund of the Danish Health Insurance Societies. Grant support for the present study was obtained from the Novo Nordisk Foundation (12955). BF and PB are supported by Oak Foundation fellowships. XL is supported by the Nordic Center of Excellence in Health-Related e-Sciences.
Competing interests None declared.
Patient consent No.
Ethics approval The study was approved by the Scientific Ethics Committee for the Capital City Region (Copenhagen), the Danish Data Protection Agency and the Danish National Birth Cohort steering committee.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement We will be committed to collaborating with other groups interested in using the data from our study. Statens Serum Institut is a governmental institution and a solution for a public repository for Danish GWAS data is currently worked on. When this solution is ready, the GWAS summary statistics from the study will be made directly available to interested researchers.
If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.