Hodgkin’s disease was recently designated Hodgkin lymphoma (HL) in the World Health Organization Classification.1 The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) population based registries estimate that 7900 new cases are diagnosed annually in the USA.2 Clues to its aetiology have been suggested by the bimodal age distribution; higher risks in males, in people with higher socioeconomic status, and in smaller families; and occurrence of Epstein-Barr virus in HL tumour cells.3 The importance of genetic factors is indicated by reports of multiply affected families from case series,4–6 a twin study,7 a case–control study,8 and population registry studies carried out in Utah,9 Denmark,10 Israel,11 and Sweden.12–14 We recently analysed data from registries in Sweden and Denmark and found significant familial aggregation of HL and other lymphoproliferative tumours.15 The relative risk for HL among first degree relatives of cases compared with controls was 3.1. Relative risks were higher in males compared with females, and in siblings of cases compared with parents and offspring. Relatives of earlier onset cases were at higher risk for HL and for all lymphoproliferative tumours and were also at higher risk for developing early onset tumours themselves. These findings are consistent with those seen from earlier case series studies but have the advantage of being from large, population based samples.
It is not known whether or how extrinsic risk factors interact with genetic susceptibility. Identifying inherited susceptibility genes is an important step towards defining the pathway(s) leading to development of HL and understanding its complex aetiology. There have been many studies of somatic mutations in HL tumour cells, but although there are associations with HLA types, specific germline genes causing susceptibility have not yet been identified. Early studies of HLA Class I alleles in familial HL showed increased haplotype sharing among affected sibling pairs.16,17 We have previously conducted studies of HLA Class II loci in 16 high risk HL families and found that alleles reported to be associated in case–control studies (such as DRB1*1501 and DQB1*0602) were also associated with familial HL using a family based analytical approach.18 There have been no comprehensive searches of the genome for HL genes, largely due to the difficulty in assembling informative samples. Even though this tumour is strongly familial, the proportion of cases with a family history is small, and affected families typically have very few cases.
We studied 44 informative high risk HL families and applied a whole genome search using densely spaced microsatellite markers in order to localise susceptibility genes.
Ascertainment of HL pedigrees
The Genetic Epidemiology Branch (Division of Cancer Epidemiology and Genetics) has been recruiting families with two or more living cases of HL since 1970. This study was approved by an institutional review board, and informed consent was obtained from all subjects in this report. At the NIH clinical centre or on field trips, we evaluated all available affected individuals and first degree relatives of those affected, and obtained biospecimens. We also obtained original pathology material and reports for all HL and NHL cases where possible, and these were reviewed by the National Cancer Institute Laboratory of Pathology. Of the families investigated, 44 were judged to be informative for linkage studies, based on the number of available DNA samples (total 254) from affected and unaffected individuals. Sixteen of these families had been included in an earlier study of linkage and association with the HLA region.18 Table 1 shows the distribution of the number of affected individuals per family and the relationships among the HL cases. A total of 106 individuals in the families have been diagnosed with HL. Of these, DNA samples were available for 89, and genotypes for an additional six cases could be inferred from other family members. Of eight cases of NHL, four had either DNA samples or inferable genotypes. The level of diagnostic certainty for the HL cases was high, with 85% confirmed by either an outside pathology report, a slide reviewed at the NCI Laboratory of Pathology, or both. All cases were considered “affected” for linkage analysis. The mean age at diagnosis of HL was 26.8 years, which is much lower than that in the population, where the median age at diagnosis is 37 years.2 Over 90% of our cases would be considered as having onset in childhood or young adulthood (earlier than 45 years of age). There was no difference in age at diagnosis between families with only two HL cases and those with more than two cases.
Among those cases who could be classified into subtypes (75% of the total), there was a predominance (80%) of the nodular sclerosis (NS) subtype, with nearly all of the remaining having the mixed cellularity subtype, consistent with the young age distribution. There was a slight female predominance. As can be seen in table 1, two thirds of the families had cases among siblings and/or cousins; the remainder showed parent–offspring configurations with or without siblings or other relatives.
Hodgkin lymphoma (HL) has a strong familial component but no genes have yet been identified.
We performed a genomewide linkage screen in 44 high risk HL families with a total of 254 individuals with DNA samples. Among these families, there were 95 HL cases and four cases of non-Hodgkin lymphoma (NHL) who were informative for linkage. The cases were characterised by a young age at diagnosis and an even gender ratio. In two-thirds of the families, the cases were siblings or cousins.
We genotyped 1058 microsatellite markers with an average spacing of 3.5 cM, analysed the data using both non-parametric and parametric linkage analysis, and computed both two point and multi-point linkage statistics.
The strongest linkage finding was on chromosome 4p near the marker D4S394. The lod score calculated by Genehunter Plus was 2.6 (nominal p = 0.0002) when both HL and NHL individuals were considered affected. The mean identity by descent sharing among 35 affected sibling pairs was 72% in this region (nominal p = 0.00007).
The results are consistent with recessive inheritance. Other locations suggestive of linkage were found on chromosomes 2 and 11. The number of independent regions identified is more than expected by chance, although no one region met genomewide significance levels.
These linkage findings represent the first step towards identifying one or more loci leading to susceptibility to HL and understanding its complex aetiology.
DNA was extracted from cryopreserved lymphocytes using standard methods. Genotyping was conducted under contract to deCODE Genetics using their screening set of 1058 microsatellite markers containing markers from the ABI linkage marker (version 2) screening and intercalating sets, and 500 custom made markers with known allele size distributions. Marker positions were obtained from the deCODE genetic map.19 PCR reactions were set up in multiplex reactions with fluorescently labelled primer pairs selected to amplify highly informative two, three, and four microsatellite loci. Following PCR amplification, DNA samples were loaded into ABI 3730 capillary sequencers. In each 96 well DNA plate, 93 DNA samples and three CEPH controls (family 1347-2) were run. Alleles were automatically classified using deCODE Allele Caller software,20 which provides consistently >99.7% accuracy of genotyping calls compared with manual procedures. The samples were barcoded and tracked at each step, and profiled with sufficient markers for unique identification. Sample identities were checked for accuracy based on the pedigree structures in order to identify sample duplications and exchanges. The extensive use of robotics and automation at all steps in the process provided a high degree of reliability, and reduced sample handling errors. A multistage data analysis approach was used to minimise errors in genotyping. After initial genotype identification was made, analyses were conducted to detect non-Mendelian transmission of genotypes from parents to offspring. These errors were then checked by re-analysing the results from the ABI sequencer or by re-typing the samples.
Because the true genetic model for HL is not known, we computed power assuming a rare gene with either dominant or recessive inheritance and heterogeneity. We estimated the power to detect linkage using the program SLINK.21,22 We conducted simulations assuming both dominant and recessive inheritance models, with penetrances of either 50% or 80% for the at risk genotypes and 0.1% for the normal genotype (a total of four models). Allele frequencies were set at values that kept the lifetime risk constant at 0.24 as estimated from SEER data.2 We assumed close linkage (θ = 0.001) of the disease locus to a marker locus with eight alleles, which is reasonable given the dense spacing and high information content of the real genotypes. We generated 200 replicates under each model to compute the average lod scores and power of detecting lod scores of 1–3.
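The constraint on allele frequencies can be illustrated with a short calculation. The sketch below assumes Hardy-Weinberg proportions at a two-allele disease locus, and it reads the lifetime risk of 0.24 as a percentage (that is, 0.0024). Both are assumptions made here for illustration and are not details taken from the SLINK runs. The function names and the bisection solver are likewise hypothetical.

```python
def population_risk(q, penetrance, phenocopy, dominant):
    """Lifetime risk under Hardy-Weinberg for disease allele frequency q."""
    if dominant:
        at_risk = 1.0 - (1.0 - q) ** 2   # carriers of one or two copies
    else:
        at_risk = q ** 2                 # homozygotes only
    return at_risk * penetrance + (1.0 - at_risk) * phenocopy


def solve_allele_frequency(target_risk, penetrance, phenocopy, dominant):
    """Bisection for q in (0, 1) such that population_risk equals target_risk.

    Assumes phenocopy < target_risk < penetrance, so the risk rises
    monotonically with q and crosses the target exactly once.
    """
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if population_risk(mid, penetrance, phenocopy, dominant) < target_risk:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Four models: dominant or recessive, penetrance 50% or 80%,
    # penetrance of the normal genotype 0.1%.
    for dominant in (True, False):
        for penetrance in (0.5, 0.8):
            q = solve_allele_frequency(0.0024, penetrance, 0.001, dominant)
            label = "dominant" if dominant else "recessive"
            print(label, penetrance, round(q, 5))
```

Under these assumptions the recessive models require a more common disease allele than the dominant models to give the same lifetime risk, because only homozygotes are at risk.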
The genotype data were stored in a Microsoft Access database. Formatting changes needed for specific programs were made using MEGA2.23 The genotype data were first checked for Mendelian consistency using the program PEDCHECK.24 The RECODE program was used to prepare the data files for analysis and to estimate allele frequencies from all founders in the pedigrees. We checked for the presence of additional genotype errors using the mistyping option of Simwalk2 (version 2.89),25–27 and eliminated genotypes that had probabilities of ⩾0.25 of being errors. In total, only a very small number of genotypes (<0.5%) was eliminated because of either Mendelian inconsistencies or high mistyping probability.
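The idea behind a Mendelian consistency check can be shown for a single parent-offspring trio at one autosomal marker. This is a simplified sketch and not PEDCHECK's algorithm, which works on whole pedigrees and handles untyped individuals. The function name and the allele coding are hypothetical.

```python
def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one allele from each parent.

    Each genotype is a 2-tuple of allele labels, e.g. (1, 3).
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


if __name__ == "__main__":
    print(is_mendelian_consistent((1, 2), (3, 4), (1, 3)))  # one allele from each parent
    print(is_mendelian_consistent((1, 2), (3, 4), (1, 2)))  # both alleles from the father
```

A genotype flagged by a check of this kind would then be re-examined or removed, as described above.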
We first screened the 1058 markers using two point analyses with the MLINK program from the FASTLINK package.28–31 We calculated lod scores assuming the same models as described above for the power simulations. In addition, we assumed that penetrance increased with age, using age incidence rates in the population to construct liability classes. Multipoint analyses were conducted using Genehunter32 to compute both parametric lod scores, assuming heterogeneity (Hlod, with α = proportion of linked families) and non-parametric linkage (NPL) scores (z scores). Genehunter Plus lod scores were also calculated, because this method has been shown to give less conservative estimates of p values than does the original Genehunter method.33 As there were only a few NHL cases in this familial cohort, only individuals with HL were classified as affected for initial linkage analyses across the genome, and all other individuals were considered unaffected. Regions of the genome with nominal p values ⩽0.01 by any analysis method were followed up with additional analyses, including broadening the affection status to include NHL and calculation of mean IBD sharing in affected sibling pairs (ASPs) using the program Sibpal in SAGE (version 4.5).34
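The heterogeneity lod score can be written in the standard admixture form, Hlod(α) = Σ log10(α·10^(lod_i) + 1 − α), summed over families and maximised over α. The sketch below evaluates this at one map position with a simple grid search. It is a minimal illustration and not Genehunter's implementation, and the per-family lod scores passed in are hypothetical values.

```python
import math


def hlod(family_lods, alpha):
    """Admixture lod score at one position for a given proportion of
    linked families (alpha), from per-family lod scores."""
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)


def max_hlod(family_lods, steps=1000):
    """Grid search over alpha in [0, 1]; returns (best_alpha, best_hlod)."""
    best_alpha, best = 0.0, 0.0          # alpha = 0 always gives Hlod = 0
    for i in range(1, steps + 1):
        alpha = i / steps
        value = hlod(family_lods, alpha)
        if value > best:
            best_alpha, best = alpha, value
    return best_alpha, best


if __name__ == "__main__":
    # Hypothetical per-family lod scores: two families supporting linkage,
    # two families against it.
    example = [0.8, 0.6, -1.5, -2.0]
    print(max_hlod(example))
```

With α fixed at 1 the statistic reduces to the ordinary summed lod score, so the Hlod is never smaller than the homogeneity lod at the same position.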
Power of linkage detection
The simulations showed that if ⩾75% of the families were linked to the same locus, then the probability of obtaining a lod score of 3.0 or more was at least 75%. The power to find a recessive gene was always higher than for a dominant gene; this is not surprising, as two thirds of the families have affected sibling or cousin configurations and are thus consistent with recessive inheritance. If only half of the families were linked to a single locus, then the power to detect linkage was modest (expected lod score = 2.3 under recessive inheritance, 1.9 under dominant inheritance). If only 25% of the families were linked, we would have minimal power to detect a susceptibility gene.
Two point lod scores revealed several regions of the genome with evidence for linkage to HL. The strongest findings were in regions on chromosomes 2, 3, and 4. Table 2 shows loci that had two point lod scores ⩾2.0 under any one of the four inheritance models tested. These regions, on chromosomes 2, 3, and 4, showed clusters of consecutive markers with positive scores. Another region on chromosome 4 and a region on chromosome 11 also showed clusters of consecutive loci with positive scores, with maximum scores between 1.5 and 2.0 (not shown). Multipoint NPL statistics of all of the chromosomes as calculated by Genehunter are shown in fig. 1. Parametric lod scores with and without heterogeneity were also calculated but are not shown. The densely spaced markers resulted in a high level of informativeness, with information content calculated by Genehunter averaging 0.75 throughout the genome. Fig 2 (A–E) shows regions on chromosomes 2, 4, 7, 11, and 17 where either multipoint NPL or Hlod scores had nominal p values <0.01. These figures show NPL and parametric statistics for both narrow (HL only) and broad (including NHL) affection status models. The parametric lod scores were calculated assuming heterogeneity using the inheritance model (dominant or recessive) that gave the highest two point lod scores. As seen in table 2, one marker on chromosome 3 had a two point lod score of 4.0 under dominant inheritance. The flanking markers had lods of 1.7 and 1.0 (not shown), but multipoint statistics in this region were substantially lower (highest NPL score was <2.0 and highest Hlod score was 1.4) than the two point results; thus no additional graphs are shown. Tables 3 and 4 summarise the linkage statistics, locations, and marker names for these six regions.
The strongest evidence for linkage occurred on chromosome 4 where the peak NPL or Hlod was found at 14 cM, flanked by markers D4S2935 and D4S394 (fig 2B). Strong evidence for linkage was seen under both affection status models. Under the broader model (in which NHL cases counted as affected), the peak NPL score was 2.9 and peak Genehunter Plus lod score was 2.6 (p = 0.0002), which is strongly suggestive of linkage (table 3). The locations of the peak linkage scores were consistent among all analyses performed. The Hlod score was highest under the recessive model and the proportion of families linked was estimated at 43%. Table 4 shows that the mean IBD sharing among ASPs as calculated by Sibpal (>70% and highly significant) gave results consistent with the other methods.
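The logic of a mean IBD sharing test in affected sibling pairs can be sketched as a one-sided test of the observed mean against the null value of 0.5. The version below is a simplification and not Sibpal's procedure: it assumes independent sibling pairs and fully informative IBD status, which gives a null variance of 1/8 per pair. The function name is hypothetical, and the result need not match the published p values.

```python
import math


def mean_ibd_test(mean_sharing, n_pairs, null_variance=0.125):
    """One-sided z test of the mean proportion of alleles shared IBD
    against the null expectation of 0.5.

    null_variance = 1/8 is the variance of the IBD proportion for one
    sibling pair when IBD status is known without error (assumption).
    """
    se = math.sqrt(null_variance / n_pairs)
    z = (mean_sharing - 0.5) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of the normal
    return z, p


if __name__ == "__main__":
    # Figures quoted for the chromosome 4p region: 72% mean sharing in 35 pairs.
    print(mean_ibd_test(0.72, 35))
```

Pairs drawn from the same sibship are not independent, so a test of this kind is best read as an approximation when families contribute more than one pair.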
Fig 2 (A–E) and tables 3 and 4 show five other locations that had positive non-parametric linkage or Hlod scores. For each region, the positive findings from Genehunter were supported by increased mean IBD sharing among ASPs (table 4). The second most significant region was on chromosome 2 (Genehunter Plus lod score was 2.4, p = 0.0004), although the location varied from 41 to 62 cM (fig 2A). On chromosome 11, there was a peak at location 37 to 39 cM (fig 2D), depending on the model, which had a maximum Genehunter Plus lod of 2.2 (rounded from 2.18) and p = 0.0007. Regions on 4q, 7, and 17 also showed positive results. There was a second positive peak on chromosome 4 at location 173–176 cM (fig 2B), a peak on chromosome 7 (fig 2C), and one on chromosome 17, in which the peak was at the p telomeric region (fig 2E). There was also a modest signal (NPL score of ∼2.0) on chromosome 6 at the marker D6S1571, which is close to the HLA region (fig 1); however, no other markers on this chromosome gave positive results. This is consistent with a small HLA effect in HL and with previous data showing an association with HLA types within a subset of these families.18
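The relation between a lod score and its nominal p value can be approximated by treating 2·ln(10)·lod as a chi-square statistic with one degree of freedom and halving the tail probability for a one-sided test. Whether this is exactly how the p values quoted above were obtained is an assumption, and because the lod scores are rounded, the values need not agree to the last digit. The function name is hypothetical.

```python
import math


def lod_to_p(lod):
    """Approximate one-sided nominal p value for a lod score, assuming
    2 * ln(10) * lod follows a chi-square distribution with 1 df."""
    if lod < 0:
        raise ValueError("lod must be non-negative")
    z = math.sqrt(2.0 * math.log(10.0) * lod)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for lod in (2.2, 2.4, 2.6, 3.0):
        print(lod, lod_to_p(lod))
```

Under this approximation a lod score of 3.0 corresponds to a nominal p value of about 0.0001.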
Our unique sample of high risk HL families has allowed us to conduct a genomewide scan using a dense set of markers. The strongest finding in this study was on chromosome 4p. The inheritance is likely to be recessive, given that the highest Hlod scores were found assuming a recessive model and affected sibling pairs had a mean IBD >70%. The sharing in ASPs was highly significant although the sample size was not large. The likelihood of recessive inheritance is also supported by previous studies showing higher risks in siblings.4,15 The strength of linkage to this region increased slightly when the few individuals with NHL were considered affected. This is consistent with our population study showing that both HL and NHL were found more frequently in relatives of HL cases.15 In fact, the cases of NHL in our families had a much earlier age of onset (mean 49 years) than the population (median age of diagnosis is 67 years2). This location on chromosome 4p is a high priority for follow up with additional families. The regions on chromosomes 2 and 11, with Genehunter Plus lod scores of 2.4 and 2.2, are also strong candidates for containing HL susceptibility genes. The three other findings on chromosomes 4q, 7, and 17 have lower significance levels but warrant further follow up. The promising finding on chromosome 3 based on two point lod scores did not hold up under multipoint analyses. Consistent with other data, we also have evidence from this study that the HLA region may play a role in familial risk (fig 1). It is possible that applying more complex modelling of gene effects (such as multilocus models) to the genomewide linkage data would lead to more definitive detection of a susceptibility gene or genes.
There is some disagreement about how to interpret significance levels when conducting a genomewide scan.35 Wiltshire et al36 suggested an approach for evaluating the significance of linkage findings. They pointed out that for complex diseases, several genes may be involved and therefore it is less likely that any single region will reach a high level of significance. They proposed counting the number of independent regions of linkage detected and comparing this to chance expectations. They also pointed out that the lod score thresholds for “significant” (lod = 3.6) or “suggestive” (lod = 2.2) linkage, as defined by Lander and Kruglyak,37 are often too stringent, as the thresholds for any one study depend on sample size, marker density, and marker informativeness. Using simulations, Wiltshire et al36 found that for a 5 cM scan with 100 ASP families, a Genehunter Plus lod score of 1.78 would predict one linkage finding in a genomewide scan by chance and a more stringent lod score threshold of 2.2 would predict 0.37 linkage findings by chance. Our three strongest findings on chromosomes 4p, 2p, and 11p meet the more stringent threshold of 2.2 (table 3). Thus, we have identified more regions than expected by chance, which strongly suggests that there are one or more true loci causing susceptibility to HL among the locations identified.
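The comparison of observed with expected regions can be made concrete with a rough calculation. The sketch below assumes that the number of chance findings above a threshold follows a Poisson distribution, and it borrows the expected count of 0.37 from the simulations quoted above, which were for 100 ASP families and a 5 cM scan and not for the present design. Both are assumptions for illustration only, and the function name is hypothetical.

```python
import math


def prob_at_least(k, expected):
    """P(X >= k) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Three regions reached a lod score of 2.2; 0.37 expected by chance
    # under the borrowed simulation settings.
    print(prob_at_least(3, 0.37))
```

Under these assumptions, observing three or more regions above the threshold by chance alone would be unlikely (a probability below 0.01), which agrees in direction with the argument above.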
There are several regions of the genome where recurrent cytogenetic changes are found in HL cells,38,39 including amplifications of regions of 2p, 4p, 4q, and 9p, and deletions of chromosome 6q25.40 The regions we identified by linkage do not appear to overlap with the cytogenetic regions. For example, duplication of the c-REL-BCL11A region on 2p was reported in HL cells, but these loci are about 30–50 cM centromeric from the peak we found. Similarly, amplifications on 4p may involve the fibroblast growth factor 3 gene,39 which is about 10 Mb from the peak we identified. The linkage peak on the telomere of chromosome 17 is near the p53 gene, but somatic mutations of this gene are not frequently found in HL cells,41 and one study found no germline mutations in familial HL cases from our sample.42 Locations determined from linkage analysis of complex diseases are imprecise owing to uncertainty about the underlying model, so it is possible that one or more of the regions identified by cytogenetic studies overlaps with regions we identified by linkage.
There are some limitations to this study. The highly selected families in our sample are not representative of HL in the population. Consistent with clinical descriptions of familial HL in the literature,43 referrals to our group are mostly families with cases that have early onset and NS subtype. There are a few later onset cases in these families, but even these are found within families having early onset in other members. In terms of histological subtype, we also find mixed cellularity cases in the same families with NS subtype. Thus, it is not possible for us to analyse linkage to subgroups based on age at onset or histological characteristics of the tumour. In addition, most of our families have HL in siblings and/or cousins, which makes it difficult to detect a dominant susceptibility locus. Future studies applied to more families and a broader representation of clinical types will lead to more robust conclusions about the effects of susceptibility genes, genetic heterogeneity of HL, and the range of phenotypic expression of specific susceptibility genes.
The findings presented here are the first step in the discovery of germline susceptibility gene(s) and delineation of the pathways involved in development of HL. Even though these susceptibility loci are being discovered in high risk families, they may also play a role in development of sporadic HL. Defining these pathways and determining their interactions with environmental factors may lead to more effective treatment and prevention, which could have a great impact on patients, many of whom are young and lose years of life/productivity to disease or treatment related morbidity.
We acknowledge the contributions of A Goldstein, L Harty, A Lin, J Whitehouse, and G Shaw for family ascertainment, and of E Jaffe for pathology review. The genotyping was conducted under NCI contract N02-CP-01108 with a subcontract BRC-1108-35 to deCODE Genetics. Some of the results of this paper were obtained by using the program package SAGE, which is supported by a US Public Health Service Resource Grant (RR03655) from the National Center for Research Resources.
Competing interests: none declared