- Pas, pulmonary adenoma susceptibility
- PCR, polymerase chain reaction
- QTL, quantitative trait loci
- RT-PCR, reverse transcriptase PCR
- SNPs, single nucleotide polymorphisms
- Scc1, susceptibility to colon cancer 1
Lung cancer is the leading cause of cancer mortality in both men and women in developed countries. There is evidence that, although incidence is almost always associated with environmental factors such as smoking or occupational exposure, susceptibility has a genetic component, with early onset lung cancer following Mendelian inheritance.1,2 Moreover, susceptibility is largely intrinsic to the lung itself, as shown by the classical experiments involving lung explants from sensitive and resistant mice.3 Identification of the genes predisposing to cancer could yield targets for treatment or chemoprevention. In humans, the wide variety of carcinogens and varying degrees of exposure make identifying the predisposing genes difficult, but in a mouse model such confounding variables can be controlled.
Key points
- Past studies have mapped four susceptibility loci (Pas1–4) for pulmonary adenoma at which A/J and C57BL/6J (B6) mice carry different alleles that affect the incidence and multiplicity of tumours. With the release of a genome-wide SNP database, it has become feasible to search these genetically determined QTLs for genes that are polymorphic between the two strains.
- Celera’s discovery database (CDS 3.6, SNP 1.0) was scanned for SNPs in the Pas1–4 QTLs. SNPs were first screened according to the following criteria: (1) A/J and B6 were polymorphic for the SNP; (2) the SNP appeared in the coding region, the 5′ regulatory region, or the 3′ untranslated region; (3) the SNP appeared in a known gene; (4) B6 and DBA/2J, which is phenotypically similar to B6, shared the same allele.
- Genes for which associations or other plausible links with cancer have been published were deemed final candidates. All 11 selected SNPs within candidate genes were verified by polymerase chain reaction (PCR) sequencing. We also attempted to verify, with semiquantitative reverse transcriptase PCR (RT-PCR), a series of candidate lung tumour susceptibility genes found to be differentially expressed in our previous microarray analysis.
- Differential expression was confirmed for six of seven candidate genes. The candidate Pas genes are: Pas1, receptor type protein tyrosine phosphatase and basic helix-loop-helix B3; Pas2, Notch4 and CREBL1; Pas3, Minpp1 and FoxD4; and Pas4, TGF β receptor II. Identification of the genes predisposing to mouse lung cancer could have considerable implications for the diagnosis, treatment, or chemoprevention of lung cancer in humans.
Previously, classical genetic studies involving cross breeding of mouse strains with differing susceptibilities identified chromosomal regions associated with predisposition to spontaneous and chemically induced lung adenomas.4,5 The process uses many evenly spaced polymorphic DNA markers as landmarks across each chromosome. In quantitative trait locus (QTL) mapping, one then looks for significant correlations between marker alleles and the phenotypic variation, or disease state.6 Inbred mouse strains vary in their susceptibility to cancer; two extreme strains are A/J (susceptible) and C57BL/6J (B6, resistant).4,5 Four QTLs, designated pulmonary adenoma susceptibility (Pas) loci 1–4, have been mapped to mouse chromosomes 6, 17, 19, and 9, respectively. These regions are illustrated in fig 1.
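The correlation between marker alleles and phenotype described above can be illustrated with single-marker regression, one of the simplest QTL scan methods; it is not necessarily the method used in the cited mapping studies. The sketch below assumes each animal has a genotype code at each marker (for example 0 or 1 in a backcross) and a quantitative phenotype such as tumour multiplicity, and scores each marker with LOD = (n/2) log10(RSS_null/RSS_marker). The function names `marker_lod` and `scan` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def marker_lod(genotypes: Sequence[int], phenotypes: Sequence[float]) -> float:
    """LOD score for one marker by single-marker regression (normal model).

    genotypes: one genotype code per animal (e.g. 0/1 in a backcross).
    phenotypes: one quantitative trait value per animal.
    """
    n = len(phenotypes)
    if n != len(genotypes) or n < 3:
        raise ValueError("need matching genotype and phenotype lists (n >= 3)")

    # Null model: a single mean for all animals.
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    # Marker model: a separate mean for each genotype class.
    groups: Dict[int, List[float]] = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss_marker = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss_marker += sum((y - m) ** 2 for y in ys)

    if rss_marker == 0.0:
        return math.inf  # genotype explains all phenotypic variation
    return (n / 2) * math.log10(rss_null / rss_marker)


def scan(markers: Dict[str, Sequence[int]], phenotypes: Sequence[float]) -> Dict[str, float]:
    """LOD score for every marker; a peak suggests a QTL near that marker."""
    return {name: marker_lod(g, phenotypes) for name, g in markers.items()}
```

A marker whose genotype classes separate high and low phenotypes gets a high LOD, while an unlinked marker scores near zero; real scans add interval mapping between markers and permutation-based significance thresholds.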
Figure 1. Mouse pulmonary adenoma susceptibility (Pas) loci as mapped in crosses of strains A/J and C57BL/6. Diagrams show chromosomes drawn to scale by physical length according to Celera CDS 3.6. Pas1 appears at the distal end of chromosome 6, with the genetic positions of the flanking markers on the left and the marker names and key genes on the right. Pas2–4, on chromosomes 17, 19, and 9 respectively, are labelled similarly.
Much genetic variation between individuals is the result of random mutation at specific nucleotide positions. These single nucleotide polymorphisms (SNPs, pronounced snips) can have profound effects on gene expression and consequently on phenotype, as shown in fig 2. For example, SNPs in the 3′ UTR can alter the stability of the mRNA by changing binding sites or secondary structure, making it more or less likely to be degraded. A SNP in the 5′ region can change promoter binding sites and thereby modify the affinity for a transcription factor. Among SNPs in the coding region, nonsense mutations are the most severe: they introduce a premature stop codon, producing a truncated polypeptide and often loss of function. Missense mutations cause an amino acid change, which can be important if the properties of the new amino acid (charge, polarity, etc) differ from those of the one it replaced. As susceptibility to lung cancer has a strong genetic component, detailed analysis of polymorphisms between A/J and B6 (A-B SNPs) may facilitate the identification of candidate susceptibility genes. The genomes of both strains have been sequenced and assembled, so direct comparison is now possible.
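The coding SNP classes above (silent, missense, nonsense) follow directly from translating the codon each strain carries. The sketch below assumes the standard genetic code and codons written 5′ to 3′ on the coding strand, with the B6 codon given first and the A/J codon second; `CODON_TABLE` and `classify_coding_snp` are hypothetical names, not part of Celera's annotation pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G;
# '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def classify_coding_snp(codon_b6: str, codon_aj: str) -> str:
    """Classify a coding SNP from the B6 and A/J codons.

    Returns "silent", "missense" or "nonsense". Direction is ignored: if
    either strain's codon is a stop, the change is reported as nonsense.
    """
    aa_b6 = CODON_TABLE[codon_b6.upper().replace("U", "T")]
    aa_aj = CODON_TABLE[codon_aj.upper().replace("U", "T")]
    if aa_b6 == aa_aj:
        return "silent"      # synonymous codon, same amino acid
    if "*" in (aa_b6, aa_aj):
        return "nonsense"    # one strain carries a premature stop
    return "missense"        # different amino acid in the two strains
```

For example, `classify_coding_snp("GAT", "GAA")` compares aspartate with glutamate and returns "missense". Whether a missense change matters still depends on the chemistry of the two residues, which this sketch does not assess.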
Figure 2. Hypothetical examples of biologically effective single nucleotide polymorphisms (SNPs). Disease producing SNPs in the untranslated regions may influence mRNA stability in the case of the 3′ UTR (top panel) or promoter activity in the case of the 5′ UTR (second panel). Among the coding SNPs, missense SNPs (third panel) could cause an amino acid change that, for example, removes a disulphide bridge. Nonsense SNPs, by definition, introduce truncations that can diminish or obliterate protein function.
Celera’s mouse reference SNP database (1.0), the first such resource to become available, includes 2 566 706 SNPs and is based on the first Mus musculus assembly release, which includes the publicly produced sequence of C57BL/6J and the Celera-produced sequences of A/J, DBA/2J (DBA), 129X1/SvJ, and 129S1/SvImJ. The database was scanned for A-B SNPs in the Pas1–4 QTLs, and the resulting list of SNPs was analysed to identify the most likely candidate genes. This SNP analysis complements our previous gene expression analysis and, as in that work, builds on classic genetic results and integrates newly available genomic data.7
MATERIALS AND METHODS
Approach
Markers flanking each QTL were located in the Celera (Rockville, MD) mouse genome database (www.celera.com, CDS 3.6 release). These markers were used to identify the reference DNA positions within the assembled genome for our database query. The Celera mouse SNP reference database (v1.0) was queried for SNPs within each QTL, and the results were then filtered to keep only A-B SNPs.
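Querying for SNPs within a QTL amounts to a range query between the physical positions of the two flanking markers. The sketch below assumes SNPs are available as (base pair position, SNP ID) pairs for one chromosome; the class name `ChromosomeSnpIndex` and the example data are hypothetical and do not reflect how the Celera database is queried.

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Tuple

Snp = Tuple[int, str]  # (base pair position on the chromosome, SNP ID)


class ChromosomeSnpIndex:
    """Sorted SNP positions for one chromosome, supporting interval queries."""

    def __init__(self, snps: Iterable[Snp]) -> None:
        self._snps: List[Snp] = sorted(snps)          # sort by position
        self._positions = [pos for pos, _ in self._snps]

    def between(self, marker_a_bp: int, marker_b_bp: int) -> List[Snp]:
        """SNPs lying between two flanking markers, inclusive at both ends.

        Marker order does not matter; the query uses binary search.
        """
        start, end = sorted((marker_a_bp, marker_b_bp))
        lo = bisect_left(self._positions, start)   # first SNP at or after start
        hi = bisect_right(self._positions, end)    # one past last SNP at or before end
        return self._snps[lo:hi]
```

Building the index once and querying it per QTL keeps each lookup logarithmic in the number of SNPs on the chromosome, which matters when a single QTL holds tens of thousands of them.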
Data filtering
Additional filtering of the A-B SNPs was performed to increase confidence in positive results. Often with screening one seeks to reduce the false negative rate and thereby be inclusive, but owing to the tremendous number of SNPs (~20 000 in Pas1 alone), we sought to minimise the false positive rate. To accomplish this, A-B SNPs that met the following criteria were selected for further analysis: (1) it appeared in the coding region, the 5′ regulatory region, or the 3′ untranslated region; (2) it appeared in a known gene; (3) the DBA (another resistant strain) allele was the same as the B6 allele. Intronic, intergenic, and silent SNPs were excluded. This was done for two reasons: (1) the function of intronic and intergenic sequence is a matter of research beyond the scope of this study; (2) the number of these SNPs made their analysis currently untenable. For the resulting genes containing A-B SNPs, further analysis was performed to sequence verify the SNPs with PCR sequencing analysis, and to identify those genes or gene families in which association with lung cancer has been previously established.
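The filtering criteria above can be expressed as a single predicate over annotated SNP records. This is a minimal sketch, assuming each SNP carries a gene-structure label, a predicted coding effect, a gene name with a flag for functional annotation, and per-strain alleles; all field names and region labels are hypothetical. It treats an unreported A/J allele as failing the A-B test, whereas in practice such SNPs could instead be kept and checked by sequencing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical labels for the gene structure containing each SNP.
RETAINED_REGIONS = {"coding", "5_regulatory", "3_utr"}


@dataclass
class SnpRecord:
    snp_id: str
    region: str              # e.g. "coding", "5_regulatory", "3_utr", "intron", "intergenic"
    effect: Optional[str]    # coding SNPs only: "silent", "missense" or "nonsense"
    gene: Optional[str]      # None when the SNP is not in an annotated gene
    known_gene: bool         # True if the gene has specific functional annotation
    alleles: Dict[str, str]  # strain -> allele; absent key = allele not reported


def passes_ab_filter(snp: SnpRecord) -> bool:
    b6, aj, dba = (snp.alleles.get(s) for s in ("C57BL/6J", "A/J", "DBA/2J"))
    if b6 is None or aj is None or b6 == aj:
        return False  # must be an A-B SNP
    if snp.region not in RETAINED_REGIONS:
        return False  # drop intronic and intergenic SNPs
    if snp.region == "coding" and snp.effect == "silent":
        return False  # drop silent coding SNPs
    if snp.gene is None or not snp.known_gene:
        return False  # keep only SNPs in known genes
    return dba == b6  # resistant DBA/2J must share the resistant B6 allele


def filter_snps(snps: List[SnpRecord]) -> List[SnpRecord]:
    return [s for s in snps if passes_ab_filter(s)]
```

Each criterion only removes SNPs, so the order of the checks does not change the result; it only affects how early a record is rejected.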
SNP annotation
The SNP IDs were used to link SNPs to the genes in which they occur, and Celera’s links between its genes and public sequences were used to report public accession numbers and descriptions from GenBank (RefSeq when possible).
Sequence verification of SNPs
A-B SNPs were sequence verified by RT-PCR sequencing in the sequencing core facility of the Division of Human Cancer Genetics, Ohio State University. Briefly, the transcript sequences for the candidate genes were downloaded from the Celera database, and PCR primers flanking the SNPs were designed to produce 200–400 bp fragments. RT-PCR, as described below, was performed on total RNA harvested from the lungs of A/J and C57BL/6J mice. The amplified fragments were submitted to the sequencing core facility, where they were processed according to standard procedures.
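The primer design constraint above (primers flanking the SNP, 200–400 bp product) can be checked mechanically. The sketch below assumes 0-based coordinates on the transcript and additionally requires the SNP to fall between the primers rather than under one, since a SNP inside a primer binding site would be masked by the primer sequence. `PrimerPair` and `primers_ok` are hypothetical names; actual primer selection also weighs melting temperature, GC content, and specificity, which are not modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPair:
    fwd_start: int  # 0-based transcript position of the forward primer's first base
    fwd_len: int    # forward primer length (nt)
    rev_end: int    # 0-based transcript position of the last base covered by the reverse primer
    rev_len: int    # reverse primer length (nt)

    @property
    def amplicon_len(self) -> int:
        return self.rev_end - self.fwd_start + 1


def primers_ok(pair: PrimerPair, snp_pos: int, min_len: int = 200, max_len: int = 400) -> bool:
    """True if the SNP lies between the primers and the product size is in range."""
    between_primers = pair.fwd_start + pair.fwd_len <= snp_pos <= pair.rev_end - pair.rev_len
    return between_primers and min_len <= pair.amplicon_len <= max_len
```

For a 20 nt forward primer starting at position 100 and a 20 nt reverse primer ending at position 399, the product is 300 bp and a SNP at position 250 is readable, while a SNP at position 110 would sit under the forward primer.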
Confirmation of differential gene expression by RT-PCR
Expression of several candidate genes, found here by SNP analysis or previously by array analysis,7 was verified by RT-PCR. RNA was harvested from normal lungs of A/J, BALB/cJ, and C57BL/6J mice, reverse transcribed, and amplified by PCR. A mixture of 4 μg of total RNA, oligo (dT) primer (Invitrogen, Grand Island, NY, USA), and double distilled water was incubated at 70°C for 10 minutes. This mixture was then added to the reverse transcription reaction mixture containing forward reaction buffer, DTT (Invitrogen, Grand Island, NY, USA), RNasin (Promega, Madison, WI, USA), dNTPs, and M-MLV reverse transcriptase (Invitrogen, Grand Island, NY, USA). This final mixture was incubated at 37°C for one hour. Finally, the mixture was incubated at 95°C for two minutes to remove RNA.
For real time PCR, a LightCycler-DNA Master SYBR Green I kit (Roche, Indianapolis, IN, USA) and an iCycler iQ thermocycler (Biorad, Rodeo, CA, USA) were used. Each reaction on a 96 well microtitre plate contained 2 μl of cDNA template and 2 μl of master mix. The kit master mix consisted of Taq DNA polymerase, reaction buffer, dNTP mix, SYBR Green I dye, and 10 mmol/l MgCl2. The final reaction contained forward and reverse primers at 1 μmol/l each and MgCl2 at 2 mmol/l, in a final volume of 20 μl. The PCR conditions were denaturation at 95°C, annealing at 55°C, and elongation at 72°C, for 34 cycles.
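The reaction composition above can be checked with C1V1 = C2V2 dilution arithmetic. The text does not give stock concentrations, so this worked example rests on stated assumptions: that the 10 mmol/l MgCl2 figure refers to the master mix as pipetted, that the remaining MgCl2 came from a separate stock taken here as 25 mmol/l, and that each primer came from a 10 μmol/l stock. Under those assumptions, 2 μl of master mix in 20 μl contributes 1 mmol/l MgCl2, so 0.8 μl of stock would bring it to 2 mmol/l. The helper name `stock_volume_ul` is hypothetical.

```python
def stock_volume_ul(target_conc: float, final_volume_ul: float, stock_conc: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1 (both concentrations in the same unit)."""
    if stock_conc <= target_conc:
        raise ValueError("stock must be more concentrated than the target")
    return target_conc * final_volume_ul / stock_conc


FINAL_VOLUME_UL = 20.0      # final reaction volume (stated)
TEMPLATE_UL = 2.0           # cDNA template per reaction (stated)
MASTER_MIX_UL = 2.0         # master mix per reaction (stated)
MASTER_MIX_MGCL2_MM = 10.0  # MgCl2 in master mix, mmol/l (stated)
TARGET_MGCL2_MM = 2.0       # final MgCl2, mmol/l (stated)
TARGET_PRIMER_UM = 1.0      # final concentration of each primer, umol/l (stated)
MGCL2_STOCK_MM = 25.0       # ASSUMED separate MgCl2 stock, mmol/l
PRIMER_STOCK_UM = 10.0      # ASSUMED stock of each primer, umol/l

# MgCl2 already contributed by the master mix after dilution to 20 ul.
mgcl2_from_mix_mm = MASTER_MIX_MGCL2_MM * MASTER_MIX_UL / FINAL_VOLUME_UL
# Extra MgCl2 stock needed to reach the stated final concentration.
mgcl2_top_up_ul = stock_volume_ul(TARGET_MGCL2_MM - mgcl2_from_mix_mm, FINAL_VOLUME_UL, MGCL2_STOCK_MM)
# Volume of each primer stock (forward and reverse are added separately).
primer_ul = stock_volume_ul(TARGET_PRIMER_UM, FINAL_VOLUME_UL, PRIMER_STOCK_UM)
# Water makes up the remainder of the reaction volume.
water_ul = FINAL_VOLUME_UL - TEMPLATE_UL - MASTER_MIX_UL - mgcl2_top_up_ul - 2 * primer_ul
```

With different stock concentrations only the two assumed constants change; the stated final concentrations and volumes stay fixed.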
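The genes examined were SmarcD3, ATFa, Cdk5, PDCD4, Cdc25A, and Mbd2, with β actin as a control. The base 2 logarithm of the fold change of gene x between strains A and B was computed as log2(xA) - log2(β actinA) + log2(β actinB) - log2(xB), where the suffix A or B denotes the strain in which expression was measured. This equals the log2 ratio of the β actin normalised expression of x in strain A to that in strain B.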
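The fold change formula above translates directly into code. This sketch assumes the four inputs are positive, linear-scale relative expression values; how those values were derived from the real time PCR amplification data is not stated in the text. The function name is hypothetical.

```python
import math


def log2_fold_change(x_a: float, actin_a: float, x_b: float, actin_b: float) -> float:
    """log2 fold change of gene x between strains A and B, normalised to beta actin.

    Computes log2(xA) - log2(actinA) + log2(actinB) - log2(xB), which equals
    log2((xA / actinA) / (xB / actinB)). Positive values mean higher
    normalised expression in strain A; negative values, higher in strain B.
    """
    for value in (x_a, actin_a, x_b, actin_b):
        if value <= 0:
            raise ValueError("expression values must be positive")
    return math.log2(x_a) - math.log2(actin_a) + math.log2(actin_b) - math.log2(x_b)
```

With A = A/J and B = C57BL/6J, a gene with four times higher normalised expression in B6 gives -2, that is, two twofold steps toward B6.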
RESULTS
Our approach to identifying candidate susceptibility genes for lung tumours with SNPs complements our earlier work with gene expression analysis.7 As in the previous study, this work builds on existing genetic mapping of QTLs, the availability of the mouse genome sequence, and SNP data. The genome sequence, combined with previously identified QTLs, allows SNP analysis to focus on the regions known to modulate susceptibility and resistance.
Fig 1 illustrates the physical positions of the Pas QTLs, highlights the flanking markers and shows the corresponding genetic positions. As expected, proximal regions span less genetic distance than equal length distal regions, owing to the higher distal recombination rate. Table 1 shows the number of SNPs found in each QTL and the number of genes containing SNPs in each QTL for each step in the filtering process. For example, 19 092 SNPs were found in Pas1. After removing the silent, intronic, and intergenic SNPs and those not A-B polymorphic, the number was reduced to 99 A-B SNPs associated with 65 genes. After the final step of isolating only the A-B SNPs in known genes, 16 SNPs in 15 genes remained.
Table 1. Summary of SNPs found in the Celera Mouse SNP Reference database (v1.0). Total refers to the number of all SNPs (any type, any strain-strain polymorphism) in the region. Filtered refers to the number of SNPs meeting the criteria described in the text. “No of genes” refers to the number of genes associated with the filtered SNPs. Known genes are any genes having specific functional annotation in the Celera CDS 3.6 database.
Tables 2–5 list, for each Pas QTL, the A-B SNPs found in known genes. Of the 15 known genes containing A-B SNPs in Pas1, two stand out as strong candidates: Ptpro and Bhlhb3. In both genes the B6 allele is shared with DBA, and the Ptpro SNP has been sequence verified. Ptpro encodes protein tyrosine phosphatase, receptor type O. Other members of this family have been suggested to be tumour suppressor genes, including the recent suggestion that Ptprj (mouse chromosome 2) is the susceptibility to colon cancer 1 (Scc1) gene.8 Bhlhb3 contains a basic helix-loop-helix domain, is expressed in late embryogenesis, and may be a target of Notch ligands.9,10
Table 2. A-B SNPs in known Pas1 genes. SNPs appear in order of position on the chromosome. SNP ID is Celera’s unique ID number for the SNP. GenBank refers to the transcript sequence associated with the gene containing the SNP. Gene name is derived from Celera’s annotation wherever possible. Type refers to the gene structure containing the SNP and, for coding SNPs, to the effect on amino acid sequence predicted by the most authoritative reading frame(s) for the transcript. Chr refers to the base pair position of the SNP on the chromosome according to the Celera CDS 3.6 assembly. mRNA refers to its nucleotide position within the corresponding Celera transcript. Allele/codon describes the nucleotide change and, when it occurs within the translated region, is given in terms of the codon triplet. C57BL/6J alleles always appear on the left, A/J alleles on the right. DBA refers to concordance of the DBA/2J allele with the B6 allele. RT is checked when the SNP was confirmed by RT-PCR sequencing. Array refers to whether the gene was found differentially regulated in a previous microarray study (Lemon WJ et al. J Med Genet 2002;39:644–55). Pb is checked when published reports have associated the gene with cancer. Description is derived from either Celera annotation or GenBank annotation, whichever the authors deemed clearer, with deference to Celera.
Table 3. A-B SNPs in known Pas2 genes. Columns are as described for table 2.
Table 4. A-B SNPs in known Pas3 genes. Columns are as described for table 2.
Table 5. A-B SNPs in known Pas4 genes. Columns are as described for table 2.
Pas2
In this locus, often noted for its preponderance of H-2 (major histocompatibility complex) genes, two non-H-2 genes stand out: Notch4, a known oncogene,11 and CrebL1, a member of the cyclic AMP response element binding protein family, which could contribute to the metastatic potential of melanoma cells.12,13 The A/J allele for the Notch4 SNP was not reported in the Celera database, but sequencing showed that it differs between A/J and B6. Both genes seem to have a functional relation to tumorigenesis, and in both the B6 allele is shared with DBA.
Pas3
Two genes were found to be candidates, Minpp1 and FoxD4, and the SNPs in both were sequence verified. Forkhead box D4 is the stronger candidate: it has a B6 allele shared with DBA, and it belongs to a family of transcription factors that regulate the cell cycle and may play a part in tumorigenesis.14
Pas4
TGFβRII has historically been the primary candidate for Pas4. Both B6 alleles are shared with DBA, but the A/J allele is not reported in the Celera database. Because Celera lists a SNP between B6 and 129X1, we checked the region for a SNP between A/J and B6, but RT-PCR sequencing found none. Loss of the type II receptor is often associated with a loss of TGF-β induced cell cycle repression,15 and reduction of type II receptor mRNA is strongly associated with lung adenocarcinoma in mice and humans.15,16
RT-PCR verification of microarray data
Fig 3 shows RT-PCR follow up of selected genes found to be candidates in our previous microarray study.7 In that study, RNA from normal tissue of these mouse strains was evaluated for differential expression across strains. Genes within the QTLs that were differentially expressed across strains were deemed candidates, on the basis that expression differences could predispose to tumorigenesis. However, the microarray data had not been confirmed by conventional RT-PCR or northern blotting. In this study, we selected seven of those differentially expressed candidate genes for further confirmation. Six of the seven genes examined by RT-PCR showed differential expression similar to that found with the array (fig 3). One gene, ATFa, did not show the same pattern with RT-PCR as with the array; this gene had the lowest level of expression of the seven (Li-Wong full model estimates of gene expression: B6=27, A/J=58).
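The text does not define what counts as "similar" differential expression in the two assays. One simple, assumed criterion is agreement in direction: both log2 fold changes have the same sign, optionally with each exceeding a minimum magnitude. The sketch below implements that assumed criterion; `concordant_genes` and its `min_abs` threshold are hypothetical, and the gene keys are placeholders rather than data from this study.

```python
from typing import Dict, List


def concordant_genes(
    array_log2fc: Dict[str, float],
    rtpcr_log2fc: Dict[str, float],
    min_abs: float = 0.0,
) -> List[str]:
    """Genes whose array and RT-PCR log2 fold changes point the same way.

    A gene counts as concordant when both estimates have the same sign and
    each exceeds `min_abs` in magnitude. Genes missing from either assay
    are skipped. Returned names are sorted alphabetically.
    """
    shared = sorted(array_log2fc.keys() & rtpcr_log2fc.keys())
    return [
        gene
        for gene in shared
        if array_log2fc[gene] * rtpcr_log2fc[gene] > 0
        and abs(array_log2fc[gene]) > min_abs
        and abs(rtpcr_log2fc[gene]) > min_abs
    ]
```

A sign-based test separates direction from magnitude, which suits a comparison where the two assays report different fold changes for the same gene; raising `min_abs` makes the criterion stricter for weakly expressed genes.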
Figure 3. Comparison of microarray data with RT-PCR for selected genes. Fold changes in gene expression between A/J and C57BL/6J for five genes, and between A/J and BALB/cJ for one gene (Mbd2), are compared on a log2 scale. Each unit on the y axis represents a twofold change. Negative values represent higher expression in B6.
DISCUSSION
Following the commonly held and straightforward view that genetic determinants of susceptibility may arise from the simplest form of genetic variation, the SNP, we analysed SNPs in four pulmonary adenoma susceptibility loci. Five strains have been sequenced and assembled to date: A/J (susceptible, s), DBA/2J (resistant, r), C57BL/6J (r), and two 129 substrains, 129X1/SvJ (intermediate, i) and 129S1/SvImJ (i).5,17 Substrain 129X1/SvJ was contaminated in 1978 and, as a result, contains as much as 25% non-129 DNA. Owing to the resulting large scale heterozygosity, some do not consider it an inbred strain.18–20 Manenti et al17 reported 129X1/SvJ to be of intermediate susceptibility, which leaves its expected Pas genotypes ambiguous. Strain 129S1, the “steel” strain, is inbred, but because it is also of intermediate susceptibility, its expected Pas genotypes are likewise ambiguous. Consequently, our criteria required that DBA and B6 share alleles and that A/J differ, and the 129 alleles were ignored. Analysis was limited to SNPs likely to be detectable in mRNA, namely coding SNPs, splice site SNPs, and UTR SNPs. As K-ras, which has long been a strong Pas1 candidate gene, has polymorphisms between A/J and B6 in intron 2,5 we would have liked to include intronic SNPs. However, so many intronic SNPs were identified that sorting out the useful ones was deemed untenable at this point.
Pas1 has been shown to be the major locus, accounting for 45% of the phenotypic variance.4 Mutant K-ras is found in 80% or more of mouse lung adenomas and adenocarcinomas, with mutations of codons 12 and 61 reported in both susceptible and resistant strains.21,22 In hybrid animals, mutation occurs at a much higher frequency in susceptible alleles than in resistant alleles.23 A bioassay in the same study showed that introduction of the mutant allele, derived from either the susceptible or the resistant parent, was sufficient to transform NIH 3T3 cells. Recent studies have confirmed the strong association of the locus with cancer in mice and have shown that the homologous human locus, 12p12, is a major susceptibility locus in Italian and Japanese people.17,24–26 The entire region between K-ras and ITPR2 is homologous between mouse chromosome 6 and human chromosome 12, and human K-ras has been shown to share cancer associated mutations at codons 12 and 61.27 Some questions remain. By what mechanism does the A/J allele of K-ras confer susceptibility? Is there another gene in Pas1 that confers the propensity to mutation? Or is it Par2 on chromosome 18, which has been shown to modulate the effect of Pas1? According to the present SNP analysis of Pas1, the protein tyrosine phosphatase and basic helix-loop-helix B3 genes are also candidates. Additional functional analysis of the allelic differences in cell transformation and carcinogenesis should provide more definitive answers.
The Pas2 QTL is located at the H-2 locus, whose haplotypes correlate with the incidence and multiplicity of induced lung tumours in mice. The primary candidates for this region are Notch4 and Crebl1. Notch4 has been shown to be susceptible to viral activation, which can transform cells and drive them to a highly invasive phenotype.11 This gene was also a candidate in previous expression studies. Crebl1, a cyclic AMP response element binding protein, has been implicated in transcriptional activation and may contribute to the acquisition of metastatic phenotypes.13
Pas3 was first described by Devereux et al28 as flanked by D19MIT42 and D19MIT19, using linkage analysis of N-ethyl-N-nitrosourea treated (A/J × C57BL/6J) F1 × C57BL/6J backcross progeny. This finding was later confirmed by Festing et al29 in urethane treated (A/J × C57BL/6J) F2 mice. The main candidate in the Pas3 region is Foxd4, a forkhead transcription factor. Forkhead transcription factors are involved in regulating cell cycle progression and cell death; long term forkhead activation causes a sustained but reversible inhibition of proliferation without a marked increase in apoptosis.14
Finally, Pas4 was mapped by Festing et al.29 One of the candidates is TGFβRII, the expression of which is reduced in urethane induced pulmonary adenoma in A/J mice. Loss of the type II receptor results in a loss of induction of apoptosis.30 Loss of sensitivity to TGFβ is a hallmark of invasiveness in some cancers.31 However, no SNP was found between A/J and B6, suggesting that if TGFβRII underlies Pas4, the responsible variation may be intronic or intergenic.
In this study, we also used RT-PCR to corroborate the results for six of seven candidate genes identified by allele specific differential expression in our earlier study (fig 3).7 Other work by us and others showed similar agreement between RT-PCR and oligonucleotide arrays.32,33 This strengthens the case that these genes may play a part in predisposition to lung tumorigenesis in mice. However, the differences between the fold changes reported by the two assays suggest that array results should continue to be interpreted cautiously and that the most important results should continue to be validated by RT-PCR. Replication of arrays on multiple animals is also recommended. The gene expression study complements the present SNP study in two ways. Firstly, differences in gene expression may not be the result of SNPs, so the expression based candidates bring in new information. Secondly, SNPs can have their full effect without any change in gene expression. Thus the two sets of candidates are neither mutually exclusive nor subsets of one another; instead, they are complementary.
In summary, we have described several candidates for the Pas and Par QTLs using SNP analysis and microarrays. These candidates should accelerate the identification of mouse lung tumour susceptibility genes and of potential functional interactions among the Pas and Par QTLs. This and our previous study show the synergy between classical genetic studies and genomic tools such as SNP databases and microarrays in dissecting the genetic basis of susceptibility to lung tumours in mice.
Acknowledgments
We are grateful to G Stoner for critical reading of this manuscript and helpful discussions. This work was supported by NIH grants R01CA58554 (MY), R01CA78797 (YW), and P30CA16058.