Main

Identification of chromosomal imbalances and variation in DNA copy-number is essential to our understanding of disease mechanisms and pathogenesis. Array CGH1 or matrix CGH2 offers the highest resolution for practical genome-wide detection of chromosomal alterations. This technique is derived from the concept of conventional CGH3, which has contributed greatly to the molecular characterization of both somatic and constitutional genomic DNA mutations over the last decade4,5,6. The primary limitation of conventional CGH is its resolution (20 Mb), as this method detects segmental copy-number changes on metaphase chromosomes3. In array CGH, the metaphase chromosome spread is replaced by BACs, PACs or YACs containing human DNA as targets, increasing the resolution to the distance between the selected marker DNA clones1,2. Genome screening using array CGH has great potential in the characterization of numerous chromosomal disorders.

Efforts to construct DNA arrays spanning the human genome consisted of spotting 2,460 (ref. 7) or 3,500 (ref. 8) marker BAC clones representing the sequenced genome at an average interval of 1 Mb. These studies showed that sufficient target-DNA printing solution could be generated from individual BACs using PCR-based protocols. Because the target product is PCR-derived, it is easily replenishable, obviating the need for multiple rounds of laborious large-scale BAC DNA preparations. These arrays are sensitive enough to detect single-copy changes, but the technique is limited by the small number of BAC markers representing the genome on the slide, rather than by the methodology itself. Even at this resolution, array CGH is useful for detecting chromosomal aberrations associated with congenital abnormalities and somatic malignancies9,10,11,12.

Recent studies focused on higher-density regional arrays for fine mapping and identifying new genes in specific chromosomal regions13,14,15,16,17,18. For example, a candidate oncogene for association with breast cancer (CYP24A1) was identified on 20q13.2 using an array of 29 overlapping clones spanning this region18. The need for a tiling resolution array to map these amplification or deletion boundaries is indicated by the fact that two separate regions of amplification in 20q13.2 contained two separate putative oncogenes, which would not have been detected by a lower resolution array. These studies show that the resolving power of array CGH is maximized when the detection of single copy-number changes is combined with a tiling or overlapping set of BAC clones.

We created the first tiling resolution BAC array with complete coverage of the human genome using 32,433 fingerprint-verified, individually amplified BAC clones. Here we show that such a complete genome comparison is capable of identifying microamplifications and microdeletions, which may contain genes involved in disease pathogenesis. We call this array the submegabase resolution tiling set for array CGH (SMRT array).

Results

Array sensitivity

To assess the sensitivity of the SMRT array, we hybridized the well-characterized EBV-transformed lymphoma cell line TAT-1 (ref. 19) to normal male genomic DNA. Genomic regions containing BCL2 (18q21) and MYC (8q24) in TAT-1 were previously shown to have a twofold copy-number increase by FISH analysis19. We detected these previously reported amplifications at both loci, and we delineated their boundaries (Fig. 1). Boundaries of amplification on chromosome 8 were between BAC clone RP11-143H8 at 8q22.2 and RP11-263C20 at 8q24.13. Boundaries of amplification on chromosome 18 were between BAC clone RP11-159K14 at 18q21.32 and RP11-565D23 at 18q23. These data illustrate the detection sensitivity of array CGH.

Figure 1: Detection of twofold copy-number changes in TAT-1 lymphoma cell line on chromosome arms 8q and 18q.

(a) Chromosome view of 8q showing MYC amplification between BAC clones RP11-143H8 and RP11-263C20. (b) Chromosome view of 18q showing BCL2 amplification between BAC clones RP11-159K14 and RP11-565D23. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and −0.5, respectively.
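For orientation, the log2 ratios plotted in these profiles can be related to copy number with a short calculation. The following is an illustrative sketch only, not part of the analysis used in this study: it assumes a diploid (two-copy) reference and ignores experimental noise, and the function name `expected_log2_ratio` is hypothetical. Measured ratios may differ from these idealized values.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) ratio for one locus.

    Assumption: the reference genome carries `reference_copies` copies
    (2 for an autosomal locus) and the measurement is noise-free.
    """
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log2 ratio")
    return math.log2(test_copies / reference_copies)


single_copy_gain = expected_log2_ratio(3)  # 3 copies versus 2
single_copy_loss = expected_log2_ratio(1)  # 1 copy versus 2
twofold_gain = expected_log2_ratio(4)      # 4 copies versus 2

print(f"single-copy gain: {single_copy_gain:+.3f}")
print(f"single-copy loss: {single_copy_loss:+.3f}")
print(f"twofold gain:     {twofold_gain:+.3f}")
```

Under these assumptions, a single-copy gain or loss and a twofold gain all lie at or beyond the ±0.5 scale bars drawn in the figures.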

Array resolution compared to conventional CGH

To demonstrate the resolving power of the SMRT array, we compared the log2 ratio profile of lung cancer cell line H526 (refs. 20,21; Fig. 2a) to the previously published conventional chromosomal CGH data (see URLs). All patterns of gains and losses were matched, including large changes (e.g., the amplification of 7q and 8q and loss of the entire chromosome 10) and complex changes (e.g., the multiple amplifications on chromosome 1 and the multiple deletions on chromosome 4). Notably, conventional chromosomal CGH identified a highly amplified region on the telomeric end of chromosome arm 2p, apparently covering approximately one-fourth of the whole chromosome. However, the SMRT array analysis showed this amplification to be precisely localized to a 1.3-Mb fragment at 2p24.3, bordered by BAC clones RP11-351F4 and RP11-701O10, which contains the MYCN oncogene. The resolving power of this whole-genome array enables us to define breakpoints to within single BAC clones. For example, the deletion breakpoint on chromosome arm 3p was localized to between BAC clones RP11-632O5 and RP11-594F16 at 3p21.1 (Fig. 2b). This finding was subsequently confirmed by FISH analysis (Fig. 2c).

Figure 2: Whole-genome SMRT array CGH of lung cancer cell line H526.

(a) Whole-genome view of H526 versus reference male DNA. (b) Magnified view of deletion breakpoint at 3p21.1 between BAC clones RP11-632O5 and RP11-594F16, also seen in a. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and −0.5, respectively. (c) FISH confirmation of breakpoint in b showing single-copy loss of BAC clone RP11-594F16 (green) and normal copy number of BAC clone RP11-632O5 (red).

Comparison to previous array CGH

To compare our tiling resolution array against current array CGH technology, we profiled colorectal cancer cell line COLO320 (ref. 22), which has been characterized in two previous array CGH studies7,23. We confirmed the amplification at 8q24 in the MYC region identified by these studies. Furthermore, the SMRT array refined this segmental copy-number increase to a 1.9-Mb region bordered by BAC clones RP11-810D23 and RP11-294P7 (Fig. 3).

Figure 3: Amplification of chromosome 8q24.12–13 in colorectal cancer cell line COLO320.

This 1.9-Mb amplification containing MYC is bounded by BAC clones RP11-810D23 and RP11-294P7. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and −0.5, respectively.

A detailed analysis of our COLO320 profile identified new microamplifications on chromosome arms 13q, 15q, 16p and 22q (Supplementary Fig. 1 online), which were not detected by the two previous high-resolution CGH studies7,23. For example, we identified a 300-kb microamplification at 13q12.2 containing only three genes (according to University of California Santa Cruz Genome Browser, April 2003 Freeze): caudal type homeobox transcription factor 2 (CDX2), insulin promoter factor 1 (IPF1) and GS homeobox 1 (GSH1; Fig. 4a). CDX2 is a transcription factor expressed in the intestine and altered in colorectal cancers24. FISH analysis verified this microamplification and showed that it was within a homogeneously staining region (Fig. 4b). These findings illustrate the usefulness of a tiling resolution BAC array for comprehensive assessment of genomic integrity.

Figure 4: Identification of a new microamplification by tiling resolution array CGH in COLO320.

(a) 300-kb microamplification on chromosome 13q12.2 containing genes GSH1, CDX2 and IPF1 and bounded by BAC clones RP11-153M24 and RP11-152N3. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and −0.5, respectively. (b) High copy-number amplification of RP11-153M24 detected by FISH hybridization. Amplification is located in a homogeneously staining region.

Identification of minute regions of alteration

In addition to microamplifications, we also detected small deletions in a number of tumor cell lines. For example, we detected a 1.25-Mb deletion containing the gene CDKN2A (also called p16) in lymphoma cell line Z138C at 9p21.3 (Fig. 5a). Deletion of CDKN2A occurs in approximately one-half of mantle cell lymphoma tumors as detected by FISH25. This deletion is bordered by RP11-328C2 and RP11-275H17 (Fig. 5a). Submegabase-sized microdeletions can be accurately mapped in a single whole-genome array CGH experiment. This is made possible by the overlapping clone coverage and their distribution on the array. A notable example is a 240-kb deletion at 7q22.3 in the breast cancer cell line BT474, containing PRKAR2B, a regulatory kinase, and HBP1, a G1 inhibitory kinase regulated by p38 MAP kinase26 (Fig. 5b). Such microdeletions have not been reported previously. The mechanisms by which such deletions are effected are not known. Whether this microdeletion affects the expression of PRKAR2B or the neighboring gene, PIK3CG, remains to be determined. The two experiments described here show how small, previously unidentified alterations that have the potential to contribute to disease may easily be identified in a single SMRT array experiment.

Figure 5: Identification of microdeletions.

(a) Identification of a 1.25-Mb deletion at 9p21.3 in a mantle cell lymphoma cell line containing CDKN2A bounded by BAC clones RP11-328C2 and RP11-275H17. (b) 240-kb deletion at 7q22.3 in breast cancer cell line BT474 containing PRKAR2B and HBP1 bounded by BAC clones RP11-258L19 and RP11-262G16. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and −0.5, respectively.

Discussion

Array CGH is a proven method for accurate, robust and rapid genome-wide assessment of DNA copy-number variation. Current users of array CGH technology consider BAC DNA markers positioned at intervals of 1–2 Mb to be 'high-resolution' coverage. This view has been perpetuated by conventional whole-genome analysis tools, such as microsatellite marker analysis of loss of heterozygosity, in which small interspaced 'sequence-tagged sites' are assayed for genomic imbalance, and the genomic integrity between these sites must be inferred. In contrast, tiling resolution array CGH has the potential to identify minute genomic changes. In this study, we constructed a submegabase resolution tiling set for array CGH (SMRT array), comprising 32,433 overlapping BAC clones covering the entire human genome. This tiling resolution, combined with the proven sensitivity of array CGH, makes the technique ideal for identifying new genes and will prove useful for unraveling the genetic basis of numerous diseases.

Methods

BAC clone selection, preparation and validation.

The selection and map positions of the 32,433 clones have been described previously and are available from The Children's Hospital Oakland Research Institute (see URLs). We validated clone identity by comparing HindIII fingerprints to the FPC BAC fingerprint database27 (see URLs). These clones provide 1.5-fold coverage of the human genome, giving an approximate resolution of 80 kb (i.e., two-thirds of an average BAC clone).
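The coverage and resolution figures above can be cross-checked with a back-of-envelope calculation. The sketch below is illustrative only: the average clone size is not stated here and is inferred from the quoted resolution (80 kb being two-thirds of an average clone), and the spanned length is a rough estimate that assumes uniform clone size and uniform coverage. All variable names are hypothetical.

```python
from fractions import Fraction

# Values quoted in the text
CLONE_COUNT = 32_433
FOLD_COVERAGE = Fraction(3, 2)        # 1.5-fold coverage
RESOLUTION_KB = 80                    # approximate resolution
RESOLUTION_FRACTION = Fraction(2, 3)  # resolution as a fraction of an average clone

# Inferred (assumption): average clone size implied by the two figures above
implied_clone_kb = RESOLUTION_KB / RESOLUTION_FRACTION

# Rough estimate (assumption: uniform clone size and uniform coverage)
spanned_kb = CLONE_COUNT * implied_clone_kb / FOLD_COVERAGE

print(f"implied average clone size: {float(implied_clone_kb):.0f} kb")
print(f"approximate sequence spanned: {float(spanned_kb) / 1e6:.2f} Gb")
```

Exact fractions are used so that the inferred clone size is not affected by floating-point rounding.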

Array production from BAC DNA.

We prepared the DNA samples to be spotted on the array by PCR using linkers (primer sequences available on request). The protocol for linker-mediated PCR was previously described28. We precipitated the PCR products with ethanol and redissolved them in an MSP printing solution (Telechem), denatured them by boiling, and rearrayed them for robotic printing in triplicate using a VersArray ChipWriter Pro (BioRad). This arrayer uses a 12 × 4 array of SMP2.5 Stealth Micro Spotting Pins (Telechem/ArrayIT) depositing DNA spots of 0.8 nl at 1 μg μl−1 at 133-μm distances. We spotted the entire set of 32,433 solutions in triplicate onto two aldehyde-coated slides. Limited numbers of SMRT arrays are available on a cost recovery basis (see URLs).

DNA labeling and hybridization.

We labeled 400 ng of test and reference DNA separately with Cyanine-3 and Cyanine-5 dCTPs according to a random priming protocol previously described13. Before hybridization, we combined the DNA probes and purified them using ProbeQuant Sephadex G-50 Columns (Amersham) to remove unincorporated nucleotides. We then added 200 μg of human Cot-1 DNA (Invitrogen), precipitated the mixture and resuspended it in 100 μl of DIG Easy hybridization solution (Roche) containing sheared herring sperm DNA (Sigma-Aldrich) and yeast tRNA (Calbiochem). The probe was denatured at 85 °C for 10 min and repetitive sequences were blocked at 45 °C for 1 h before hybridization. We carried out prehybridization in the same buffer. We applied the probe mixture to the slide surface, fixed the coverslips and incubated them at 42 °C for 36 h. We washed the arrays five times for 5 min each in 0.1× saline sodium citrate, 0.1% SDS at room temperature with agitation. We then rinsed each array repeatedly in 0.1× saline sodium citrate and dried them by centrifugation.

Array imaging and analysis.

We imaged hybridized slides using a CCD-based imaging system (Arrayworx eAuto, Applied Precision) and analyzed them with SoftWoRx Tracker Spot Analysis software. We averaged the ratios of the triplicate spots and calculated standard deviations (s.d.). All spots with s.d. >0.075 or signal-to-noise ratios <20 were removed from the analysis. We used custom viewing software (SeeGH) to visualize all data as log2 ratio plots where each dot represents one BAC. This software is available on request (see URLs).
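The replicate filter described above can be sketched as follows. This is a minimal illustration, not the SoftWoRx or SeeGH implementation. It assumes that the s.d. is the sample standard deviation of the replicate ratios and that a clone is excluded if any replicate spot has a signal-to-noise ratio below the cutoff. The text does not specify either detail, and the function name `summarize_clone` is hypothetical.

```python
from statistics import mean, stdev

SD_MAX = 0.075   # cutoff quoted in the text
SNR_MIN = 20.0   # cutoff quoted in the text


def summarize_clone(ratios, snrs):
    """Return the mean ratio of replicate spots, or None if the clone is excluded.

    Assumptions: sample s.d. across replicates; exclusion if any replicate
    falls below the signal-to-noise cutoff.
    """
    if stdev(ratios) > SD_MAX or min(snrs) < SNR_MIN:
        return None
    return mean(ratios)


consistent = summarize_clone([0.52, 0.50, 0.55], [30, 40, 50])  # kept
noisy = summarize_clone([0.10, 0.50, 0.90], [30, 30, 30])       # s.d. too high
weak = summarize_clone([0.50, 0.50, 0.50], [30, 25, 10])        # one weak spot

print(consistent, noisy, weak)
```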

Reference male versus reference female hybridization detected no unexpected gains or losses, and random variability of log2 ratios was not observed (Supplementary Fig. 2 online). Furthermore, owing to overlapping clone coverage, a single clone with an aberrant signal ratio would not be considered an amplification or deletion. Finally, since the clones are not spotted in the order of their map position, adjacent clones are distributed throughout our array.
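The rule that a single aberrant clone is not called as an alteration can be illustrated with a simple run-based caller. This is a sketch under stated assumptions rather than the procedure used here. The ±0.5 threshold is borrowed from the scale bars in the figures, the minimum run length of two clones is an assumption, and `classify` and `call_segments` are hypothetical names. Input ratios are assumed to be ordered by map position.

```python
def classify(value, threshold):
    """Label one log2 ratio as gain, loss or neutral (threshold is an assumption)."""
    if value >= threshold:
        return "gain"
    if value <= -threshold:
        return "loss"
    return "neutral"


def call_segments(log2_ratios, threshold=0.5, min_clones=2):
    """Return (start, end, kind) index runs of adjacent clones sharing one call.

    Runs shorter than `min_clones` are ignored, so an isolated aberrant
    clone is never reported as an amplification or deletion.
    """
    segments = []
    start = 0
    n = len(log2_ratios)
    while start < n:
        kind = classify(log2_ratios[start], threshold)
        end = start
        # Extend the run while the next clone has the same label
        while end + 1 < n and classify(log2_ratios[end + 1], threshold) == kind:
            end += 1
        if kind != "neutral" and end - start + 1 >= min_clones:
            segments.append((start, end, kind))
        start = end + 1
    return segments


# Made-up profile: one isolated outlier, one three-clone gain, one two-clone loss
profile = [0.02, 0.9, -0.03, 0.01, 0.8, 0.85, 0.9, 0.05, -0.7, -0.9, 0.0]
print(call_segments(profile))
```

The first and last indices of each reported run correspond to the flanking clones that would be quoted as the boundaries of an alteration.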

URLs.

The H526 CGH profile is available at http://amba.charite.de/~ksch/cghdatabase/index.htm. The URL for Children's Hospital Oakland Research Institute is http://bacpac.chori.org/genomicRearrays.php. The FPC database is available at http://genome.wustl.edu/projects/human/index.php?fpc=1. Whole-genome DNA arrays and SeeGH software are available at http://www.bccrc.ca/cg/arraycgh_group.html.

Note: Supplementary information is available on the Nature Genetics website.