An expectation maximisation based prediction algorithm was created to identify unusual haplotypes in patient samples that may be caused by small intragenic deletions. In this approach, unphased SNP genotypes are compared to pairs of canonical haplotypes to identify potentially hemizygous regions. This method was successfully applied to identify five deletions in the 3′ region of BRCA1.
- gene rearrangement
- single nucleotide polymorphism
Most of the human SNP diversity at a given locus may be described as a set of “canonical” haplotypes, representing common haplotypes in a population. Of the few unusual haplotypes not in the canonical set, some are found in a genotype context that is similar to a genotype expected from a combination of a pair of canonical haplotypes, except that one or several polymorphic positions appear, unexpectedly, homozygous. These unusual haplotypes could represent hemizygous loci resulting from intragenic deletions. However, these could also be bona fide haplotypes that are too rare to be represented in the canonical set, resulting from recombinations between common haplotypes, or single base changes.
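The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the genotype encoding (0, 1 or 2 copies of the non-consensus allele per SNP), the function names, and the toy four-SNP haplotypes are all assumptions made for the example.

```python
from itertools import combinations_with_replacement

def expected_genotype(h1, h2):
    """Unphased genotype (allele counts 0/1/2 per SNP) produced by two haplotypes."""
    return tuple(a + b for a, b in zip(h1, h2))

def hemizygous_candidates(genotype, canonical):
    """Find canonical pairs that explain the genotype except where a predicted
    heterozygote appears homozygous. Returns [] if a canonical pair fits exactly."""
    hits = []
    for h1, h2 in combinations_with_replacement(canonical, 2):
        exp = expected_genotype(h1, h2)
        diff = [i for i, (e, g) in enumerate(zip(exp, genotype)) if e != g]
        if not diff:
            return []  # fully explained by a pair of canonical haplotypes
        # Keep the pair only if every mismatch is "expected het, observed hom".
        if all(exp[i] == 1 and genotype[i] in (0, 2) for i in diff):
            hits.append(((h1, h2), diff))
    return hits

# Hypothetical canonical set over four SNPs (0 = consensus allele).
canonical = [(0, 0, 0, 0), (1, 1, 0, 1), (0, 1, 1, 0)]
sample = (1, 1, 0, 0)  # last SNP is homozygous where a heterozygote is expected
candidates = hemizygous_candidates(sample, canonical)
```

Here `candidates` contains the single pair `(0, 0, 0, 0)` / `(1, 1, 0, 1)` with position 3 flagged. As the text notes, such a hit is only a candidate: a rare bona fide haplotype would produce the same pattern.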
In the present study, we used unphased genotype data for 12 common biallelic BRCA1 SNPs (located from exons 4 to 16) generated during sequence based clinical mutation testing to obtain BRCA1 SNP haplotypes for 5911 anonymised samples, by applying an expectation maximisation (EM) algorithm similar to those described elsewhere.1,2 Ten canonical SNP haplotypes are known for BRCA1.3 Interestingly, two BRCA1 haplotypes account for the bulk of the genotypes, the consensus at 59% frequency, and the most common non-consensus haplotype at 21% (fig 1).
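For readers unfamiliar with EM-based phasing, the following is a minimal sketch of the general technique, assuming Hardy-Weinberg proportions and a uniform starting point. The study states only that its algorithm was similar to those described elsewhere, so the function names, the genotype encoding (0, 1 or 2 copies of the non-consensus allele) and the fixed iteration count are assumptions, not the published method.

```python
from itertools import product
from collections import defaultdict

def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites fixed; het sites filled below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    pairs_per_sample = [compatible_pairs(g) for g in genotypes]
    haps = {h for pairs in pairs_per_sample for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}
    n_chromosomes = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pairs_per_sample:
            # E-step: probability of each phase resolution given current frequencies.
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: new frequencies are the expected haplotype counts per chromosome.
        freq = {h: counts[h] / n_chromosomes for h in haps}
    return freq

# Toy data: eight homozygotes and one double heterozygote over two SNPs.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)]
toy_freq = em_haplotype_frequencies(toy)
```

In the toy data the double heterozygote is ambiguous on its own, but the homozygous samples pull the estimate towards the `(0, 0)` / `(1, 1)` resolution. The number of compatible pairs doubles with each heterozygous site, which stays manageable for 12 SNPs.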
We identified 78 (1.3%) samples with non-canonical haplotypes, which in 62 cases were paired with a canonical haplotype. Among these samples, 14 were identified with rare haplotypes created by a change at the SNP in exon 16 (fig 1), suggestive of a possible deletion of this exon. This group was selected for molecular analysis because of its abundance: it represents 18% of the unusual haplotypes. In addition, sequence information available for one of these samples showed a heterozygous SNP near exon 17, placing a 3′ limit on the size of the putative deletion. This finding increased the likelihood that the unusual haplotype in this sample, and perhaps those in others of this group, was the result of an intragenic deletion and not just recombination.
Within the remaining samples, 42 contained rare haplotypes that appeared to arise from changes in one out of five SNP loci in exon 11, potentially indicative of a partial deletion of the exon. These samples were excluded from deletion testing because, to date, all clinically significant large deletions that have been characterised in BRCA1 are Alu mediated, whole exon deletions.4–8 An additional 17 samples were excluded from deletion testing because their haplotypes were defined by changes at two non-adjacent haplotype defining SNPs, which could not be explained by a single deletion event. Several other samples were identified with haplotypes that may have been the result of intragenic deletions, but DNA from these samples was not available.
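The single-deletion criterion used above can be illustrated with a short check. This is a sketch under one stated assumption about the reasoning: a hemizygous region cannot contain a heterozygous call, so one deletion can explain several changed SNPs only if every SNP between the first and last changed position is homozygous. The function name and the genotype encoding (1 = heterozygous; 0 or 2 = homozygous) are hypothetical.

```python
def single_deletion_span(changed, genotype):
    """Return (first, last) SNP indices of the smallest deletion that could explain
    the unexpectedly homozygous positions in `changed`, or None if a heterozygous
    call inside that span rules out a single deletion."""
    if not changed:
        return None
    first, last = min(changed), max(changed)
    # A deleted (hemizygous) segment cannot contain a heterozygous genotype.
    if any(genotype[i] == 1 for i in range(first, last + 1)):
        return None
    return first, last

# One changed SNP at the 3' end: compatible with a single deletion.
one_site = single_deletion_span([3], (1, 1, 0, 0))
# Two changed SNPs separated by heterozygous calls: not a single deletion.
split_sites = single_deletion_span([0, 3], (0, 1, 1, 0))
```

The same logic underlies the use of a heterozygous SNP near exon 17 as a 3′ limit on the putative exon 16 deletion: a heterozygous call marks a position that must lie outside any deleted segment.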
Long range PCR of the region between exons 14 and 18 showed that seven of the 14 selected samples produced products smaller than expected. Restriction digests of the mutant fragments localised the breakpoint regions, which were then characterised by sequencing (fig 2), revealing five distinct novel deletions ranging in size from 5629 to 7183 bases. All five deletions appear to be Alu mediated. Work is continuing to determine whether the remaining seven samples contain deletions with breakpoints outside the amplified region.
Our example shows that haplotype analysis is effective for identifying samples that may contain hemizygous regions. It permitted rapid processing of a large sample set and selected the best candidate samples for biochemical analysis: of the 14 specimens in our target group, 50% were shown to contain deletions. The example also illustrates the additional information that phased data can provide and that can be overlooked in unphased genotype data. Finally, this method may have broader application in large SNP based assays by identifying potentially hemizygous data that might confound downstream analyses.