A new method for autozygosity mapping using single nucleotide polymorphisms (SNPs) and ExcludeAR
C G Woods1, E M Valente2, J Bond1, E Roberts1

1 Molecular Medicine Unit, St James’s University Hospital, Leeds, UK
2 IRCCS CSS, San Giovanni Rotondo and CSS Mendel, Rome, Italy

Correspondence to: C G Woods, Molecular Medicine Unit, Clinical Sciences Building, St James’s University Hospital, Beckett Street, Leeds LS9 7TF, UK;


The development of silicon chips such as the Affymetrix 10K Xba 131, which carries sufficient oligonucleotides to analyse 10 913 single nucleotide polymorphisms (SNPs), offers a new method for seeking autosomal recessive loci.1 This letter describes a practical strategy for analysing the output of such an “SNP-chip” for this purpose.

Autozygosity mapping, first suggested by Lander and Botstein, is the method of choice for the discovery of autosomal recessive gene loci.2 The method seeks homozygous regions in consanguineous families: the more affected individuals share a homozygous region, and the larger that region, the more likely it is to harbour the disease-causing mutation. Mueller and Bishop modelled the use of a single multi-affected family and suggested that this was the most efficient strategy for determining a disease locus, particularly given the complexities of genetic heterogeneity.3 Autozygosity mapping became practical with the discovery of many highly polymorphic microsatellite repeat markers spread throughout the genome.4 Most researchers currently use optimised panels of approximately 400 markers for an initial genome-wide linkage screen, giving 10–12 cM coverage of the autosomal genome; this approach has led to the discovery of many recessive loci.5

The currently available SNP-chip detects SNPs spread throughout the genome (with the exception of the Y chromosome) and is analysed following a single hybridisation reaction with one individual’s genomic DNA. The results are produced as a simple spreadsheet of the SNP allele calls. Although each SNP has far less power than a microsatellite marker to detect a homozygous chromosomal segment, two properties suggested their potential use in autozygosity mapping: their number (10 913 SNPs are equivalent to a 3–4 cM microsatellite marker map)6 and their ability to detect a heterozygous region, and hence exclude linkage. An average microsatellite marker has a 70% chance of detecting a heterozygous region, whereas the approximately 30 SNPs within the same region have a >99% chance of doing so. (The chance that at least one of the 30 SNPs will be heterozygous is $1 - 0.7^{30}$, which is >0.999, and on average nine of the 30 SNPs will be heterozygous.)
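The arithmetic in the parenthesis above can be checked in a few lines of Python. This is a minimal sketch that assumes, as the calculation does, that each SNP is independently homozygous with probability 0.7 (and so heterozygous with probability 0.3).

```python
# Chance that a region containing ~30 SNPs shows at least one heterozygous call,
# assuming each SNP is independently homozygous with probability 0.7.
P_HOMOZYGOUS = 0.7
N_SNPS = 30

p_all_homozygous = P_HOMOZYGOUS ** N_SNPS      # all 30 SNPs homozygous
p_at_least_one_het = 1 - p_all_homozygous      # region is "detected" as heterozygous
expected_het = N_SNPS * (1 - P_HOMOZYGOUS)     # mean number of heterozygous SNPs

print(f"P(at least one heterozygous SNP) = {p_at_least_one_het:.5f}")
print(f"Expected heterozygous SNPs = {expected_het:.1f}")
```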

We have designed the following method to adapt Affymetrix SNP-chip output for autozygosity mapping.

  1. Only affected individuals from a multi-affected pedigree are analysed; in general, the minimum sample needed is four affected individuals from a single sibship, or three from two or more sibships. Parents and unaffected siblings are not analysed.

  2. The primary results of each individual’s SNP-chip hybridisation are produced as a simple Excel spreadsheet. Each individual’s results are then sorted by chromosome and genetic distance using Excel’s “data sort” command, and the column of sorted SNP allele calls is copied.

  3. The sorted data are processed using ExcludeAR, a freeware spreadsheet we created for this purpose. A separate ExcludeAR spreadsheet is available for the analysis of one, two, three, and four affected individuals (see Appendix 1).

  4. The four versions of ExcludeAR (AR1–4) interpret data from one (AR1), two (AR2), three (AR3), and four (AR4) affected individuals. The sorted primary SNP allele data from step 2 are pasted into ExcludeAR1 cell F19 for one individual; into ExcludeAR2 cells F19 and G19 for two individuals; into ExcludeAR3 cells F19, G19, and H19 for three; or into ExcludeAR4 cells F19, G19, H19, and I19 for four.

  5. ExcludeAR first detects runs of consecutive homozygous SNP allele calls that are identical in all of the affected individuals analysed. It then determines whether each run is statistically significant (Table 1 and Appendix 2 explain how we derived these data). For instance, for a pair of affected individuals (say, consanguineous cousins), a run of 12 or more SNPs homozygous and identical in both would be expected to arise by chance, rather than through identity by descent, only once in 1000 analyses.

  6. ExcludeAR lists the 10 largest homozygous SNP runs by genetic size. For each result the following are given: genetic size, chromosome, genetic location on chromosome, number of homozygous SNPs in run, number of “NoCalls” in the run and whether the result reaches statistical significance. Two graphs are generated: the first shows the genetic size versus number of homozygous SNPs for the statistically significant results; the second shows the size of all statistically significant results by chromosome.

  7. The autosomal recessive disease gene sought could be located within any of the statistically significant homozygous segments detected.

  8. Table 1 gives an estimate of the minimum size of a homozygous region that could be detected using this method for different family structures. ExcludeAR also scans for potential homozygous deletions present in all affected individuals analysed (see Table 1 and Appendix 1).

  9. Any regions of statistically significant homozygosity may be further analysed by conventional polymorphic microsatellite analysis. The location of each SNP is given both by reference to the Human Genome Browser and as a deCODE genetic distance, enabling suitable markers to be designed or selected.7
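Steps 2, 5 and 6 can be illustrated with a short Python sketch. It is not ExcludeAR itself, which is a spreadsheet, but a hypothetical re-implementation of the same idea: SNPs are sorted by chromosome and genetic position, consecutive SNPs at which all affected individuals share the same homozygous call are grouped into runs, and the runs are ranked by genetic size. The names `Snp`, `Run`, `find_runs` and `top_runs` are illustrative only, and for simplicity this sketch lets a “NoCall” break a run instead of applying ExcludeAR’s permissive scoring of missing calls.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snp:
    chromosome: int          # autosome number, 1-22
    position_cm: float       # genetic position in cM (e.g. a deCODE map distance)
    calls: Tuple[str, ...]   # one call per affected individual: "AA", "AB", "BB" or "NoCall"


@dataclass
class Run:
    chromosome: int
    start_cm: float
    end_cm: float
    n_snps: int

    @property
    def size_cm(self) -> float:
        return self.end_cm - self.start_cm


def concordant_homozygous(calls) -> bool:
    """True if every individual has the same homozygous call (AA or BB)."""
    return calls[0] in ("AA", "BB") and all(c == calls[0] for c in calls)


def find_runs(snps: List[Snp], min_snps: int) -> List[Run]:
    """Return runs of >= min_snps consecutive concordant homozygous SNPs."""
    ordered = sorted(snps, key=lambda s: (s.chromosome, s.position_cm))  # step 2
    runs: List[Run] = []
    current: List[Snp] = []

    def close_run() -> None:
        # Record the open run if it is long enough, then start afresh.
        if len(current) >= min_snps:
            runs.append(Run(current[0].chromosome, current[0].position_cm,
                            current[-1].position_cm, len(current)))
        current.clear()

    for snp in ordered:
        # A run can never span two chromosomes.
        if current and snp.chromosome != current[-1].chromosome:
            close_run()
        if concordant_homozygous(snp.calls):
            current.append(snp)
        else:
            close_run()  # heterozygous, discordant or missing call ends the run
    close_run()
    return runs


def top_runs(runs: List[Run], k: int = 10) -> List[Run]:
    """Rank runs by genetic size, largest first, keeping the k largest (step 6)."""
    return sorted(runs, key=lambda r: r.size_cm, reverse=True)[:k]


if __name__ == "__main__":
    demo = [Snp(1, 10.0 + i, ("AA", "AA")) for i in range(12)]
    demo.append(Snp(1, 30.0, ("AB", "AA")))
    demo += [Snp(2, 5.0 + i, ("BB", "BB")) for i in range(3)]
    for run in top_runs(find_runs(demo, min_snps=3)):
        print(run.chromosome, run.start_cm, run.end_cm, run.n_snps)
```

Here `min_snps` plays the role of ExcludeAR’s minimum run size (10, 12, 7 or 4 SNPs for one to four individuals); whether a run is statistically significant is a separate question, answered by the probabilities in Table 1.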

Table 1

 The probability that a run of consecutive, concordant and homozygous SNP allele calls will occur by chance

Key points

  • Autosomal recessive disease gene loci can be found by analysis of affected members of consanguineous families using autozygosity mapping.

  • Autozygosity mapping can be performed using SNPs, particularly SNP-chips bearing thousands of SNPs.

  • We have devised a method to analyse the raw output of an Affymetrix 10K SNP-chip using a freely available spreadsheet, ExcludeAR.

  • ExcludeAR detects significant regions of homozygosity in one, two, three, or four affected family members.

The approach outlined here is undoubtedly a simplification. For instance, the possibility of genetic interference between neighbouring SNPs is ignored. It should, however, provide a practical method for analysing SNP-chip output to detect significant homozygous segments, and hence to locate recessive gene loci. ExcludeAR is free and available for download from


The ExcludeAR program was designed on the principle of the first Exclude program, written by JH Edwards to detect linkage in autosomal dominant conditions: once regions of non-linkage have been excluded, any chromosomal regions that remain must contain the locus sought. ExcludeAR achieves this by detecting, at each consecutive SNP, results that are heterozygous or homozygous but discordant. The SNPs that remain are homozygous and concordant, and consecutive runs of them can be summed. The genetic distance from the start of each run to its end is then calculated. The minimum number of consecutive concordant homozygous SNPs detected is set at 10 for analysis of one individual, and is reduced for versions of ExcludeAR that analyse data from more than one person (see below). The results are ranked by genetic distance, and the 10 largest regions of SNP homozygosity are shown together with the chromosome and the number of SNPs involved. SNPs without a result are scored permissively: as AA if no individual has a result, or otherwise as the call of another individual being analysed, so that “No Call”/AB is scored AB/AB. This permissive approach may overestimate the number of homozygous SNPs in a run, but as >92% of SNPs are called per analysis the effect should be small. Furthermore, when the spreadsheet assesses the significance of a result, it does so using the number of homozygous SNPs minus the number of “NoCalls”.
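The permissive scoring of missing calls described above might be expressed as follows. This Python sketch is hypothetical: the letter does not say which other individual’s call is borrowed, nor whether “NoCalls” are counted per SNP or per individual call, so the choices made here (first available call; count per SNP) are assumptions, and the function names are illustrative.

```python
from typing import List, Sequence

NO_CALL = "NoCall"


def fill_no_calls(calls: Sequence[str]) -> List[str]:
    """Permissively score missing calls at one SNP (assumed reading of the rule).

    - If no individual has a call, every individual is scored "AA".
    - Otherwise each "NoCall" takes another individual's call at the same SNP
      (assumption: the first individual with a call), so NoCall/AB -> AB/AB.
    """
    called = [c for c in calls if c != NO_CALL]
    if not called:
        return ["AA"] * len(calls)
    return [c if c != NO_CALL else called[0] for c in calls]


def effective_run_length(run_calls: Sequence[Sequence[str]]) -> int:
    """Homozygous SNPs in a run minus the SNPs that contained a NoCall.

    Assumption: NoCalls are counted once per SNP, not once per individual call.
    """
    no_call_snps = sum(1 for calls in run_calls if NO_CALL in calls)
    return len(run_calls) - no_call_snps
```

With this rule, a run of 12 concordant homozygous SNPs in two individuals that includes one SNP with a missing call would be assessed as 11, which falls below the 12-SNP threshold used for two individuals.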

The ExcludeAR program is given in four versions, for data from one, two, three, and four individuals. ExcludeAR2, for analysing two individuals, is set to detect a minimum run of 12 SNPs; ExcludeAR3, a minimum of seven SNPs; and ExcludeAR4, a minimum of four SNPs (see Tables 1 and 2). ExcludeAR also flags possible homozygous deletions, detected as runs of “NoCall” in all individuals analysed. Graphs illustrate the major findings by SNP number and genetic size, and by genetic size on each chromosome.
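The homozygous-deletion alert (runs of “NoCall” in every individual analysed) could be sketched as below. The function name and the default two-SNP minimum are hypothetical; the letter does not say how long such a run must be before ExcludeAR raises an alert. The input is assumed to be already sorted by chromosome and genetic position.

```python
from typing import List, Optional, Sequence, Tuple

NO_CALL = "NoCall"


def deletion_candidates(chromosomes: Sequence[int],
                        calls: Sequence[Sequence[str]],
                        min_snps: int = 2) -> List[Tuple[int, int, int]]:
    """Flag runs of consecutive SNPs with NoCall in every individual.

    chromosomes[i] is the chromosome of the i-th SNP; calls[i] holds that SNP's
    calls, one per individual. Returns (chromosome, first_index, last_index)
    for each run of at least min_snps SNPs. min_snps=2 is a hypothetical default.
    """
    hits: List[Tuple[int, int, int]] = []
    start: Optional[int] = None
    for i, snp_calls in enumerate(calls):
        all_missing = all(c == NO_CALL for c in snp_calls)
        new_chrom = i > 0 and chromosomes[i] != chromosomes[i - 1]
        # Close an open run at a called SNP or at a chromosome boundary.
        if start is not None and (not all_missing or new_chrom):
            if i - start >= min_snps:
                hits.append((chromosomes[start], start, i - 1))
            start = None
        if all_missing and start is None:
            start = i
    # Close a run that reaches the end of the data.
    if start is not None and len(calls) - start >= min_snps:
        hits.append((chromosomes[start], start, len(calls) - 1))
    return hits
```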

Table 2

 The probability that consecutive SNPs would be homozygous and concordant by chance in one, two, three and four siblings


The probability that a number of consecutive SNPs would be concordant and homozygous by chance was assessed for a singleton; two, three and four siblings; a non-sibling pair; a sib pair and a non-sibling; three non-siblings; and four non-siblings. The results are summarised in Table 1 and shown in full in Tables 2 and 3. They were generated by taking the first individual’s chance of being homozygous for an SNP, 0.70 (0.35 for AA and 0.35 for BB; our data from six consanguineous northern Pakistani individuals), multiplying it by [10 913 – (n–1)], where 10 913 is the total number of autosomal SNPs on the SNP-chip and n is the number of SNPs in the run being analysed, and multiplying by the chance that the other individuals would be concordantly homozygous. This result for one SNP was then multiplied iteratively by the probability that each consecutive SNP was also concordantly homozygous, until a probability of <1 in 1000 was reached. The threshold of 1 in 1000 was chosen because it corresponds to a LOD score of 3, which is conventionally regarded as significant when seeking linkage.
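The calculation just described can be sketched in Python as the chance (strictly, the expected number) of a chance run, [10 913 – (n–1)] × p^n, where p is the per-SNP probability that all k individuals are concordantly homozygous. As an assumption, this sketch takes p = 0.70 × 0.35^(k–1), i.e. each additional individual independently matches the first individual’s homozygous call with probability 0.35, which suits unrelated individuals; siblings share more of their genome and would need a different value, which is not modelled here. The function names are hypothetical.

```python
TOTAL_SNPS = 10_913   # SNPs on the chip, as given in the text
P_HOM = 0.70          # first individual homozygous (0.35 AA + 0.35 BB)
P_CONCORDANT = 0.35   # assumption: each further individual shares the same homozygous call
THRESHOLD = 1e-3      # 1 in 1000, corresponding to a LOD score of 3


def chance_of_run(n: int, k: int) -> float:
    """Chance of a run of n concordant homozygous SNPs in k individuals."""
    per_snp = P_HOM * P_CONCORDANT ** (k - 1)
    return (TOTAL_SNPS - (n - 1)) * per_snp ** n


def min_significant_run(k: int) -> int:
    """Smallest run length whose chance falls below the 1 in 1000 threshold."""
    n = 1
    while chance_of_run(n, k) >= THRESHOLD:
        n += 1
    return n


for k in range(1, 5):
    print(k, min_significant_run(k))
```

Under these assumptions the sketch gives 12 SNPs for two individuals and 7 for three, the same minimum run sizes quoted above for ExcludeAR2 and ExcludeAR3.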

Table 3

 The probability that consecutive SNPs would be homozygous and concordant by chance in families with two to four individuals in different siblings

The probability that a number of consecutive SNPs would be concordant, homozygous and identical by chance is given for the following: a singleton, two, three and four siblings and a non-sibling pair, a sib pair and non-sibling, three non-siblings, and four non-siblings. In each family situation the smallest number of consecutive homozygous SNPs to occur less commonly than 1 in 1000 by chance is shown in bold.


We thank Andrew Dearlove and Jo McBride at the MRC Geneservice for performing the SNP-chip analysis that led to this letter and Graham Taylor, Chris Inglehearn, Carmel Toomes, and Tim Bishop for helpful comments and discussions. We also thank the Wellcome Trust for funding this research.



  • Conflict of interest: none declared
