Autosomal recessive disorders are an important cause of childhood morbidity and mortality, and may reach significant frequencies in specific ethnic groups.1 The affected progeny of consanguineous parents provide an opportunity to undertake gene mapping and positional-candidate gene analysis,2 since it is highly likely that the disease locus is identical-by-descent from a common ancestor. The strategy of searching for regions of homozygosity in affected individuals from consanguineous families, using the methodology of autozygosity mapping, has proven to be highly effective for mapping loci and identifying autosomal recessive genes.3 The identification of recessive disease genes enables diagnostic and carrier testing and can provide critical insights into the pathogenesis of the disease.
Most researchers who perform autozygosity mapping currently use panels of approximately 400 highly polymorphic microsatellite markers in an initial genome-wide linkage screen, to give a marker spacing of between 10 and 12 cM throughout the autosomal genome. The accurate and reliable genotyping of genetic markers is, of course, essential for the success of such mapping projects. However, we have found the subsequent collation of the large datasets generated in mapping projects to be both time-consuming and prone to error if performed manually (by using, for example, “cut and paste” into a spreadsheet). We have therefore designed a simple but versatile Microsoft Excel spreadsheet which automatically collates and analyses genotyping data, and enables shared regions of homozygosity to be identified quickly following visual inspection of the collated data. The expected average length of autozygous regions that are identical-by-descent around a disease locus has been calculated by Génin et al to be 28 cM for affected individuals from a first cousin mating, and 22 cM for those from a second cousin mating.4 In our experience, these values are good approximations to the length of homozygous regions found in real genome-wide linkage screens (fig 1), and, therefore, a marker spacing of 10 cM is adequate for a cohort of affected individuals from a first cousin mating.
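As a rough check on this reasoning, the number of markers expected to fall inside an autozygous region can be estimated from the region length and the marker spacing. The following Python sketch is illustrative only and is not part of SCAMP: it assumes perfectly even marker spacing, which real panels only approximate, and the function name is hypothetical.

```python
import math


def markers_in_region(region_cm, spacing_cm):
    """Return (minimum, average) count of evenly spaced markers inside a region.

    Assumes markers lie on a regular grid with the given spacing; real panels
    are only approximately evenly spaced.
    """
    average = region_cm / spacing_cm
    minimum = math.floor(average)
    return minimum, average


# Average autozygous region lengths quoted in the text (Génin et al)
for mating, length_cm in [("first cousin", 28), ("second cousin", 22)]:
    minimum, average = markers_in_region(length_cm, 10)
    print(f"{mating}: at least {minimum} markers, {average:.1f} on average")
```

Under these assumptions, a region of average length contains at least two markers at 10 cM spacing. Regions shorter than the average will contain fewer, so the estimate is a guide rather than a guarantee.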
Figure 1. A representative section of the “genotypes” spreadsheet that covers the genotyping data for chromosome 1 markers for samples 1–6. The first six columns list the marker locus, the chromosome number (in this case, chromosome 1), the panel number in the ABI PRISM Linkage Mapping Set v2.5 “MD10” configuration, and the genetic or physical locations of the marker. Columns seven and eight, in grey, summarise the number of “HOM” and “FAIL” calls in the dataset (in this case, samples 1–8). SCAMP detects an interesting region of shared homozygosity, highlighted in darker grey in column seven, between markers D1S199 and D1S230. The remaining columns display genotyping data for samples 1–6 taken from the “raw data” spreadsheet, each with a summary column that makes a “HOM” call if the two alleles are identical and a “FAIL” call if the genotyping has failed.
We have called our spreadsheet “SCAMP” for spreadsheet to collate autozygosity mapping projects. The spreadsheet is available as a freeware download from the Medical and Molecular Genetics website at http://www.rch.bham.ac.uk/MMG/SCAMP.htm in the form of a Microsoft Excel workbook compressed as a .zip file, or can be obtained from http://jmg.bmjjournals.com/supplemental/ as either a .zip or an .xls file. We have successfully used SCAMP in several autozygosity mapping projects5–7 for which it provided a simple and practical solution for the analysis of genotyping data and facilitated the subsequent identification of disease loci and genes.
The key features of SCAMP are described in the following paragraphs.
I. SCAMP is based on the popular ABI PRISM Linkage Mapping Set v2.5, and contains the details of all 811 markers in panels 1–28 for the “MD10” configuration (coverage at 10 cM resolution) and the markers in panels 29–86 for the “HD5” configuration (∼5 cM resolution). Marker details are collated in the “ABI mapping panels v2.5” spreadsheet and include panel number, chromosome number, locus and primer identifiers, dye colour, and heterozygosity value. In addition, each marker is annotated with sex-average genetic distances, based on the Marshfield and deCODE genetic maps,8,9 and the physical distance from the deCODE genetic map. The markers have been sorted using the Excel “data sort” command, first by chromosome number and then by Marshfield genetic distance. The values of genetic distance and physical location are linked to the separate spreadsheets “Marshfield markers” and “deCODE markers” using the data lookup function VLOOKUP. If required, other marker sets could be included in additional spreadsheets, although the data range for the VLOOKUP functions must then be changed. The syntax for data lookup functions is easily accessible using the help files in Microsoft Excel.
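For readers who think more readily in code than in spreadsheet formulas, the lookup-and-sort step described above can be sketched in Python. This is a minimal illustration and not the SCAMP implementation: the function name, the marker names, and the distances are all invented placeholders, and a dictionary lookup stands in for VLOOKUP.

```python
def annotate_and_sort(panel, genetic_map):
    """Attach a genetic distance to each marker, then sort by chromosome and distance.

    panel: list of dicts with "locus" and "chromosome" keys.
    genetic_map: dict mapping locus name to genetic distance in cM.
    A marker absent from the map gets None (loosely analogous to the #N/A that
    an exact-match VLOOKUP returns) and is sorted to the end of its chromosome.
    """
    annotated = [dict(marker, cM=genetic_map.get(marker["locus"])) for marker in panel]
    return sorted(
        annotated,
        key=lambda m: (
            m["chromosome"],
            m["cM"] is None,
            m["cM"] if m["cM"] is not None else 0.0,
        ),
    )


# Placeholder marker names and distances, invented purely for illustration
panel = [
    {"locus": "MARKER_C", "chromosome": 2},
    {"locus": "MARKER_A", "chromosome": 1},
    {"locus": "MARKER_B", "chromosome": 1},
    {"locus": "MARKER_D", "chromosome": 1},
]
genetic_map = {"MARKER_A": 25.0, "MARKER_B": 4.5, "MARKER_C": 10.0}

for marker in annotate_and_sort(panel, genetic_map):
    print(marker["chromosome"], marker["locus"], marker["cM"])
```

Keeping the map in a separate structure mirrors the way the spreadsheet holds genetic distances in their own sheets, so a different marker set can be swapped in without touching the panel list.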
Key points
- Autosomal recessive disorders are an important cause of childhood morbidity and mortality, and may reach significant frequencies in specific ethnic groups.
- Identification of recessive disease genes enables diagnostic and carrier testing and can provide critical insights into the pathogenesis of the disease.
- We have designed a simple but versatile spreadsheet which automatically collates and analyses genotyping data, and enables shared regions of homozygosity to be identified quickly following visual inspection of the collated data. The spreadsheet has been called SCAMP.
- SCAMP provides a simple and practical solution for the analysis of genotyping data and facilitates the subsequent identification of disease loci and genes.
II. The “genotypes” spreadsheet contains only the 400 ABI PRISM “MD10” markers which would be usual for a general genome-wide linkage screen. The markers are sorted on the basis of chromosome number and genetic distance, and the values for their genetic distances and physical locations are linked to the “ABI mapping panels v2.5” spreadsheet by VLOOKUP, although other spreadsheets containing marker details could be used. Figure 1 shows a small section of the “genotypes” spreadsheet that covers the genotyping data for chromosome 1 markers for samples 1–6.
III. Genotyping data is entered into the “raw data” spreadsheet. This is most easily done by saving the output from, for example, ABI PRISM “Genotyper” software as a tab-delimited text file that can be imported into the “raw data” spreadsheet in the form of four continuous columns of data. Column 1 contains the marker name (which must exactly match the name in other spreadsheets), column 2 contains the sample number (in this example 1–16; see paragraph V below), and columns 3 and 4 contain the genotyping data. Column 5 contains the ROW function to return the number of the row, which is used by the data lookup functions in “genotypes”.
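The four-column layout described above can be illustrated with a short Python sketch that reads tab-delimited text and records a row number for each entry. It is a sketch under stated assumptions, not a description of the Genotyper export format: it assumes no header row, integer sample numbers, and an empty field for a missing allele, and the function name and allele sizes are hypothetical.

```python
import csv
import io


def read_raw_data(text):
    """Parse tab-delimited genotype rows into dicts, recording each row number."""
    records = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row_number, fields in enumerate(reader, start=1):
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        marker, sample, allele1, allele2 = (field.strip() for field in fields[:4])
        records.append({
            "marker": marker,
            "sample": int(sample),
            "allele1": allele1,
            "allele2": allele2,
            "row": row_number,  # plays the role of the ROW function in column 5
        })
    return records


# Placeholder allele sizes, invented for illustration
example = "D1S199\t1\t100\t100\nD1S199\t2\t100\t104\nD1S199\t3\t\t\n"
for record in read_raw_data(example):
    print(record)
```

As in the spreadsheet, marker names are kept exactly as they appear in the input, since any later lookup depends on an exact match.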
IV. The “genotypes” spreadsheet then uses a combination of the INDEX and VLOOKUP functions to look up genotyping data in the “raw data” spreadsheet, and to place it in the correct position in “genotypes”. The syntax for these functions is available from the help files in Microsoft Excel. Genotype data takes the form of two columns, with a third adjacent column to call if the alleles are either homozygous (an IF function returns a “HOM” value) or if the genotyping has failed (a “FAIL” value is returned).
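The logic of this step, looking up each marker and sample pair and then making a “HOM” or “FAIL” call, can be sketched in Python as follows. This is not the spreadsheet's formula set: a dictionary keyed on marker and sample stands in for the INDEX and VLOOKUP combination, the function names are hypothetical, and it is an assumption here that a failed genotype appears as an empty or missing value.

```python
def call_genotype(allele1, allele2):
    """Return "FAIL" if either allele is missing, "HOM" if both match, else ""."""
    if not allele1 or not allele2:
        return "FAIL"
    if allele1 == allele2:
        return "HOM"
    return ""  # heterozygous: no call, as in the spreadsheet's blank cell


def build_genotype_table(records, markers, samples):
    """Arrange raw records into one row per marker, ordered by sample.

    records: list of dicts with "marker", "sample", "allele1", "allele2" keys.
    A marker and sample pair with no record is treated as a failed genotype
    (an assumption made for this sketch).
    """
    lookup = {(r["marker"], r["sample"]): (r["allele1"], r["allele2"]) for r in records}
    table = {}
    for marker in markers:
        row = []
        for sample in samples:
            allele1, allele2 = lookup.get((marker, sample), ("", ""))
            row.append((allele1, allele2, call_genotype(allele1, allele2)))
        table[marker] = row
    return table


# Placeholder allele sizes, invented for illustration
records = [
    {"marker": "D1S199", "sample": 1, "allele1": "100", "allele2": "100"},
    {"marker": "D1S199", "sample": 2, "allele1": "100", "allele2": "104"},
]
table = build_genotype_table(records, markers=["D1S199"], samples=[1, 2, 3])
print(table["D1S199"])
```

Here sample 3 has no record for the marker and is therefore reported as “FAIL”, which keeps incomplete genotyping visible instead of silently dropping it.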
V. In the freeware download, available from the University of Birmingham website, we have included typical results for two separate cohorts, each of eight affected individuals. In this example, the two sets of eight individuals are analysed separately, with the data derived from 16 rows of genotyping data for each marker in the “raw data” spreadsheet. The parameters of the lookup functions can be easily changed for a smaller or larger number of individuals, and the existing structure of the spreadsheets can be used as a template for a customised analysis.
VI. Conditional formatting is used to colour in “HOM” values for samples 1–8 for the first cohort in pale green, and for samples 9–16 for the second cohort in pale cyan. (The colours refer to the formatting in the electronic form of the spreadsheet, but are rendered in shades of grey in the representative section shown in fig 1.) Summary columns on the left of samples 1–8 and samples 9–16 are coloured in grey. These use the COUNTIF function to count the number of “HOM” and “FAIL” calls for each cohort. For samples 1–8, an interesting region of shared homozygosity is detected on chromosome 1, with conditional formatting colouring in COUNTIF values of ⩾5 in green. For samples 9–16, an interesting region is highlighted in cyan on chromosome 5. Conditional formatting also colours values of ⩾2 in red for the COUNTIF function on the number of “FAIL” calls in each cohort. This shows that genotyping needs to be completed on chromosomes 17–21 for both datasets. The conditional formatting values of ⩾5 out of 8 for “HOM” calls and ⩾2 out of 8 for “FAIL” calls can be easily changed to suit the requirements of the mapping project, although in this example they are designed for two cohorts of eight individuals each.
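The counting and thresholding described above can be sketched in Python, with list counting standing in for COUNTIF and Boolean flags standing in for conditional formatting. The sketch is illustrative and not part of SCAMP: the function names and the example calls are hypothetical, while the default thresholds of 5 and 2 and the cohort ranges of samples 1–8 and 9–16 are taken from the text.

```python
def summarise_marker(calls, hom_threshold=5, fail_threshold=2):
    """Count "HOM" and "FAIL" calls for one marker and flag counts at or above threshold."""
    hom_count = calls.count("HOM")
    fail_count = calls.count("FAIL")
    return {
        "HOM": hom_count,
        "FAIL": fail_count,
        "highlight_hom": hom_count >= hom_threshold,
        "highlight_fail": fail_count >= fail_threshold,
    }


def summarise_cohorts(calls_by_sample, cohorts, hom_threshold=5, fail_threshold=2):
    """Summarise one marker separately for each cohort of sample numbers."""
    return {
        name: summarise_marker(
            [calls_by_sample.get(sample, "FAIL") for sample in samples],
            hom_threshold,
            fail_threshold,
        )
        for name, samples in cohorts.items()
    }


# Two cohorts of eight samples, as in the example dataset described in the text
cohorts = {"cohort 1": range(1, 9), "cohort 2": range(9, 17)}

# Invented calls for a single marker: samples 1-6 homozygous, 7 and 8
# heterozygous, 9 and 10 failed, 11-16 heterozygous
calls_by_sample = {sample: "HOM" for sample in range(1, 7)}
calls_by_sample.update({7: "", 8: "", 9: "FAIL", 10: "FAIL"})
calls_by_sample.update({sample: "" for sample in range(11, 17)})

for name, summary in summarise_cohorts(calls_by_sample, cohorts).items():
    print(name, summary)
```

Passing the thresholds and cohort ranges as parameters reflects the point made in the text that these values should be adjusted to suit the size of each cohort.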
Supplementary materials
Web-only Appendix
The spreadsheet to collate autozygosity mapping projects is available here as an Excel file.
Footnotes
Conflict of interest: none declared.