Finding genes underlying risk of complex disease by linkage disequilibrium mapping
Introduction
Health problems that appear to aggregate within families but that do not segregate like a simple Mendelian gene pose a special problem for researchers trying either to predict the risk of the disorder or to identify genes relevant to its etiology. Traditionally, linkage methods have served extremely well for Mendelian disorders, and the same approach of fitting linkage models to pedigree-structured data in which phenotypes and marker genotypes are scored has been reasonably effective for complex traits as well. The difficulty is that the resolution of pedigree-based mapping depends on both sample size and marker density, and even the largest studies typically have a rather poor resolution of 5–10 cM. More recently it was discovered that one can apply similar analysis to affected sib pairs, noting that sharing of marker alleles and of phenotypes is more likely when the marker is closely linked to segregating variation that causes trait variation. Although affected sib pair (ASP) methods have the big advantage that much larger samples can be obtained, they lack the information about linkage phase that a multigeneration pedigree provides, and so in the end the resolution of ASP methods is also less than optimal. The final assessment of the efficacy of using whole genome linkage disequilibrium (LD) scans to find genes associated with risk of complex disease will have to wait until the approach is actually tried. In the meantime, I here discuss recent work that seeks to improve the chances of success of the method by characterizing and analyzing the haplotype structure of human genetic variation.
Section snippets
Directly testing disease association
Risch and Merikangas [1] observed that an outbreeding population has some properties of a large extended family: namely, there are many meioses in which the association between a marker and a disease-associated allele can be broken up by recombination. If the marker and disease-associated alleles are nevertheless found to be in tight statistical association, this may amount to evidence that they are closely linked. There is a large body of theory behind this notion, and the theory describes many factors that
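The idea that accumulated meioses erode marker-disease association can be made concrete with the standard textbook result that, in a large random-mating population, the disequilibrium coefficient D shrinks by a factor (1 − c) each generation, where c is the recombination fraction between the two sites. The sketch below assumes that idealized model (no drift, selection, mutation or admixture); the function names and example values are illustrative and are not taken from the article.

```python
import math


def ld_after_generations(d0: float, c: float, generations: int) -> float:
    """Expected LD coefficient D after a number of generations of random mating.

    d0: starting value of D
    c: recombination fraction between marker and disease site (0 to 0.5)
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - c) ** generations


def generations_to_halve(c: float) -> float:
    """Generations needed for D to fall to half its starting value."""
    if not 0.0 < c <= 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - c)


if __name__ == "__main__":
    # Hypothetical comparison: a tightly linked pair versus a loosely linked pair.
    for c in (0.0001, 0.01):
        print(c, ld_after_generations(0.25, c, 1000), generations_to_halve(c))
```

Under these assumptions, association persists over many generations only when c is very small, which is why tight statistical association is read as a hint of close linkage.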
Linkage disequilibrium across the genome
To assess the overall efficacy and cost of LD mapping, we first need to determine the distribution of spans of the human genome that exhibit LD. This had been done for several human genes by resequencing to obtain many single nucleotide polymorphisms (SNPs) in the same gene [4], [5], [6], [7]. Figure 2 shows the relationship between physical separation between SNPs and two common metrics for LD. Other metrics for LD are evaluated by Devlin and Risch [8]. One problem with this approach is that from
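The two metrics plotted in Figure 2 are not named in this excerpt. As an assumption, the sketch below computes the two most widely used pairwise measures, |D′| and r², from the frequency of one two-SNP haplotype and the two allele frequencies. The function name and the example frequencies are hypothetical.

```python
def ld_metrics(p_ab: float, p_a: float, p_b: float) -> dict:
    """Return D, |D'| and r^2 for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    lowest = max(0.0, p_a + p_b - 1.0)
    highest = min(p_a, p_b)
    if not (lowest - 1e-12 <= p_ab <= highest + 1e-12):
        raise ValueError("haplotype frequency inconsistent with allele frequencies")

    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max
    # r^2 is the squared correlation between the allelic states at the two SNPs.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return {"D": d, "D_prime": d_prime, "r2": r2}


if __name__ == "__main__":
    # Hypothetical pair: a rare allele found only on haplotypes carrying a common allele.
    print(ld_metrics(p_ab=0.1, p_a=0.1, p_b=0.5))
```

In the hypothetical example, |D′| is 1 while r² is only about 0.11, which illustrates why the two measures can give different impressions of the same pair of SNPs when allele frequencies differ.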
Population subdivision, demography and linkage disequilibrium
There is a long history of interest in inferring the degree of population subdivision from genetic data, and application of this analysis to human genetic markers reveals that ∼8–10% of the genetic variance is found between population groups [12], [13]. When making inferences of association between genes and complex diseases, the need to understand population subdivision is critically important. If one does a case-control study, and the samples under study are a mix of two somewhat isolated
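The case-control concern raised here can be illustrated with a small calculation. The sketch below assumes a sample pooled from subpopulations that differ in both disease prevalence and marker allele frequency, with the marker having no effect on disease within any subpopulation. The function name and all numbers are hypothetical.

```python
from typing import Sequence, Tuple


def pooled_allele_frequencies(
    weights: Sequence[float],
    prevalences: Sequence[float],
    allele_freqs: Sequence[float],
) -> Tuple[float, float]:
    """Marker allele frequency among cases and among controls in a pooled sample.

    weights: share of each subpopulation in the sampled pool
    prevalences: disease prevalence in each subpopulation
    allele_freqs: marker allele frequency in each subpopulation
    The marker is assumed to be unrelated to disease within every subpopulation.
    """
    if not (len(weights) == len(prevalences) == len(allele_freqs)):
        raise ValueError("inputs must have the same length")
    case_mass = [w * k for w, k in zip(weights, prevalences)]
    ctrl_mass = [w * (1.0 - k) for w, k in zip(weights, prevalences)]
    p_cases = sum(m * p for m, p in zip(case_mass, allele_freqs)) / sum(case_mass)
    p_ctrls = sum(m * p for m, p in zip(ctrl_mass, allele_freqs)) / sum(ctrl_mass)
    return p_cases, p_ctrls


if __name__ == "__main__":
    # Two equally sampled groups; the group with higher prevalence also
    # happens to carry the marker allele at higher frequency.
    print(pooled_allele_frequencies((0.5, 0.5), (0.10, 0.02), (0.6, 0.2)))
```

With these hypothetical inputs the allele appears at roughly 0.53 in cases and 0.39 in controls even though it has no causal role, which is the kind of spurious association that unrecognized subdivision can produce.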
Complex disorders are not simple
Even if LD mapping of single genes were simple, mapping complex traits is an enormous challenge for the same reasons that it is so difficult to draw firm conclusions from epidemiological data. Genetic variation is likely to contribute to overall risk of many complex diseases, but the genetic component may be small compared to some environmental insults, and the fact that genes and environment interact, and that health is something that is deeply context dependent (CF Sing, JH Stengård, SLR
Models for the genetics of complex disorders
There is a long history in genetic analysis that points to the power of a good model. If we formulate a scheme whereby genes affect a trait, we are much more able to test and either reject or accept the model, compared to a more open-ended situation. Key parameters in whole-genome association testing are the number of genes that are having a causal effect on risk, the frequency of the variant alleles, and the magnitudes of effect of those alleles on risk. Before we consider the complexities of
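To show how allele frequency and effect size jointly set the size of the signal an association test must detect, the sketch below assumes one simple model that this excerpt does not itself specify: a single biallelic risk locus in Hardy-Weinberg proportions whose genotype relative risks are multiplicative (1, γ, γ²). Under that assumption the risk allele frequency among cases is γp / (γp + 1 − p). The function names and example values are hypothetical.

```python
def case_allele_frequency(p: float, grr: float) -> float:
    """Risk allele frequency in cases under a multiplicative model (closed form).

    p: population frequency of the risk allele
    grr: genotype relative risk per copy of the risk allele (gamma)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    if grr <= 0.0:
        raise ValueError("relative risk must be positive")
    return grr * p / (grr * p + (1.0 - p))


def case_allele_frequency_from_genotypes(p: float, grr: float) -> float:
    """Same quantity, computed by weighting Hardy-Weinberg genotypes by risk."""
    q = 1.0 - p
    # Relative contribution of each genotype (0, 1, 2 risk alleles) to the case pool.
    w0, w1, w2 = q * q, 2.0 * p * q * grr, p * p * grr * grr
    return (0.5 * w1 + w2) / (w0 + w1 + w2)


if __name__ == "__main__":
    # Same modest effect size at a common and at a rare allele.
    for p in (0.1, 0.01):
        print(p, case_allele_frequency(p, 1.5))
```

For a rare disease the control frequency stays close to p, so with γ = 1.5 the case-control difference is about 0.04 when p = 0.1 but only about 0.005 when p = 0.01. This is one way to see why rare alleles of modest effect are hard to detect.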
Selection in the human genome
A factor that inflates LD in the human genome more directly and strongly than any other is natural selection. This is especially evident in cases where a single gene has an influence on the risk from a disease, such as the improved resistance to vivax malaria in people with the Duffy null allele [34] or increased resistance to Plasmodium malaria in individuals with the low-activity alleles of G6PD [35]. The recent generation of near genome-wide datasets on SNP genotypes has opened the
Why HapMap?
The NIH Haplotype Map (HapMap) project is the largest single project in human population genetics ever attempted, and as a result it has received some harsh criticism. As of writing, the exact scope of the project is unclear, but it will entail a large quantity of SNP genotyping in several human population groups. Given that the project will be completed and the genotype data will be collected, the constructive challenge we face is to formulate the best questions and the best use of the
Conclusions
The potential for a disease to be determined by a vast array of extremely rare alleles in many different genes embedded in a network of highly epistatic genes with strong context-dependent environmental effects makes it possible to imagine that some diseases may have a genetic component but be truly unyielding to the proposed methods. But even in this worst-case scenario, we already know that not all complex diseases are this ill behaved, so the problem can be restated as finding efficient
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
- • of special interest
- •• of outstanding interest
Acknowledgements
This work was supported by grant HG02352 from the United States National Institutes of Health.
References (57)
- et al. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase. Am. J. Hum. Genet. (1998)
- et al. Apolipoprotein E variation at the sequence haplotype level: implications for the origin and maintenance of a major human polymorphism. Am. J. Hum. Genet. (2000)
- et al. A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics (1995)
- et al. Haplotypes and linkage disequilibrium at the phenylalanine hydroxylase locus, PAH, in a global representation of populations. Am. J. Hum. Genet. (2000)
- et al. Sequence variations in the public human genome data reflect a bottlenecked population history. Proc. Natl. Acad. Sci. USA (2003)
- et al. Understanding quantitative genetic variation. Nat. Rev. Genet. (2002)
- et al. Linkage disequilibrium: what history has to tell us. Trends Genet. (2002)
- et al. On the allelic spectrum of human disease. Trends Genet. (2001)
- Are rare variants responsible for susceptibility to complex disease? Am. J. Hum. Genet. (2001)
- et al. The allelic architecture of human disease genes: common disease — common variant… or not? Hum. Mol. Genet. (2002)
- Optimality, mutation and the evolution of ageing. Nature
- Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet.
- Statistical tests of neutrality of mutations. Genetics
- Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol.
- High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools. Proc. Natl. Acad. Sci. USA
- The future of genetic studies of complex human diseases. Science
- Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat. Genet.
- Maximum-likelihood estimation of gene location by linkage disequilibrium. Am. J. Hum. Genet.
- Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet.
- DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet.
- Linkage disequilibrium in the human genome. Nature
- The structure of haplotype blocks in the human genome. Science
- A first-generation linkage disequilibrium map of human chromosome 22. Nature
- An apportionment of human DNA diversity. Proc. Natl. Acad. Sci. USA
- Patterns of human diversity, within and among continents, inferred from biallelic DNA polymorphisms. Genome Res.
- Linkage disequilibrium and the search for complex disease genes. Genome Res.
- Case-control studies of association in structured or admixed populations. Theor. Popul. Biol.
- Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet.