Finding genes underlying risk of complex disease by linkage disequilibrium mapping

https://doi.org/10.1016/S0959-437X(03)00056-X

Abstract

Identification of genes that harbor variation associated with inter-individual differences in risk of complex diseases remains one of the most challenging and important problems in human genetics. For genetic variants that are sufficiently common and have sufficiently large effects, direct tests of association through linkage disequilibrium with anonymous SNPs may prove effective. But the two critical parameters — the frequency of risk-inflating alleles and the magnitudes of their effect on risk — remain largely unknown. In this review we consider the latest information regarding the likely efficacy of the linkage disequilibrium mapping approach.

Introduction

Health problems that appear to aggregate within families but that do not segregate like a simple Mendelian gene pose a special problem for researchers trying either to predict the risk of the disorder or to identify relevant genes for understanding etiology. Traditionally, linkage methods have served extremely well for Mendelian disorders, and the same approach of fitting linkage models to pedigree-structured data in which phenotypes and marker genotypes are scored has been reasonably effective for complex traits as well. The problem is that the resolution of pedigree-based mapping depends on both sample size and marker density, and even the largest studies typically have a rather poor resolution of 5–10 cM. More recently it was discovered that one can apply similar analysis to affected sib pairs, noting that sharing of marker alleles and of phenotypes is more likely when the marker is closely linked to segregating variation that causes trait variation. Although affected sib pair (ASP) methods have the big advantage that much larger samples can be obtained, they lack the information about linkage phase that a multigeneration pedigree provides, and so in the end the resolution of ASP methods is also less than optimal.

The final assessment of the efficacy of using whole genome linkage disequilibrium (LD) scans to find genes associated with risk of complex disease will have to wait until the approach is actually tried. In the meantime, I here discuss recent work that has been done seeking to improve the chances for success of the method by characterizing and analyzing the haplotype structure of human genetic variation.
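To make concrete what an LD scan measures, the pairwise disequilibrium between two biallelic loci can be computed from haplotype and allele frequencies. The following is a minimal Python sketch, not the author's method. The function name `ld_metrics`, its argument names, and the example frequencies are hypothetical, and the choice of |D′| and r² as the summary statistics is an assumption, since this excerpt does not name the metrics it uses.

```python
def ld_metrics(p_ab, p_a, p_b, eps=1e-12):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (hypothetical labels)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # Both loci must be polymorphic, otherwise LD is undefined.
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")

    # The haplotype frequency is bounded by the allele frequencies.
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    if p_ab < lower - eps or p_ab > upper + eps:
        raise ValueError("haplotype frequency inconsistent with allele frequencies")

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # Lewontin's normalization: divide |D| by its maximum possible
    # magnitude given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = abs(d) / d_max

    # r^2: squared correlation between the allelic states at the two loci.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # Illustrative frequencies only: allele A occurs exclusively on B haplotypes.
    print(ld_metrics(p_ab=0.2, p_a=0.2, p_b=0.5))
```

In the illustrative call, allele A is found only on haplotypes that carry B, so |D′| reaches its maximum while r² stays well below 1 because the allele frequencies differ. This is one reason the two statistics can give different impressions of the same pair of SNPs.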

Directly testing disease association

Risch and Merikangas [1] made the observation that an outbreeding population has some properties like a large extended family — namely there are many meioses in which the association between a marker and a disease-associated allele can recombine. But if the marker and disease-associated alleles are found to be in tight statistical association, this may amount to evidence that they are closely linked. There is a large body of theory behind this notion, and the theory describes many factors that

Linkage disequilibrium across the genome

To assess the overall efficacy and cost of LD mapping, we first need to determine the distribution of spans of the human genome that exhibit LD. This has been done for several human genes by resequencing to obtain many single nucleotide polymorphisms (SNPs) in the same gene [4–7]. Figure 2 shows the relationship between the physical separation of SNPs and two common metrics for LD. Other metrics for LD are evaluated by Devlin and Risch [8]. One problem with this approach is that from

Population subdivision, demography and linkage disequilibrium

There is a long history of interest in inferring the degree of population subdivision from genetic data, and application of this analysis to human genetic markers reveals that ∼8–10% of the genetic variance is found among population groups, with the large remainder found within them [12,13]. When making inferences of association between genes and complex diseases, an understanding of population subdivision is critically important. If one does a case-control study, and the samples under study are a mix of two somewhat isolated

Complex disorders are not simple

Even if LD mapping of single genes were simple, mapping complex traits is an enormous challenge for the same reasons that it is so difficult to draw firm conclusions from epidemiological data. Genetic variation is likely to contribute to overall risk of many complex diseases, but the genetic component may be small compared to some environmental insults, and the fact that genes and environment interact, and that health is something that is deeply context dependent (CF Sing, JH Stengård, SLR

Models for the genetics of complex disorders

There is a long history in genetic analysis that points to the power of a good model. If we formulate a scheme whereby genes affect a trait, we are much more able to test and either reject or accept the model, compared to a more open-ended situation. Key parameters in whole-genome association testing are the number of genes that are having a causal effect on risk, the frequency of the variant alleles, and the magnitudes of effect of those alleles on risk. Before we consider the complexities of

Selection in the human genome

A factor that inflates LD in the human genome more directly and strongly than any other is natural selection. This is especially evident in cases where a single gene has an influence on the risk from a disease, such as the improved resistance to vivax malaria in people with the Duffy null allele [34] or the increased resistance to Plasmodium malaria in individuals with the low-activity alleles of G6PD [35]. The recent generation of near genome-wide datasets on SNP genotypes has opened the

Why HapMap?

The NIH Haplotype Map (HapMap) project is the largest single project in human population genetics ever attempted, and as a result it has received some harsh criticism. As of writing, the exact scope of the project is unclear, but it will entail a large quantity of SNP genotyping in several human population groups. Given that the project will be completed and the genotype data will be collected, the constructive challenge we face is to formulate the best questions and the best use of the

Conclusions

The potential for a disease to be determined by a vast array of extremely rare alleles in many different genes embedded in a network of highly epistatic genes with strong context-dependent environmental effects makes it possible to imagine that some diseases may have a genetic component but be truly unyielding by the proposed methods. But even in this worst-case scenario, we already know that not all complex diseases are this ill behaved, so the problem can be restated as finding efficient

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • of special interest

  •• of outstanding interest

Acknowledgements

This work was supported by grant HG02352 from the United States National Institutes of Health.

References (57)

  • L Partridge et al. Optimality, mutation and the evolution of ageing. Nature (1993)
  • M Hamblin et al. Complex signatures of natural selection at the Duffy blood group locus. Am. J. Hum. Genet. (2002)
  • Y.X Fu et al. Statistical tests of neutrality of mutations. Genetics (1993)
  • A.G Clark. Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol. (1990)
  • K.L Mohlke et al. High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools. Proc. Natl. Acad. Sci. USA (2002)
  • N Risch et al. The future of genetic studies of complex human diseases. Science (1996)
  • J Hästbacka et al. Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat. Genet. (1992)
  • W.G Hill et al. Maximum-likelihood estimation of gene location by linkage disequilibrium. Am. J. Hum. Genet. (1994)
  • K.G Ardlie et al. Patterns of linkage disequilibrium in the human genome. Nat. Rev. Genet. (2002)
  • D.A Nickerson et al. DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene. Nat. Genet. (1998)
  • D.E Reich et al. Linkage disequilibrium in the human genome. Nature (2001)
  • S.B Gabriel et al. The structure of haplotype blocks in the human genome. Science (2002)
  • E Dawson et al. A first-generation linkage disequilibrium map of human chromosome 22. Nature (2002)
  • G Barbujani et al. An apportionment of human DNA diversity. Proc. Natl. Acad. Sci. USA (1997)
  • C Romualdi et al. Patterns of human diversity, within and among continents, inferred from biallelic DNA polymorphisms. Genome Res. (2002)
  • L.B Jorde. Linkage disequilibrium and the search for complex disease genes. Genome Res. (2000)
  • J.K Pritchard et al. Case-control studies of association in structured or admixed populations. Theor. Popul. Biol. (2001)
  • J.K Pritchard et al. Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. (2001)