Introduction

Recent interest in association mapping has put focus on factors generating linkage disequilibrium (LD) in human populations. The success of association mapping, or linkage disequilibrium mapping, in locating candidate genes for a disease depends on the extent of LD between typed markers and the disease gene in the population from which the affected individuals are sampled. Various factors can influence the extent of LD in human populations. The age of the disease or marker alleles, recombination and gene conversion, natural selection and population admixture will influence LD in a population.1 Other important factors are the size of the population and its demographic history. Established population genetic theory predicts LD to be low in large populations under mutation-drift equilibrium whereas it will increase in small populations due to genetic drift.2,3 At the extreme is the situation in which a population has been through a population reduction or is founded by few individuals. This situation decreases the allelic diversity and enhances the effect of genetic drift.4 In populations where bottlenecks have been followed by rapid expansion, genetic drift has been suggested to have little effect on the amount of linkage disequilibrium.5 Predictions about the role of genetic drift in growing human populations are still under debate, though, and may vary with the fecundity distribution of the population in question.6 If a rapidly expanding population has been little influenced by genetic drift, a disease gene that was present in one or a few founders may be surrounded by a single or limited number of marker haplotypes if the LD between these markers is maintained in the expanding population.7 In small populations of constant size, though, it is not necessarily a single marker haplotype at a locus that has increased to a high frequency through genetic drift. In these populations several marker haplotypes may be found associated with a disease gene.7

Based on the predictions from population genetic theory we expect isolated populations that have gone through severe bottlenecks or which have remained at a constant small size for a long time to have extensive linkage disequilibrium compared to outbred populations. Such inbred and selected populations tend to be ideal for the linkage disequilibrium mapping of disease genes. Several monogenic diseases have been mapped successfully by this method in isolated populations.8,9 Expectations are that linkage disequilibrium mapping can be equally useful in genetic studies of complex diseases,10,11 and that the same advantages of using population isolates may apply to diseases with more than one gene involved, depending on the underlying genetic model.12,13 So far the method has been used to locate a number of candidate regions that may harbour risk genes for complex diseases.14,15

Empirical assessments of the levels of LD in population isolates do not always conform to expectations based on the historic record. Modest levels of LD in the Finnish and Sardinian isolates suggest introduction of common variants by multiple founders.16,17,18 Similar low levels of LD have been obtained in isolates such as the Afrikaners and Ashkenazi,19 whereas predicted high levels of LD have been confirmed in populations such as an expanding Costa Rican population20 and in the constantly small Saami population.21 The purpose of the present study was to evaluate the amount of linkage disequilibrium in the population of the Faroe Islands and to compare this with its demographic history as revealed from microsatellite data. The historic record suggests that the Faroe Islands is an ideal population isolate in which to search for candidate genes of various genetic diseases22 and the interpretation of the amount of LD found in this population is important for such future studies. Linkage disequilibrium between microsatellite markers on 12q was compared to that of two larger and less isolated populations, the British and the Danish. Our results confirm our expectations of high levels of LD in the Faroese compared to the two mixed outbred populations, but do not indicate the occurrence of a severe population bottleneck.

Materials and methods

History of the Faroese

Historic records suggest that the isolated Faroese archipelago in the North Atlantic Ocean (Figure 1) was settled around 825 AD by emigration from the western part of Norway.23 Paleobotanical findings of cultivated species and associated weeds do, however, indicate the presence of people on the islands two centuries earlier.24 These people had possibly left the archipelago before the Norse settlement.23,24 Population size in the archipelago as a whole may have been as small as 4000 in the late 1300s increasing to 9000 in the 1800s and further to around 48 000 inhabitants at present.25 A count of 4773 inhabitants was made in 1769.25 Migration to and from the Faroe Islands has been sparse as a consequence of their geographic position, the commercial monopoly prevalent from 1380 until 1856 and legislation aimed at avoiding depopulation. Epidemics, such as a smallpox epidemic in 1709, and periods of famine may have reduced the population severely.25 Genealogical investigations of Faroese families show that considerable movements occurred between villages despite the remote positions that many of them have (AG Wang, pers. comm.).

Figure 1 The Faroe Islands in the North Atlantic Ocean.

Sampling of individuals

Forty-three unrelated Faroese individuals, 120 unrelated Danish individuals and 316 unrelated British individuals were sampled from various regions of the three countries. The Faroese are part of a control population in an ongoing study of the genetic aetiology of complex diseases. British individuals were mainly of English origin although a small number had Irish, Welsh or Scottish parents.

Genotyping

DNA was extracted from whole blood using a standard Triton lysis protocol with sodium chloride/isopropanol precipitation or a phenol/chloroform method.26 Single or multiplex PCR was performed with amplification of a maximum of three markers simultaneously and analysed on an ABI Prism 310 Genetic Analyser or a Li-Cor IR2 automated sequencer.

Fourteen markers on 12q24.3, spanning a region of 2.2 Mb (5 cM), were genotyped for the analysis of linkage disequilibrium (Table 1). Three of these were typed only for the British and Danish individuals. An additional four markers positioned distal to the 2.2 Mb region were typed in the Faroese so that a total of 4.3 Mb (16 cM) were investigated in this population.

Table 1 Gene diversity (D) and allelic diversity (n) with corrections according to the rarefaction method in brackets

Fifteen unlinked microsatellite markers were typed in 41 individuals from the same sample of Faroese for the demographic analyses (Table 2). All were tri- or tetranucleotide markers assumed to evolve under the stepwise mutation model.27

Table 2 Test for the presence of a historic population bottleneck in the Faroese population

Allele and gene diversity

Allelic diversity is the number of alleles at a locus in a population. Differences in sample sizes were corrected for by the rarefaction method28,29 using the programme CONTRIB (www.pierroton.inra.fr/genetics/labo/Software/). This method adjusts for the fact that microsatellites will have rare alleles because of their relatively high mutation rate and that these rare alleles are less likely to be detected with smaller sample sizes. A minimum of 84 chromosomes was successfully typed at every locus in all three populations and allelic diversity in the three populations was therefore standardised to a sample size of 84 chromosomes using the rarefaction method.
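The rarefaction correction can be illustrated with a short sketch. It uses the standard hypergeometric rarefaction formula (the expected number of distinct alleles in a random subsample of n chromosomes); whether CONTRIB computes it in exactly this way is an assumption, and the function name and the allele counts are hypothetical, not data from this study.

```python
from math import comb

def rarefied_allele_count(allele_counts, n):
    """Expected number of distinct alleles in a random subsample of n chromosomes.

    allele_counts: observed copies of each allele at one locus.
    n: standardised sample size (84 chromosomes in the text).
    """
    total = sum(allele_counts)
    if n > total:
        raise ValueError("subsample size exceeds the number of typed chromosomes")
    denom = comb(total, n)
    # An allele with c copies is absent from the subsample with
    # probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / denom for c in allele_counts)

# Hypothetical allele counts at one locus: 240 chromosomes, 6 alleles.
counts = [150, 60, 25, 3, 1, 1]
print(rarefied_allele_count(counts, 84))
```

Rare alleles contribute less than one expected allele each to the standardised count, which is why the corrected values are lower than the raw allele numbers in the larger samples.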

Nei's average heterozygosity or gene diversity was calculated as a measure of variation at a locus $l$: $D_l = 1 - \sum_u p_{lu}^2$, where $p_{lu}$ is the frequency of the $u$th allele at locus $l$, and as a population mean over $m$ loci: $D = 1 - \frac{1}{m}\sum_l \sum_u p_{lu}^2$.4,30 $D$ is therefore the probability that two alleles sampled at random are different. Only the loci shared between all three populations were used in the calculation of $D$.
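As a minimal sketch, gene diversity (one minus the sum of squared allele frequencies at a locus, averaged over loci) can be computed as follows. The function names and the example frequencies are hypothetical, and no small-sample correction is applied because none is described in the text.

```python
def gene_diversity(freqs):
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def mean_gene_diversity(loci):
    """Population mean of the per-locus gene diversities."""
    return sum(gene_diversity(freqs) for freqs in loci) / len(loci)

# Hypothetical allele frequencies at two loci (not data from this study).
loci = [
    [0.5, 0.3, 0.2],
    [0.7, 0.2, 0.1],
]
print(gene_diversity(loci[0]), mean_gene_diversity(loci))
```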

Linkage disequilibrium

Tests for Hardy–Weinberg equilibrium were performed on genotypic data as a prerequisite for further analyses of linkage disequilibrium using the method of Guo and Thompson.31

The 43 Faroese consisted of 22 parental couples that had their haplotypes reconstructed from one offspring. To reconstruct the phase of the two parents' genotypes from the genotype of the offspring, we need to assume that no recombination occurred in this one generation within the segment on 12q investigated. Exact tests for linkage disequilibrium between all pairwise markers were then performed using the haplotype data. The test is an extension of Fisher's exact probability test on contingency tables, which evaluates the probability of tables with the same marginal totals and a probability equal to or less than that of the observed table.5 It uses a Markov chain to explore the space of all possible tables.31
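The logic of the exact test on phased haplotypes can be sketched as follows. Note the substitution: instead of the Markov chain used by Arlequin, this sketch draws tables by simply permuting the alleles at one locus among haplotypes, which also keeps the marginal totals fixed. The function names and the example data are hypothetical.

```python
import random
from collections import Counter
from math import lgamma

def table_log_weight(pairs):
    """Log-probability of a contingency table, up to a constant fixed by the marginals."""
    return -sum(lgamma(c + 1) for c in Counter(pairs).values())

def mc_exact_ld_test(haplotypes, n_perm=1000, seed=0):
    """Monte Carlo approximation to the exact test for LD between two loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples.
    Returns the fraction of permuted tables that are no more probable
    than the observed table.
    """
    rng = random.Random(seed)
    a = [h[0] for h in haplotypes]
    b = [h[1] for h in haplotypes]
    observed = table_log_weight(zip(a, b))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(b)  # permuting one locus preserves both sets of marginal totals
        if table_log_weight(zip(a, b)) <= observed + 1e-9:
            hits += 1
    return hits / n_perm

# Hypothetical phased haplotypes in complete association.
linked = [(1, 1)] * 20 + [(2, 2)] * 20
print(mc_exact_ld_test(linked))
```

A small returned value indicates that tables as improbable as the observed one are rare under random association of alleles, that is, evidence for LD.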

Gametic phase was unknown in the British and Danish populations and pairwise tests for linkage disequilibrium were therefore performed using a likelihood ratio test whose empirical distribution was obtained by a permutation procedure.32 In this test the haplotype frequencies were estimated using the EM algorithm to obtain the likelihood of the data not assuming linkage equilibrium. The EM algorithm infers haplotypes with high accuracy when sample sizes are large (>100 individuals), as is the case in this investigation.33
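How the EM algorithm resolves unknown phase can be shown with a deliberately simplified sketch for two biallelic loci, where only double heterozygotes are ambiguous. The microsatellites in this study are multiallelic and were analysed in Arlequin, so this is an illustration of the principle rather than the procedure actually used; the function names and genotypes are hypothetical.

```python
def alleles(g):
    """Map a count of allele 1 (0, 1 or 2) to the two alleles carried."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]

def em_haplotype_freqs(genotypes, n_iter=100):
    """Estimate two-locus haplotype frequencies from phase-unknown genotypes.

    genotypes: list of (gA, gB), each the number of copies of allele 1
    at loci A and B. Returns a dict keyed by haplotype (a, b).
    """
    n = len(genotypes)
    fixed = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    n_double_het = 0
    for ga, gb in genotypes:
        if ga == 1 and gb == 1:
            n_double_het += 1  # phase ambiguous: 11/00 or 10/01
            continue
        # At least one locus is homozygous, so the haplotype pair is known.
        for a, b in zip(alleles(ga), alleles(gb)):
            fixed[(a, b)] += 1
    freqs = {h: 0.25 for h in fixed}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 11/00.
        cis = freqs[(1, 1)] * freqs[(0, 0)]
        trans = freqs[(1, 0)] * freqs[(0, 1)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = dict(fixed)
        counts[(1, 1)] += w * n_double_het
        counts[(0, 0)] += w * n_double_het
        counts[(1, 0)] += (1 - w) * n_double_het
        counts[(0, 1)] += (1 - w) * n_double_het
        freqs = {h: c / (2 * n) for h, c in counts.items()}
    return freqs

# Hypothetical genotypes: homozygotes fix the phase, double heterozygotes do not.
sample = [(2, 2)] * 5 + [(0, 0)] * 5 + [(1, 1)] * 10
print(em_haplotype_freqs(sample))
```

The likelihood under the estimated haplotype frequencies can then be compared with the likelihood under linkage equilibrium (haplotype frequencies equal to products of allele frequencies) to form the likelihood ratio statistic.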

All three tests are implemented in Arlequin version 2.00 and were performed with 1000 permutations.34

Population bottleneck and demography

The number of alleles at a locus is reduced faster than the gene diversity at that locus when a population is reduced in size since rare alleles are more readily affected by drift than more frequent ones.35 As population size is restored the average number of alleles increases faster than the gene diversity until reaching mutation-drift equilibrium.35 After going through a bottleneck, a population will have a transient excess in gene diversity compared to that expected for the allele number. A test for the presence of a recent genetic bottleneck can be based on the differences in the observed and expected gene diversities in a population across a large set of unlinked loci.36 For such a test, the program Bottleneck37 computes the gene diversity Heq expected at each locus from the observed number of alleles given the sample size N under the assumption of mutation-drift equilibrium. This is done through simulation of the coalescent process from N genes under the stepwise mutation model. Power analyses and theoretical models suggest that a 100-fold reduction in population size can be detected with a 60–80% probability within a range of 0.25×2Ne to 2Ne generations after the reduction occurred.36 Ne is the post-change effective population size. The power for the detection of gene diversity excess increases as the relative reduction in population size becomes larger. Based on these power analyses we predict that a founder event on the Faroe Islands approximately 1200 years ago with a post-bottleneck size (Ne) of up to a few hundred will be detected with a probability of at least 60% in our analysis.36 This is assuming that the founder population is at least 100 times smaller than the source population. The method has proven successful in detecting past changes in population size in a series of populations including humans.38

Results

Allelic and gene diversity

All markers for the linkage disequilibrium analyses were in Hardy–Weinberg equilibrium in all three populations (results not shown). Eight of the 11 loci typed in all three populations had lower allele counts in the Faroese population and in six of these this persisted after correcting for differences in sample sizes (Table 1). Marker D12S1614 had relatively low gene diversity in the Faroese in relation to its high allele number. All loci contained rare alleles that were present in just one population. Twenty unique alleles were found in the British, whereas the Danish had five and the Faroese population only one. All unique alleles had low frequencies not exceeding 1% in the British and Danish populations. The unique allele in the Faroese was present in only two copies. Very few of the remaining alleles present in all three populations had frequencies below 1%. These frequency patterns may reflect a sampling problem and not differences in the demographic history of the populations. The numbers obtained using the rarefaction method may be a better measure of allelic diversity. Also, it should be kept in mind that this sampling bias may lead to higher estimates of LD in the British and Danish samples compared to the Faroese due to the sensitivity of this estimate to the number and frequencies of alleles. Overall, the mean gene diversity in the Faroese population is slightly smaller than in the other two populations. The results suggest different demographic histories of the three populations and may reflect a founder effect in the Faroese population. No formal tests of differences in allelic and gene diversities were performed because the loci are not independent.

Linkage disequilibrium

There was extensive LD between markers on 12q in the Faroese compared to the other two populations (Figure 2). Linkage disequilibrium extended as far as 3.8 Mb in the Faroese population (markers D12S1639 and D12S97) while marker pairs in LD were separated by a maximum of 1.4 Mb in the British (D12S866 – D12S2075) and 1.2 Mb in the Danish population (D12S342 – AFMB337ZD5). No corrections for multiple tests were performed due to the lack of independence of marker pairs. A summary of the number of different haplotypes at each segment in a population reveals no decreased variability in the haplotypes found in the Faroese when taking the lower sample size of this population into account (Figure 3). LD correlated with physical distance in the Danish population (Mantel test with 10 000 permutations, r=0.377, P=0.002), had a non-significant correlation coefficient in the British (r=0.180, P=0.099) and a low and non-significant correlation coefficient in the Faroese population (r=0.020, P=0.420).34,39
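The Mantel test reported above can be sketched as a permutation test on two symmetric marker-by-marker matrices. This is a generic one-tailed version written for illustration; which pairwise LD measure entered the matrix in the original analysis is not stated here, so the matrix contents, the function names and the example positions are all assumptions.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def upper(m, order):
    """Upper-triangle entries of matrix m after relabelling markers by `order`."""
    k = len(order)
    return [m[order[i]][order[j]] for i in range(k) for j in range(i + 1, k)]

def mantel(a, b, n_perm=10000, seed=0):
    """One-tailed Mantel test between symmetric matrices a and b.

    Rows and columns of b are permuted together, so the dependence
    among pairs that share a marker is respected.
    """
    idx = list(range(len(a)))
    x = upper(a, idx)
    r_obs = pearson(x, upper(b, idx))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if pearson(x, upper(b, idx)) >= r_obs - 1e-12:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical marker positions (kb) and a pairwise matrix derived from them.
pos = [0, 1, 3, 6, 10]
dist = [[abs(p - q) for q in pos] for p in pos]
print(mantel(dist, dist, n_perm=2000))
```

Permuting whole rows and columns, rather than individual matrix cells, is the design choice that makes the test appropriate for non-independent marker pairs.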

Figure 2 (A) Fisher's exact test of pairwise linkage disequilibrium between microsatellites. No correction for multiple tests was performed. Distances between neighbouring markers are indicated in Kb. (B) Summary of the frequency of marker pairs in linkage disequilibrium. N is the total number of marker pairs tested within the distance interval. Notice the non-linear scale above 1500 Kb.

Figure 3 The number of different haplotypes found in a population. A total of 86, 632 and 240 chromosomes were typed in the Faroese, British and Danish populations, respectively. A few missing data lowered the total number of haplotypes that could be determined in each population from the maximum possible. Maximum values were 84 in the Faroese, 628 in the British and 238 in the Danish population. Minimum values were 70, 574 and 216, respectively. The stippled lines indicate the average number of haplotypes determined in each population. Each point in the graph refers to a segment with the marker on the x-axis at the right end of that segment. Haplotypes in the British and Danish populations were estimated from phase-unknown genotypes using the EM algorithm.

Population bottleneck and demography

All 15 markers for the demographic analysis were polymorphic (Table 2) and all loci conformed to Hardy–Weinberg expectations (results not shown). Nine loci had gene diversity excess compared to the 8.85 loci expected. The probability of obtaining this result is 0.58. There is therefore no significant gene diversity excess and no indication of a recent genetic bottleneck in the Faroese population. There is also no significant gene diversity deficiency across loci (Wilcoxon one-tailed test, P=0.68). A consistent gene diversity deficiency across loci would be indicative of a recent population expansion.36
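The reported sign-test probability can be approximated with a binomial tail, as in the sketch below. It assumes that every locus has the same probability of showing a gene diversity excess, taken as the expected number of excess loci divided by the number of loci (8.85/15); the Bottleneck program may handle per-locus probabilities differently, and the function name is hypothetical. Under this assumption the tail probability comes out close to the reported 0.58.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """Probability of k or more successes in n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_loci = 15            # loci in the demographic analysis
observed_excess = 9    # loci with gene diversity excess
expected_excess = 8.85 # expected number under mutation-drift equilibrium

# Assumption: a common per-locus probability of excess.
p_excess = expected_excess / n_loci
prob = binomial_upper_tail(n_loci, observed_excess, p_excess)
print(round(prob, 3))
```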

Discussion

Increased levels of linkage disequilibrium on 12q in the Faroese relative to the British and Danish populations reveal prominent differences in the demographic history of the three populations. Levels of allelic diversity at microsatellite loci on chromosome 12q are somewhat lower than those found in the outbred British and Danish populations. However, there is no sign of a general gene diversity excess at other unlinked loci, nor are the haplotypes longer or less variable in the Faroese population compared to the other two populations. There is, therefore, no evidence of a strong population bottleneck in the Faroese population. It is possible that the high levels of LD are instead due to the general effect of genetic drift in a small population. The total population on the Faroe Islands remained at a small size through centuries until the rapid expansion in the 1800s. Simulation studies on human populations supplement predictions from population genetic theory that such slow early growth can be one of the key factors responsible for increasing the extent of LD in an isolated population as compared to an early rapid expansion.13 Similar simulation studies can be made on the Faroese population provided that relevant demographic information can be obtained.

Our results confirm that some population isolates have elevated levels of LD and may therefore be valuable in linkage disequilibrium mapping of genetic diseases even without the use of dense marker maps. The estimate by Kruglyak13 of an average (biallelic) marker distance of 3 Kb needed to detect useful levels of LD in an outbred population is much exceeded by this, and some of the other rare empirical studies of LD in isolated populations.20,21,40 Estimating marker densities to be used in linkage disequilibrium mapping in the Faroese population may not be appropriate at the present state due to the non-uniform distribution of LD known to occur on the human genome.41

The demographic history revealed by the genetic data may have some important implications for gene mapping studies. If no strong bottleneck has been present in the Faroese population we should not expect a single haplotype to be shared by all chromosomes carrying the disease gene. This is particularly true if the disease has a complex background and is attributable to a number of common alleles that could have entered the population through the multiple founders.7 The test for gene diversity excess used here only detects bottlenecks of relatively small size. It does, however, include the range of 10–100 Ne suggested to be required for the population reduction to result in increased levels of LD in post-bottlenecked populations.13

It is well known that LD may not necessarily be associated with the physical distance between markers.42 Although an overall correlation between LD and physical distance is present in the Danish population and a positive but non-significant correlation is present in the British population, all three populations have several examples of close neighbouring markers that are in linkage equilibrium. It has been suggested that such patterns can be due to gene conversions that break up LD between closely linked markers but have negligible effect on more distant sites.42,43,44 LD extends over long regions in the Faroese population and after identification of a candidate region by association analyses additional typing and haplotype analyses will be needed to narrow down the gene of interest. That LD is not completely predictable from small physical distances has obvious consequences for such fine-mapping. The test for linkage equilibrium at a marker closely linked to the disease locus may be significant and haplotypes around a disease locus may be broken up, which will hamper the localisation of the disease locus. Analyses of multiple close markers in a candidate region will therefore be necessary for gene localisation.