
Large scale association analysis for identification of genes underlying premature coronary heart disease: cumulative perspective from analysis of 111 candidate genes
  1. J J McCarthy1,
  2. A Parker2,
  3. R Salem1,
  4. D J Moliterno6,
  5. Q Wang3,
  6. E F Plow3,
  7. S Rao3,
  8. G Shen3,
  9. W J Rogers4,
  10. L K Newby5,
  11. R Cannata3,
  12. K Glatt2,
  13. E J Topol3,
  14. for the GeneQuest Investigators*
  1. San Diego State University, San Diego, CA 92182, USA
  2. Millennium Pharmaceuticals, Inc., Cambridge, MA 02139, USA
  3. Cleveland Clinic Foundation, Cleveland, OH 44195, USA
  4. University of Alabama, Birmingham, AL 35203, USA
  5. Duke University, Durham, NC 27708, USA
  6. University of Kentucky, Lexington, KY 40536, USA
Correspondence to: E J Topol MD, The Cleveland Clinic Foundation, Department of Cardiovascular Medicine, 9500 Euclid Ave., Desk F-25, Cleveland, Ohio 44195


Background: to date, only three groups have reported data from large scale genetic association studies of coronary heart disease using a case control design.

Methods and results: to extend our initial report of 62 genes, we present data for 210 polymorphisms in 111 candidate genes genotyped in 352 white subjects with familial, premature coronary heart disease (onset age threshold of 45 years for men and 50 years for women) and a random sample of 418 population based whites. Multivariate logistic regression analysis was used to compare the distributions of genotypes between cases and the comparison group while controlling for age, sex, body mass index, diabetes, and hypertension. Significant associations were found with polymorphisms in thrombospondin-4 (THBS4), thrombospondin-2 (THBS2) and plasminogen activator inhibitor-2 (PAI2), the strongest being with the A387P variant in THBS4 (p = 0.002). The THBS2 and THBS4 associations have since been replicated. We evaluated polymorphisms in 40 genes previously associated with coronary heart disease and found significant (p<0.05) associations with 10: ACE, APOE, F7, FGB, GP1BA, IL1RN, LRP1, MTHFR, SELP, and THPO. For five of these genes, the polymorphism associated in our study was different from that previously reported, suggesting linkage disequilibrium as an explanation for failure to replicate associations consistently across studies. We found strong linkage disequilibrium between polymorphisms within and between genes, especially on chromosome 1q22-q25, a region containing several candidate genes.

Conclusions: despite known caveats of genetic association studies, they can be an effective means of hypothesis generation and complement classic linkage studies for understanding the genetic basis of coronary heart disease.

  • PAI2, plasminogen activator inhibitor-2
  • THBS2, thrombospondin-2
  • THBS4, thrombospondin-4


Coronary heart disease is the leading cause of death and disability in the Western world. It is a complex disease with both environmental and genetic determinants. Risk factors known to increase the likelihood of developing coronary heart disease include hypertension, diabetes, obesity, hypercholesterolaemia, and diet. In addition, a positive family history of coronary heart disease in first degree relatives is a strong independent risk factor.1 The risk of developing coronary heart disease is 2–12 times higher for individuals with a first degree family history than for those without one.1 The risk is highest when the affected family members had an early age of onset, and it rises with the number of affected first degree relatives.2,3

Despite years of research, the genetic basis of coronary heart disease remains to be fully elucidated. Many of the classic risk factors are themselves under genetic control (blood pressure, lipids, obesity), but they account for only a portion of familial aggregation of coronary heart disease.1 To date, four published studies have been carried out in high risk families employing genome wide scans to identify regions of the genome linked to coronary heart disease. Two significant linkages were identified for coronary heart disease on chromosomes 2q21.1-22 and Xq23-26 using a Finnish study population.4 A study among Indo-Mauritians5 identified a region on chromosome 16p13 linked to coronary heart disease. Another study6 found evidence for linkage to myocardial infarction in a region on chromosome 14q11.2-12. A study of acute coronary syndrome7 found a suggestive linkage to 2q36-37.3. The specific genes underlying these linkages have yet to be found.

Most studies of coronary heart disease genetics to date have focused on a candidate gene approach in unrelated cases and controls, a design that is more powerful than linkage analysis in families when examining a specific gene of interest. Three large scale studies have been published to date, two in Japanese populations8,9 and an initial report of the current study by our group.10 Despite numerous positive findings from these and other studies, consistent evidence for any single gene associated with coronary heart disease is lacking. Several recent meta-analyses suggest that the effect of individual genes on the risk of complex traits such as coronary heart disease may be weak.11,12 Therefore, attempts should be made to improve the power to detect genetic associations. Besides using a large sample size, selecting cases that are genetically loaded may also improve power.

This study was undertaken to extend substantively our first high throughput candidate gene association study of coronary heart disease10 by comparing genetically enriched cases to ethnically matched adult general population controls. The goal of our study was twofold: first, to attempt to replicate associations previously reported in the literature and second, to survey genetic variants in a large number of candidate genes to generate novel hypotheses regarding their association with coronary heart disease.


Methods

Study population

Cases were drawn from the GeneQuest population, a study of affected sibling pair families ascertained for early onset familial coronary heart disease from 15 sites in the United States,10 the majority coming from Ohio, Alabama, and North Carolina. Families were enrolled if the proband had premature coronary heart disease (age of onset threshold of 45 years in men and 50 years in women) and a sibling met the same criteria. Coronary heart disease was defined as myocardial infarction, angiography with >70% stenosis, or surgical or percutaneous coronary revascularisation, and was confirmed by review of medical records. Of the 762 cases from the GeneQuest family study, 365 unrelated white individuals were chosen (the sibling with the earlier age of onset was selected). Thirteen cases had insufficient DNA and were removed, leaving a final sample size of 352 cases.

Comparisons were made with a random population sampling of 447 American whites, ascertained through random digit dialling in the greater Atlanta area, and ranging in age from 20–70 years. The comparison group was not selected based on disease status or family history and thus may include individuals with either overt coronary heart disease or at high risk of coronary heart disease. Twenty nine individuals were removed from further analysis after finding that they were missing >50% of their genotypes due to insufficient quantity or quality of DNA. The final sample included 418 individuals.

At enrolment, anthropometric measures, health and lifestyle questionnaires, medication usage, and family history data were collected by a trained interviewer for both cases and controls. Non-fasting blood samples were drawn for measurement of serum markers and extraction of DNA. Genomic DNA was isolated from peripheral blood lymphocytes of the controls using the Puregene kit (Gentra Systems) according to the manufacturer’s suggested protocol at a commercial laboratory. Diabetes status (type I and type II) was based on self report. Hypertension was defined, based on a single measure of blood pressure, as systolic blood pressure >140 mm Hg or diastolic blood pressure >90 mm Hg, or by the use of antihypertensive medication. Body mass index was calculated in kg/m².

Candidate gene choice, polymorphism selection, and genotyping

A total of 243 candidate genes were initially chosen for analysis based on previously reported genetic associations or knowledge of their involvement in coronary heart disease pathways of endothelial cell biology, vascular biology (thrombosis), lipid metabolism, the coagulation cascade, and other risk factors (diabetes, obesity). To cast a wide net to find suggestive associations with coronary heart disease or myocardial infarction in as many genes as possible, we focused on 1–3 common polymorphisms per gene, the majority being single nucleotide polymorphisms in the coding region. Single nucleotide polymorphisms were identified from two sources: a proprietary database, the result of screening the coding region of several thousand genes by three methods in ethnically diverse cell lines,13 and the public database, dbSNP. For genes having possible associations with coronary heart disease, additional common single nucleotide polymorphisms in and around the gene were typed. This report is based on an analysis of 111 of these genes where validated coding region single nucleotide polymorphisms were readily available.

High throughput genotyping was carried out by one of two methods: single base extension with detection by fluorescence energy transfer or fluorescence polarisation or the 5′ nuclease assay with allele specific TaqMan probes.14 PCR conditions, oligonucleotide primers, and probes are available from the authors upon request.

Association analysis

Analyses were performed using the SAS statistical package version 6.12 (SAS Institute, Inc.). Differences in the distribution of genotypes and other covariates between cases and the random population sample were assessed with a Wald χ² statistic (2 degrees of freedom test) implemented in the PROC LOGISTIC procedure in SAS. Two outcomes were examined: coronary heart disease, where the entire set of cases was compared with the general population, and myocardial infarction, where the subset of coronary heart disease cases whose qualifying event for enrolment was myocardial infarction was compared with the random population. For each single nucleotide polymorphism, odds ratios and Wald 95% confidence intervals were calculated for homozygous variant (minor allele) and heterozygous genotypes with the homozygous wild-type genotype as the reference. Men and women were analysed separately for single nucleotide polymorphisms in two X-linked genes. Multivariate analysis was performed for individual single nucleotide polymorphisms that showed statistically significant associations (p<0.05) to control for potential confounding effects of age, sex, body mass index, hypertension, and type 2 diabetes. Hardy-Weinberg equilibrium was assessed with a χ² test of goodness of fit among the random population sample.
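
As an illustration of the genotype specific odds ratios described in this section, the following minimal Python sketch computes an unadjusted odds ratio and Wald 95% confidence interval for the heterozygous and homozygous variant genotypes against the homozygous wild-type reference. It is not the SAS code used in the study: the function name `odds_ratio_wald_ci` and all genotype counts are hypothetical and chosen only for illustration. For a single categorical predictor, the unadjusted logistic regression estimate coincides with the cross-product ratio used here; the covariate adjusted estimates would require fitting the full multivariate model.

```python
import math


def odds_ratio_wald_ci(case_geno, control_geno, case_ref, control_ref, z=1.96):
    """Odds ratio and Wald CI for one genotype group versus the reference genotype.

    Arguments are subject counts: cases and controls carrying the genotype of
    interest, then cases and controls carrying the reference genotype.
    """
    if min(case_geno, control_geno, case_ref, control_ref) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((case_geno * control_ref) / (control_geno * case_ref))
    # Standard error of the log odds ratio: square root of summed reciprocal counts.
    se = math.sqrt(1 / case_geno + 1 / control_geno + 1 / case_ref + 1 / control_ref)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical genotype counts, NOT taken from the study: (cases, controls).
counts = {
    "wild/wild": (150, 220),
    "wild/variant": (160, 170),
    "variant/variant": (42, 28),
}

ref_cases, ref_controls = counts["wild/wild"]
results = {}
for genotype in ("wild/variant", "variant/variant"):
    cases, controls = counts[genotype]
    results[genotype] = odds_ratio_wald_ci(cases, controls, ref_cases, ref_controls)
    or_, lo, hi = results[genotype]
    print(f"{genotype}: OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1.0 corresponds to a nominally significant association for that genotype group at the 5% level.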

The potential confounding effect of population stratification was evaluated with the genomic control method.15 Briefly, 96 unlinked single nucleotide polymorphisms spread throughout the genome were selected, and 72 were successfully typed in 100 randomly selected cases and 100 individuals from the general population sample. Differences in allele frequencies of the 72 single nucleotide polymorphisms between the two groups were tested with a χ² statistic. In the absence of stratification, the expected mean χ² value for all single nucleotide polymorphisms typed is 1.
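
The stratification check can be sketched as follows, assuming a 1 degree of freedom Pearson χ² on the 2×2 table of allele counts for each unlinked marker. The function names `allele_chi2` and `mean_chi2` and the allele counts are hypothetical; the sketch shows only the mean χ² summary described above, not the full genomic control procedure of reference 15.

```python
def allele_chi2(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 df) comparing allele counts between cases and controls."""
    n = case_a + case_b + control_a + control_b
    row_cases, row_controls = case_a + case_b, control_a + control_b
    col_a, col_b = case_a + control_a, case_b + control_b
    if 0 in (row_cases, row_controls, col_a, col_b):
        raise ValueError("a marginal total is zero; the marker is uninformative")
    cross = case_a * control_b - case_b * control_a
    return n * cross ** 2 / (row_cases * row_controls * col_a * col_b)


def mean_chi2(snp_tables):
    """Mean chi-square over a set of unlinked markers."""
    values = [allele_chi2(*table) for table in snp_tables]
    return sum(values) / len(values)


# Hypothetical allele counts for three markers typed in 100 cases and
# 100 controls (200 alleles per group): (case_a, case_b, control_a, control_b).
tables = [
    (120, 80, 110, 90),
    (60, 140, 66, 134),
    (150, 50, 150, 50),
]
print(f"mean chi-square over {len(tables)} markers: {mean_chi2(tables):.2f}")
```

Because a χ² variable with 1 degree of freedom has expectation 1, a mean well above 1 across unlinked markers would suggest systematic allele frequency differences between the groups.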

Linkage disequilibrium between single nucleotide polymorphisms within the same gene or for genes that cluster together on a chromosome was assessed with the normalised disequilibrium parameter, D′,16 using the EM algorithm17 among the combined cases and controls.
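
A minimal sketch of this calculation for two biallelic single nucleotide polymorphisms is given below. It assumes unphased genotype counts arranged in a 3×3 table and uses a basic EM iteration in which only double heterozygotes are ambiguous. The function names, the allele labels (A/a and B/b), and the example counts are hypothetical, and the implementation actually used in the study may differ in its details.

```python
def em_haplotype_freqs(g, max_iter=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies by EM.

    g[i][j] is the number of subjects carrying i copies of allele A at the
    first SNP and j copies of allele B at the second SNP (i, j in 0..2).
    """
    total = 2 * sum(sum(row) for row in g)
    if total == 0:
        raise ValueError("no genotypes supplied")
    # Haplotype counts that can be read directly from unambiguous genotypes.
    base = {
        "AB": 2 * g[2][2] + g[2][1] + g[1][2],
        "Ab": 2 * g[2][0] + g[2][1] + g[1][0],
        "aB": 2 * g[0][2] + g[1][2] + g[0][1],
        "ab": 2 * g[0][0] + g[1][0] + g[0][1],
    }
    double_het = g[1][1]  # phase unknown: AB/ab or Ab/aB
    freqs = {h: 0.25 for h in base}
    for _ in range(max_iter):
        # E-step: expected share of double heterozygotes that are AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        share_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from the completed counts.
        counts = dict(base)
        counts["AB"] += double_het * share_cis
        counts["ab"] += double_het * share_cis
        counts["Ab"] += double_het * (1 - share_cis)
        counts["aB"] += double_het * (1 - share_cis)
        new = {h: c / total for h, c in counts.items()}
        change = max(abs(new[h] - freqs[h]) for h in new)
        freqs = new
        if change < tol:
            break
    return freqs


def d_prime(freqs):
    """Signed normalised disequilibrium D' from haplotype frequencies."""
    p_a = freqs["AB"] + freqs["Ab"]
    p_b = freqs["AB"] + freqs["aB"]
    d = freqs["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        raise ValueError("at least one SNP is monomorphic")
    return d / d_max


# Hypothetical genotype table, NOT taken from the study.
genotypes = [
    [40, 20, 5],
    [25, 60, 20],
    [5, 15, 10],
]
print(f"D' = {d_prime(em_haplotype_freqs(genotypes)):.2f}")
```

The sign of D′ depends on which allele at each locus is labelled A and B, so a negative value indicates that the labelled alleles tend to occur on different haplotypes rather than an absence of disequilibrium.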


Results

Descriptive statistics

The characteristics of the study population, which included 352 white patients with coronary heart disease and a random population based sample of 418 whites, are shown in table 1. At enrolment, cases ranged in age from 29 to 72 years (mean 48.1 years) and the random population sample from 20–70 (mean 43.2 years). Cases were retrospectively ascertained with an average time from their qualifying event to enrolment of 6.8 years (range 0–30 years) for women and 9.3 years (range 0–42 years) for men. Thus, the cases were probably overrepresented by coronary heart disease survivors. Because all cases were chosen from families originally ascertained for coronary heart disease, they are enriched for coronary heart disease risk factors including hypertension, high body mass index, and diabetes (types 1 and 2 combined). No differences were found in current smoking status. Due to the retrospective ascertainment of cases, history of smoking prior to coronary heart disease onset could not reliably be obtained. No difference was found in the proportion of cases with diabetes, high body mass index, hypertension, current smoking, or of the male sex when comparing myocardial infarction to other qualifying events, nor between subjects enrolled at the Cleveland Clinic versus other sites.

Table 1

Characteristics of the study population


Ninety percent of polymorphisms typed came from a proprietary database of coding region variants discovered through direct sequencing efforts. A total of 210 polymorphisms, including 207 single nucleotide polymorphisms and three insertion/deletions, were successfully evaluated in 111 candidate cardiovascular disease genes in our case control study. A complete list of the polymorphisms and their flanking sequences can be found in the supplemental table (available online). An additional 76 single nucleotide polymorphisms were evaluated but not analysed. These included 13 single nucleotide polymorphisms for which working assays could not be developed; one single nucleotide polymorphism that was not in Hardy-Weinberg equilibrium (p<0.0001); 51 that typed as monomorphic, suggesting they were either not real or too infrequent in our population to be detected; and two single nucleotide polymorphisms for which the location in the gene could not be confirmed.

Of those 210 polymorphisms successfully typed (supplemental table, available online), most occurred in the coding region of the gene (44% missense and 36% silent). The remaining polymorphisms were in the untranslated regions or intronic regions immediately flanking the exons. For 62 candidate genes, only one polymorphism was typed. There were two polymorphisms typed in 22 genes, three in 12 genes, and 4–8 in 15 genes. Only 25 polymorphisms had minor allele frequencies ⩽5%. There were 27 polymorphisms with minor allele frequencies between 6–10%, 77 between 11–25%, and 84 between 26–50%. All of the single nucleotide polymorphisms were within the limits of Hardy-Weinberg equilibrium, taking into account the multiple testing that was done (all p values >0.008).

Missing genotype data can be an issue in studies employing high throughput genotyping. For seven polymorphisms, ⩾20% of subjects were missing genotypes; for 40 polymorphisms, 11–19% were missing; for 69 polymorphisms, 6–10% were missing; for 97 polymorphisms, ⩽5% of subjects were missing genotypes. Genotypes for all but five single nucleotide polymorphisms were in Hardy-Weinberg equilibrium in the controls. This number was within the range of expected deviations, given multiple testing. Uncorrected p values for the five single nucleotide polymorphisms ranged from 0.05 to 0.009. The test for population stratification resulted in a mean χ² of 1.2 for the 72 single nucleotide polymorphisms typed, suggesting no significant stratification exists in our population (p>0.05).
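
The Hardy-Weinberg check referred to in this section can be sketched as a χ² goodness of fit test with 1 degree of freedom for a biallelic marker. The function name `hwe_chi2` and the genotype counts below are hypothetical and serve only as an illustration; they are not results from the study.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Arguments are observed counts of the two homozygous genotypes and the
    heterozygous genotype (middle argument). Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("marker is monomorphic")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts, NOT taken from the study.
chi2, p_value = hwe_chi2(230, 160, 28)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

With many markers tested, a few nominally small p values are expected by chance, which is why the deviations above are interpreted in light of multiple testing.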

Replication of previous genetic associations

In our study, we evaluated associations between coronary heart disease or myocardial infarction and polymorphisms in 40 genes for which prior associations with coronary heart disease have been described (“replication genes”). For 30 of these genes, we examined the exact variants previously associated (table 2). For the remaining 10 genes (ACE, CD14, IL1A, IL1RN, F13A1, LIPC, PON2, TGFBI, THBD, THPO, VWF), the polymorphisms examined were not the same as those associated previously. Polymorphisms in a total of 10 genes were significantly associated with coronary heart disease or myocardial infarction after controlling for covariates (table 3). In five of these genes, it was the exact same variant as previously reported: APOE, F7, FGB, GP1BA, and MTHFR. For the remaining five genes, associations were found between coronary heart disease or myocardial infarction and a polymorphism different from that previously reported: ACE, IL1RN, THPO, LRP1, and SELP. The SELP_3 single nucleotide polymorphism associated with coronary heart disease and myocardial infarction in our study was only in moderate linkage disequilibrium (D′ = −0.43) with the SELP_1 (T715P) single nucleotide polymorphism previously associated. The LRP1_3 single nucleotide polymorphism associated in our study was in strong linkage disequilibrium with the LRP1_5 single nucleotide polymorphism previously associated. For the remaining three single nucleotide polymorphisms, linkage disequilibrium with the previously associated variant was unknown.

Table 2

Results of the analysis of polymorphisms associated with coronary heart disease or myocardial infarction in prior studies

Table 3

Significant associations between coronary heart disease or myocardial infarction and single nucleotide polymorphisms in replication genes

Restricting the cases to those with myocardial infarction resulted in enhanced associations for single nucleotide polymorphisms in four genes—APOE, F7, GP1BA, and MTHFR—all consistent with a recessive mode of inheritance. The single nucleotide polymorphism in FGB, on the other hand, was associated only with the full set of coronary heart disease cases and was consistent with a dominant or codominant mode of inheritance.

Genetic associations described in the GeneQuest study for the first time

We have found significant (p<0.05) associations between coronary heart disease or myocardial infarction and single nucleotide polymorphisms in eight genes, which have not been previously described by others: ECE1, HRG, PAI2, PLCG1, SDC4, THBS1, THBS2, and THBS4. Only the THBS genes were published in our initial report.10 For an additional three genes—ANXA4, PLOD2, and PROC—the 95% confidence interval for one of the genotype groups excluded 1.0. Among these 11 top associations, only three were significantly associated with coronary heart disease or myocardial infarction in adjusted analyses: THBS4, THBS2, and PAI2 (table 4). Restricting the cases to those with myocardial infarction resulted in enhanced associations for all single nucleotide polymorphisms. The THBS4 variant conferred a greater than twofold increased odds of myocardial infarction in both heterozygotes and homozygotes and, of all the 210 polymorphisms examined in this study, was the strongest association remaining after adjustment for covariates (p = 0.002).

Table 4

Significant associations between single nucleotide polymorphisms and coronary heart disease or myocardial infarction uncovered in this population for the first time

Linkage disequilibrium

We also undertook an analysis of the extent of pairwise linkage disequilibrium between single nucleotide polymorphisms within a gene. To do this, we calculated the normalised disequilibrium parameter, D′, whose values range from −1 to +1. For the 185 pairs of single nucleotide polymorphisms examined, approximately 80% of the single nucleotide polymorphism pairs had “useful” linkage disequilibrium (D′>0.30) and 50% of the single nucleotide polymorphism pairs gave values of D′>0.90. As in other reports, disequilibrium was highly variable but in general it was strongest for single nucleotide polymorphisms in close proximity. The median linkage disequilibrium dropped off substantially for single nucleotide polymorphisms separated by >20 kb.

We examined disequilibrium between genes that cluster together on a chromosome. For the IL1 gene cluster including IL1RN, IL1B, and IL1A on 2q12-q22, linkage disequilibrium is strong between the two polymorphisms within the IL1RN gene (D′ = 0.76) and between IL1A and IL1B, separated by less than 5 kb (D′ = 0.81), but weak between IL1RN and IL1B (D′ = −0.22) and between IL1RN and IL1A (D′ = −0.20). Due to the high linkage disequilibrium in the region, the association we found between IL1RN_3 and coronary heart disease may reflect haplotypes previously associated with coronary heart disease.47

Three fibrinogen genes FGA, FGB and FGG are clustered in a region of ≈50 kb on chromosome 4q31 (fig 1). Within the FGB gene, strong disequilibrium exists between all four polymorphisms typed (all pairwise D′>0.99), but not between single nucleotide polymorphisms in FGA, FGB, and FGG. Because of its close proximity (≈12 kb) to FGB, Pleiotropic Regulator 1 (PLRG1) might be considered a positional candidate gene for coronary heart disease.

Figure 1

Relative position of fibrinogen genes clustered on chromosome 4q28. FISH, fluorescence in situ hybridisation; PLRG1, pleiotropic regulator 1.

The selectin genes, SELP and SELL, and the factor V gene (F5) are clustered in an ≈220 kb region on 1q22-q25 (fig 2). Significant disequilibrium exists between single nucleotide polymorphisms within each gene, as well as between the SELP and SELL genes and the SELP and F5 genes, the latter pair being separated by <2000 bp. Significant associations found with single nucleotide polymorphisms in SELP and F5 in our study and in other studies may reflect common haplotypes spanning these genes.

Figure 2

Relative position of selectin and factor V genes clustered on chromosome 1q24. FISH, fluorescence in situ hybridisation.

Results of the analysis of haplotypes within 9 of the 13 genes where novel associations were uncovered were inconclusive because of the cumulative effects of missing data on sample size and power.

Discussion


The current report is one of only three published large scale genetic association studies of coronary heart disease, and the only one among white Americans. Subsequent to our interim report on 62 candidate genes among genetically enriched coronary heart disease cases and population controls,10 Yamada and colleagues assessed 112 candidate gene polymorphisms in Japanese individuals with myocardial infarction9 and Ozaki et al examined over 90 000 gene based single nucleotide polymorphisms in Japanese patients who had had myocardial infarction.8 In our initial report, we described association between variants in three thrombospondin genes and myocardial infarction. Since then, various other groups have replicated the association with two of these genes. The thrombospondin 4 (THBS4) A387P single nucleotide polymorphism was confirmed to be significantly associated with myocardial infarction in men in the study from Yamada and colleagues, a European study of premature coronary heart disease,48 a population of myocardial infarction cases and controls from the Cleveland Clinic49 and in the Atherosclerosis Risk in Communities study.50 The THBS2 association has also been replicated in the European study of premature coronary heart disease48 and the Atherosclerosis Risk in Communities study.50 Furthermore, we have undertaken functional genomic studies and demonstrated that the A387P single nucleotide polymorphism is a gain of function mutation that interferes with endothelial cell adhesion and proliferation,51 which may account for predisposition to myocardial infarction. Now that we have expanded our assessment of candidate vascular biology genes from 62 to 111 in the current study, the persistent finding of THBS4 as the most significant further anchors its potential of being clinically meaningful. The ability of various diverse groups to replicate these findings in population studies adds validity to our approach as an effective means of hypothesis generation.

The current report represents a significant extension of our interim report, providing distinctive insight into the potential significance of additional vascular biology genes and particular single nucleotide polymorphisms. Using a database of single nucleotide polymorphisms identified through systematic screening of the coding region of candidate genes, we have identified novel genetic associations between single nucleotide polymorphisms in the endothelial converting enzyme (ECE1), histidine rich glycoprotein (HRG), phospholipase C, gamma 1 (PLCG1), syndecan (SDC4) and plasminogen activator inhibitor-2 (PAI2) genes and coronary heart disease or myocardial infarction. The validity and relevance of these associations to coronary heart disease or myocardial infarction require confirmation in other study populations.

Furthermore, our study provides strong support for the contribution of linkage disequilibrium in the failure to replicate genetic associations. In our study, we were able to replicate some, but not all, previously reported associations. Replication failure may be the result of testing not the underlying causal variant, but rather a variant in linkage disequilibrium with the causal variant. Our data support strong linkage disequilibrium between single nucleotide polymorphisms within a gene, and the presence of significant disequilibrium between genes in close proximity. Furthermore, while we were not able to replicate some associations directly, we did find evidence for association with other single nucleotide polymorphisms in the same gene. While examining a single polymorphism in a gene may be an efficient strategy for hypothesis generation, especially in large scale or genome wide studies, follow up should include a more comprehensive analysis of individual polymorphisms and haplotypes in the region to identify the “causative” variant prior to attempting to replicate the association in independent populations.

There are key limitations to acknowledge with our report which could have led to either type I (false positive) or type II (false negative) errors affecting our results. Most associations found in our study were only nominally significant, a function of both the complex aetiology of coronary heart disease and a relatively small sample size. The p values presented here were not adjusted for multiple testing, which increases the likelihood of false positive associations. In addition, uncontrolled confounding is another possible source of spurious associations. While many important confounders were controlled for in our analysis, some potential confounders, such as lipid levels, were not. The retrospective nature of our study prohibited accurate temporal assessment of other possible confounding factors such as smoking, where only current smoking status was reliably obtained.
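
To put the absence of multiple testing adjustment in context, the short calculation below shows what a Bonferroni correction would imply under simplifying assumptions that are ours and not part of the original analysis: one test per polymorphism (210 tests) at a nominal level of 0.05, ignoring the second outcome and the correlation between markers. The variable names are illustrative only.

```python
# Simplifying assumptions (not from the original analysis): one test per
# polymorphism, nominal alpha of 0.05, correlation between markers ignored.
n_tests = 210
alpha = 0.05

# Expected number of nominally significant results if every null hypothesis were true.
expected_false_positives = n_tests * alpha

# Per-test threshold that would keep the family-wise error rate at alpha.
bonferroni_threshold = alpha / n_tests

# Strongest adjusted association reported in the study (THBS4 A387P).
strongest_p = 0.002
survives_bonferroni = strongest_p < bonferroni_threshold

print(f"expected false positives under the null: {expected_false_positives:.1f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold:.6f}")
print(f"p = {strongest_p} survives Bonferroni: {survives_bonferroni}")
```

Bonferroni correction is conservative when markers are in linkage disequilibrium, so this calculation is best read as a rough bound; it underlines why independent replication carries more weight than nominal significance in a single study.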

In addition, a number of factors could result in type II error, leading to the inability to detect a true underlying association. Among these are limited polymorphism and haplotype analysis within a gene, low allele frequencies, small effect sizes, and the relatively small sample size of our study population. In addition, since our control group was not selected to be free of coronary heart disease, the resulting misclassification of controls could bias the results toward the null. Therefore, we cannot rule out that an association does exist between any of the genes examined in this study and susceptibility to coronary heart disease or myocardial infarction.

Despite a relatively small sample size, we were able to replicate some prior associations reported in the literature and substantiate our previously reported association with THBS4 that withstood replication and has proved to be biologically relevant. Thus, the feasibility of our study design, which employed an enriched source of cases for detecting genetic associations, was demonstrated. By selecting cases with very early onset disease and a strong family history, our cases were weighted toward those individuals whose disease has a strong genetic aetiology. While our study design generated some interesting hypotheses related to genetic variation associated with coronary heart disease, further studies are required to demonstrate both the reproducibility and generalisability of these findings to non-familial, late onset cases.

The optimal approach to understanding the genetic basis of a complex disease such as myocardial infarction or coronary heart disease has been debated. Despite the vast collective efforts of many investigators to demonstrate reproducible associations between specific gene polymorphisms and coronary heart disease or myocardial infarction, no clear cut, reliable associations have been found. In a review of genetic association studies, Hirschhorn and colleagues52 have pointed out that only 6 of 166 putative associations between genetic variants and complex diseases were consistently replicated. Subsequent work by this group also indicated the problems of false negative, underpowered studies and highlighted the need for very large sample sizes to assess the modest but real risk of a polymorphism for a common disease.11 In addition, disease heterogeneity and linkage disequilibrium with nearby loci, as illustrated in our work, must be considered. Recently, Colhoun et al53 have expressed their pessimistic concerns “that association approaches will always be hopelessly simplistic and reductionist”. On the other hand, with the exception of the recent identification of the myocardial infarction gene, MEF2A,54 the genome wide linkage analysis approach to coronary heart disease has thus far only identified putative loci but has not homed in on causative genes.

While some may argue that individual reports from small studies such as this only add to the confusion in the literature, the accumulation of both positive and negative findings will stimulate initiatives by independent groups to replicate novel hypotheses, minimise publication bias, and facilitate meta-analysis. Ultimately, either traditional meta-analysis or pooling of raw data across a number of similar observational studies with a thorough analysis of the effects of ascertainment criteria, confounding and interaction may identify subsets of individuals for whom a particular genetic marker may have the greatest impact on risk of coronary heart disease.


Supplementary materials

  • Web-only table W1: a complete list of the polymorphisms and their flanking sequences


  • * The names of investigators and their institutions are presented in reference 10.

  • Conflicts of interest: none declared.