Table 1

Challenges in the analysis of rare and low-frequency variants in human genetics

Challenge | Description
Technology | The choice is between next-generation DNA sequencing and genotyping arrays recently developed to capture rare/low-frequency coding variation. Arrays are less expensive and easier to analyse, but are limited to known genetic variants; this might be more of a concern for experiments in non-European populations. Sequencing is becoming more affordable, but is still expensive and computationally intensive. Whether candidate genes, the whole exome or the whole genome are sequenced determines both the classes of genetic variation discovered and the multiple-testing burden.
Study design | Most published rare-variant association analyses have used unrelated individuals, because such cohorts are relatively easy to assemble. For the same number of participants, a cohort of related individuals has less power to discover new genetic variants than a cohort of unrelated individuals, given that fewer independent chromosomes are tested. However, in related individuals the allele frequency might be higher and the phenotypical effect stronger, which would increase power. Additional methodological work is needed to compare the statistical power to find genetic associations with rare/low-frequency variants in pedigrees versus unrelated individuals, in particular in the context of gene-based tests.
Statistical analysis | Minor allele frequency (MAF) impacts statistical power. For instance, under some assumptions (odds ratio (OR)=1.5, α=5×10⁻⁸, population prevalence=5%), we would need >400 000 individuals to have 80% power to find an association with a rare variant (MAF=0.1%). For a common variant (MAF=10%), ∼4600 individuals would be sufficient. Furthermore, because the number of rare variants is higher than the number of common variants in the human genome, the multiple-testing burden for rare-variant association studies is higher, again decreasing statistical power. Statistical tests that combine variants, for instance by gene, have been developed (recently reviewed in ref. 7), although the optimal test will likely depend on the specific genetic architecture of each phenotype.
Variant annotation | Coding variants are more likely to have phenotypical effects, although a large fraction will be neutral. Bioinformatic tools have been developed to prioritise functional variants, and thus increase the signal-to-noise ratio, but they are imperfect (refs. 77, 78). These tools also often ignore non-coding variants. Private rare non-coding variants can cause Mendelian diseases (ref. 79). Although there are few (if any) examples of rare non-coding variants associated with complex human traits, such variants probably exist but have not yet been searched for carefully. Ideally, experimental validation should guide the selection of likely functional variants before association testing, although this is difficult to implement using high-throughput methods.
Population stratification | Following the original observation that current statistical methods (eg, principal component adjustment) cannot properly account for population stratification of rare variants (ref. 65), a large number of reports have been published, although the optimal method is unclear. Inflation due to population stratification of rare variants might also depend on the type of gene-based tests used (ref. 80). Ideally, having a large number of genotyped or sequenced controls would allow ancestry-based matching with cases (ref. 81).
Phenotypical variance explained | The phenotypical variance explained by a variant depends on its effect size and its allele frequency. For rare variants to explain a large fraction of the missing heritability, their phenotypical effects would need to be large. Although this is the case for PCSK9 and a handful of other genes that harbour penetrant rare alleles, most rare variants will likely have weak-to-modest effects. Using calculations based on empirical data, a recent report suggests that the heritability explained by rare variants could be substantial (18%–84%) but that we would need a very large sample size (>1 000 000 individuals) to find all the associated variants.
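
The dependence of power on MAF described in the 'Statistical analysis' row can be illustrated with a rough calculation. The sketch below is not the calculation behind the figures in the table, whose exact design is not stated. It assumes a balanced case-control study, a per-allele OR, an allelic test that compares allele frequencies between cases and controls, and the usual normal approximation for two proportions. It also ignores the population prevalence. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(maf_controls, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * maf_controls / (1.0 - maf_controls)
    return odds / (1.0 + odds)


def total_sample_size(maf_controls, odds_ratio, alpha=5e-8, power=0.80):
    """Approximate total N (cases + controls, 1:1) for an allelic test.

    Assumption: normal approximation for a two-sided comparison of two
    proportions, with two alleles counted per individual.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided threshold
    z_power = normal.inv_cdf(power)

    p0 = maf_controls
    p1 = case_allele_freq(maf_controls, odds_ratio)
    p_bar = (p0 + p1) / 2.0

    numerator = (
        z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_power * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2

    # Each individual contributes two alleles; round up per group.
    individuals_per_group = ceil(alleles_per_group / 2.0)
    return 2 * individuals_per_group


rare_n = total_sample_size(maf_controls=0.001, odds_ratio=1.5)
common_n = total_sample_size(maf_controls=0.10, odds_ratio=1.5)
print(f"MAF 0.1%: about {rare_n:,} individuals in total")
print(f"MAF 10%:  about {common_n:,} individuals in total")
```

Under these simplifying assumptions the required totals should differ by roughly two orders of magnitude between the rare and the common variant, in line with the contrast drawn in the table. Exact values depend on the design and on how prevalence is modelled.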
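
The statement in the 'Phenotypical variance explained' row that variance explained depends on both effect size and allele frequency can be made concrete with the standard additive-model expression 2p(1−p)β². The sketch below assumes Hardy-Weinberg proportions, a purely additive effect and a per-allele effect β expressed in phenotypic standard deviation units. The function names and example values are hypothetical and are not taken from the report cited in the table.

```python
from math import sqrt


def variance_explained(maf, beta_sd):
    """Fraction of phenotypic variance explained by one biallelic variant.

    Assumptions: additive model, Hardy-Weinberg proportions, and beta_sd
    given as the per-allele effect in phenotypic standard deviations.
    """
    return 2.0 * maf * (1.0 - maf) * beta_sd ** 2


def effect_needed(maf, target_fraction):
    """Per-allele effect (in SD units) needed to explain target_fraction."""
    return sqrt(target_fraction / (2.0 * maf * (1.0 - maf)))


# Same modest effect, very different allele frequencies.
common = variance_explained(maf=0.30, beta_sd=0.1)
rare = variance_explained(maf=0.001, beta_sd=0.1)

# Effect a rare variant would need to match the common variant.
rare_effect = effect_needed(maf=0.001, target_fraction=common)

print(f"common variant: {common:.4%} of variance")
print(f"rare variant:   {rare:.4%} of variance")
print(f"rare variant needs an effect of {rare_effect:.2f} SD to match")
```

With the same per-allele effect, the rare variant explains far less variance than the common one. Matching it requires an effect more than ten times larger, which is why only rare alleles with strong effects contribute appreciably on their own.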