Non-heritable genetics of human disease: spotlight on post-zygotic genetic variation acquired during lifetime
Lars Anders Forsberg (1), Devin Absher (2), Jan Piotr Dumanski (1)
  (1) Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, Uppsala, Sweden
  (2) HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA
Correspondence to Dr Jan P Dumanski, Department of Immunology, Genetics and Pathology, Rudbeck Laboratory, Uppsala University, C11 building, 3rd floor, Dag Hammarskjölds väg 20, 75185 Uppsala, Sweden; jan.dumanski{at}igp.uu.se

Abstract

The heritability of most common, multifactorial diseases is rather modest and known genetic effects account for a small part of it. The remaining portion of disease aetiology has been conventionally ascribed to environmental effects, with an unknown part being stochastic. This review focuses on recent studies highlighting stochastic events of potentially great importance in human disease—the accumulation of post-zygotic structural aberrations with age in phenotypically normal humans. These findings are in agreement with a substantial mutational load predicted to occur during lifetime within the human soma. A major consequence of these results is that the genetic profile of a single tissue collected at one time point should be used with caution as a faithful portrait of other tissues from the same subject or the same tissue throughout life. Thus, the design of studies in human genetics interrogating a single sample per subject or applying lymphoblastoid cell lines may come into question. Sporadic disorders are common in medicine. We wish to stress the non-heritable genetic variation as a potentially important factor behind the development of sporadic diseases. Moreover, associations between post-zygotic mutations, clonal cell expansions and their relation to cancer predisposition are central in this context. Post-zygotic mutations are amenable to robust examination and are likely to explain a sizable part of non-heritable disease causality, which has routinely been thought of as synonymous with environmental factors. In view of the widespread accumulation of genetic aberrations with age and strong predictions of disease risk from such analyses, studies of post-zygotic mutations may be a fruitful approach for delineation of variants that are causative for common human disorders.

  • Genetics
  • Genome-wide
  • Clinical genetics
  • Complex traits
  • Copy-number

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/3.0/ and http://creativecommons.org/licenses/by-nc/3.0/legalcode

Introduction

Over the past three decades, projects in human genetics searching for genotype–phenotype correlations have mostly focused on analyses of the inherited genome. These include studies of genes causing monogenic disorders and more recent analyses of the association of complex diseases with single nucleotide polymorphisms (SNP) in genome-wide association studies (GWAS). The prevailing approach has been analysis of DNA from a single tissue (usually blood) sampled at a single time point (non-longitudinal sampling). The general foundation and rationale for these studies has been the assumption that the vast majority of cells in the human soma are genetically identical; in other words, that the genome of somatic cells is stable across the human lifespan. In this review we discuss recent findings that challenge this assumption1–3 and argue that post-zygotic changes represent an underestimated source of variation responsible for the development of human phenotypes. In recent years, the GWAS have dominated the human medical genetic landscape of complex diseases and have, notwithstanding their shortcomings, contributed to our knowledge of human genetics.4 They have improved our understanding of the genetic basis of many human traits, as >1200 variants associated with >165 different human traits and diseases have been described.4–8 However, to the chagrin of the field, the portion of the estimated heritability explained by the GWAS findings has been unexpectedly low. Many explanations have been proposed for the ‘missing heritability’ of complex traits, including human disease.4–8 Faced with the inefficiency with which inherited biology explains and predicts disease, we argue that the weight should shift to the non-inherited component which, until now, has routinely been thought of as synonymous with environmental factors.

Post-zygotic DNA sequence mutations, although known to occur in normal cells, were not considered to be a major factor behind common diseases, but recent evidence seriously challenges this belief.1–3 This review has been inspired by our results1 and two other papers supporting and extending our conclusions,2 ,3 showing an age dependent accumulation of post-zygotic mutations in non-tumoral cell lines constituting the human soma. Our focus is to highlight the importance of somatic mosaicism as a potentially crucial factor causing complex human diseases. According to a common metaphor, ‘A beloved child is called many things’; the phenomenon that is discussed here has many names: for example, somatic mosaicism, somatic variation, post-zygotic changes, de novo variants, aberrations acquired during lifetime, and detectable clonal mosaicism. All these terms fall under a definition of mosaicism as the presence of genetically distinct lineages of cells in a single organism that is derived from the same zygote. We use here ‘post-zygotic variation’ or ‘post-zygotic mosaicism’ as unifying terms for all DNA changes acquired during life, from single base pair mutations to aberrations at the chromosomal level. The term ‘mosaicism’ was first used in biology at the end of the 19th century by W Roux and A Weismann to describe differential usage of genetic information during development. This incorrect explanation of mosaic development and ontogenetic differentiation later became known as the Roux-Weismann theory of qualitative nuclear division.9 More recently, in 1956 CW Cotterman used the term ‘somatic mosaicism’ to define antigenic variation.10

Post-zygotic mosaicism has been studied in human embryos,11 ,12 fetuses from spontaneous abortions,13 and children with birth defects or developmental delay.14 ,15 However, until recently,1–3 little has been known about post-zygotic mosaicism in human adult and aging but otherwise healthy individuals. This review does not focus on de novo mutations in the germline that are known to cause monogenic autosomal dominant and X-linked diseases, or those recently found to be part of the aetiology of neurodevelopmental diseases. For the latter, we refer to a recent review on this topic.16 Likewise, we do not discuss paternal age effect mutations and selfish spermatogonial selection in relation to various human disorders.17 There are two well known examples of physiological and locus specific post-zygotic variation in the nuclear genome. The first are somatic rearrangements of immunoglobulin (Ig) and T cell receptor (TCR) genes in B and T lymphocytes. The Ig and TCR genes are inactive in most cells, but undergo a tightly regulated reshuffling in order to become activated, which leads to individual B or T lymphocytes producing a mono-specific antibody or TCR, respectively.18 The second example is the variation of telomere length; a special case of structural post-zygotic change. The length of telomeres functions as a clock for the number of cell divisions, limiting the replicative capacity of cells, which is important for cell senescence, aging, and cancer.19–23 All other known examples of post-zygotic variation, which is a focus of this review, are apparently a result of stochastic, random processes.

An adult human body has been estimated to contain 10¹³–10¹⁴ cells and the number of cells produced during a human lifetime is assessed as more than 10¹⁶. Each somatic cell division is inherently coupled with a risk for mutations and there are estimates of the number of mutations that could be expected to arise during human life.24–26 We quote from Lynch (2010)26: “… with a human germ-line mutation rate of ∼10⁻⁸ base substitutions/site/generation, a site in a somatic nucleus will be mutated with a probability of 10⁻⁷ to 10⁻⁶ by the average age of reproduction, with the burden being higher in older individuals. With a diploid genome size of 6×10⁹ sites and ∼10¹³ cells per soma, the body of a middle-aged human might then contain >10¹⁶ mutations (not including insertions, deletions, or other larger scale mutations). Only about 1% of the human genome consists of coding DNA, so a substantial fraction of somatic mutations will be inconsequential, but even if just 1% of coding mutations had significant fitness effects, the total body burden of mutations would be of order 10¹²”. The above numbers have been calculated based on studies of single nucleotide variants. It should be stressed that structural variants, although less well studied than single nucleotide polymorphisms (SNPs), are estimated to be more common. Comparisons of germline mutation frequencies of SNPs versus copy number variations (CNVs) indicate that the latter are more common by a few orders of magnitude.27 ,28 Furthermore, the base substitutional mutation rate per cell division in somatic cells is 4–25 times greater than the corresponding rate for germline (reviewed in Lynch25). Thus, the predicted burden of post-zygotic mutations in the human soma during a single lifetime is overwhelming.
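The arithmetic behind the quoted estimate can be retraced in a few lines. The sketch below is only a back-of-the-envelope check, assuming the quoted inputs are simply multiplied together; the function and constant names are illustrative and do not come from the cited work.

```python
# Back-of-the-envelope check of the quoted mutation-burden estimate.
# All inputs are the round figures from the quotation; the names are illustrative.
DIPLOID_SITES = 6e9       # sites in a diploid genome
CELLS_PER_SOMA = 1e13     # cells in an adult soma
CODING_FRACTION = 0.01    # share of the genome that is coding DNA
FITNESS_FRACTION = 0.01   # assumed share of coding mutations with fitness effects


def body_burden(per_site_probability):
    """Return (total, coding, consequential) mutation counts for one soma."""
    total = DIPLOID_SITES * per_site_probability * CELLS_PER_SOMA
    coding = total * CODING_FRACTION
    consequential = coding * FITNESS_FRACTION
    return total, coding, consequential


# The quoted per-site probability range is 10^-7 to 10^-6.
for probability in (1e-7, 1e-6):
    total, coding, consequential = body_burden(probability)
    print(f"p={probability:.0e}: total={total:.0e}, "
          f"coding={coding:.0e}, with fitness effects={consequential:.0e}")
```

Under these assumptions the total falls between 6×10¹⁵ and 6×10¹⁶ mutations, and the consequential subset between 6×10¹¹ and 6×10¹², which brackets the orders of magnitude given in the quotation.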

Given this vast amount of expected variation, it is likely that a considerable part of these events have consequences for cellular phenotypes. However, for a phenotype to occur at the level of an organism, a mutation should strike a substantial number of cells, which are in an appropriate spatial and temporal window of development. It might be helpful to consider the above numbers using an analogy with Darwinian selection. During evolution of species, most new mutations are either disadvantageous to the organism (eliminated from the gene pool because of their negative effect on fitness) or are neutral passengers, not providing an advantage or disadvantage, and are therefore not leading to their relative increase in the gene pool. Only a minority of new mutations are propagated in following generations, by increasing the fitness of the affected organism and its progeny. Similar reasoning might be applied to the post-zygotic mutations within a human soma. It is likely that a large group of post-zygotic mutations are never detected because of their detrimental effect on the affected cell and its elimination by apoptosis/growth arrest. The phenotypically neutral passenger mutations are not easily studied either, since they are not increasing in the relative frequency of the affected cell clone over all other cells. The only mutations that are readily detectable are those providing the affected cell with a proliferative advantage and this has been known to be the main mechanism of tumorigenesis. The three recent studies1–3 show that this can also occur in lineages of normal cells in healthy individuals.

Recent findings on post-zygotic variation in phenotypically normal human cells

The papers that prompted this review1–3 are the latest contributions towards increasing awareness of post-zygotic variation as a widespread and easily detectable phenomenon with potentially important consequences for various human phenotypes.29–39 The three papers showed that normal cells accumulate structural aberrations with age, which are readily identified using genome scanning on SNP arrays. These structural changes fall into three major categories: deletions, gains, and copy number neutral loss of heterozygosity (CNNLOH, also called acquired uniparental disomy, aUPD) (figure 1). The size of these aberrations is highly variable, from a few kb to entire chromosomes. The relationship between age and mosaicism is strong and other tested covariates, such as sex, ancestry, and smoking, did not have a significant effect on the mosaic status. A common thread in these reports is the detection of clonal expansions of blood cells that were affected with various aberrations, suggesting that these mutations confer a proliferative advantage on the cells carrying them. Forsberg et al1 showed the highest frequency of subjects affected with aberrations; that is, 3.4% of generally healthy people aged 55–90 years show clones of nucleated cells containing megabase-range changes, which affect up to 60% of nucleated cells in blood. This number of ∼3% for mosaic megabase-range aberrations occurring among elderly/old subjects should be compared to ∼1% of mosaics for chromosomal aberrations described in a preselected cohort of children referred for clinical diagnostic testing.14 In addition, Forsberg et al1 showed, using a unique cohort of age stratified monozygotic twins sampled several times, that smaller structural aberrations (in the range of a few kb) also accumulate with age, as they appear much more common in older subjects.

Figure 1

An illustration of the three main types of post-zygotic structural genetic aberrations, selected from Forsberg et al.1 Panels A, B and C display a deletion, a copy number neutral loss of heterozygosity (CNNLOH, also called acquired uniparental disomy, aUPD), and a gain, respectively. Each panel is composed of images from Illumina single nucleotide polymorphism (SNP) beadchips showing a selected aberrant chromosome, with the affected regions highlighted in pink. The results from Illumina SNP arrays consist of two data tracks: log R ratio (LRR) values of fluorescent intensities from array probes (upper part), and B allele frequency (BAF) values representing the fraction of fluorescent intensity at each SNP accounted for by the B allele (lower part). Normally, BAF values cluster around 0 (AA genotype), 0.5 (AB) or 1 (BB). On the right hand side, a schematic explanatory figure displaying the mosaic mixture of cells with aberrant and wild-type chromosomes is shown. Two hypothetical homologous chromosomes (labelled in green and white) with heterozygous genotypes for six SNPs are shown. Panel A shows data for chromosome 5 in a monozygotic (MZ) twin pair sampled at the age of 77 years. MZ twin TP25-1 has a normal profile, while its co-twin TP25-2 has a 32.5 Mb deletion on 5q in approximately 55% of nucleated blood cells. This deletion is uncovered using both LRR (downward shift) and BAF (heterozygous SNPs cluster away from 0.5) data from the Illumina SNP array. The Illumina profile contains a mixture of genotypes from aberrant cells (approximately 55%) and wild-type cells representing approximately 45% of nucleated blood cells. Panel B shows data for chromosome 10 in MZ twin pair sampled at the age of 77 years. Twin TP12-1 shows a normal profile. Using BAF values a 76.5 Mb large CNNLOH/aUPD was identified on 10q in co-twin TP12-2. Quantification of cells containing the CNNLOH/aUPD suggests that 34% of cells are affected. 
As this aberration does not change the copy number of the aberrant segment, LRR values are normal. However, the genotypes of SNPs within this segment are all homozygous in aberrant cells. Panel C shows data for chromosome 8 from subject ULSAM-298 using two samples collected at the ages of 71 and 88 years. The sample collected at 71 years shows a normal profile, while the sample taken at the age of 88 years shows a 70 Mb gain of chromosome 8 in approximately 30% of cells, visible with both LRR and BAF data from the Illumina SNP array.
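The way BAF and LRR values shift with the proportion of aberrant cells can be illustrated with a simple mixture model. The sketch below is not the quantification method used in the cited studies; it assumes an idealised mixture of normal cells (two copies) and aberrant cells carrying a one-copy deletion, a copy-neutral LOH, or a one-copy gain, and it ignores array noise and the signal compression seen in real LRR data. All function names are illustrative.

```python
import math

# Copy number of the affected segment in aberrant cells (normal cells carry 2).
ABERRANT_COPIES = {"deletion": 1, "cnnloh": 2, "gain": 3}


def expected_baf(event, cell_fraction):
    """Return the (low, high) BAF bands expected at heterozygous SNPs."""
    if event not in ABERRANT_COPIES:
        raise ValueError(f"unknown event: {event}")
    p = cell_fraction
    if event == "deletion":
        # B allele retained: one B copy per cell, 2 - p copies in total.
        b_copies, total = 1.0, 2.0 - p
    elif event == "cnnloh":
        # B allele duplicated in aberrant cells, total copy number unchanged.
        b_copies, total = 1.0 + p, 2.0
    else:
        # Gain of the B allele: 1 + p B copies, 2 + p copies in total.
        b_copies, total = 1.0 + p, 2.0 + p
    high = b_copies / total
    return 1.0 - high, high


def expected_lrr(event, cell_fraction):
    """Idealised log R ratio: log2 of mean copy number relative to 2."""
    mean_copies = 2.0 + cell_fraction * (ABERRANT_COPIES[event] - 2)
    return math.log2(mean_copies / 2.0)


def cell_fraction_from_baf(event, high_baf):
    """Invert expected_baf: estimate the aberrant cell fraction from the upper band."""
    if event == "deletion":
        return 2.0 - 1.0 / high_baf
    if event == "cnnloh":
        return 2.0 * high_baf - 1.0
    if event == "gain":
        return (2.0 * high_baf - 1.0) / (1.0 - high_baf)
    raise ValueError(f"unknown event: {event}")


for event, fraction in (("deletion", 0.55), ("cnnloh", 0.34), ("gain", 0.30)):
    low, high = expected_baf(event, fraction)
    print(f"{event}: BAF bands {low:.3f}/{high:.3f}, "
          f"LRR {expected_lrr(event, fraction):+.3f}")
```

In this model a CNNLOH/aUPD leaves the LRR at zero while splitting the heterozygous BAF band symmetrically around 0.5, whereas deletions and gains shift both tracks, which matches the qualitative picture described in the caption.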

Comparison between frequencies of the three main classes of mega-base range structural mutations showed that deletions are far more common than gains. Another prominent finding is the high frequency of CNNLOH/aUPD. Forsberg et al,1 Laurie et al2 and Jacobs et al3 reported that CNNLOH/aUPD represent 22%, 34% and 48% of all mutations, respectively. Different scoring algorithms might explain differences between these three studies. It should also be pointed out that in cases where only a few percent of cells are affected, it might be difficult to discriminate CNNLOH/aUPD from a gain or a deletion event. Nevertheless, CNNLOH/aUPD appears to be a major class of somatic mutations, either the most common or second most common in frequency among the mega-base range aberrations. The simplest definition of CNNLOH/aUPD in the context of a single affected chromosome is the presence of both homologues of a pair of chromosomes from one parent only.40 CNNLOH/aUPD can affect the entire chromosome or smaller segments (segmental CNNLOH/aUPD, terminal or interstitial) with stretches of homozygosity. CNNLOH/aUPD should be considered a special case of structural variation since it does not change the copy number of the affected segment. It is, however, a result of a structural rearrangement, most commonly due to meiotic or mitotic nondisjunction/anaphase lag, alternatively mitotic recombination. From the disease point of view, CNNLOH/aUPD could result in: (1) an imprinting disorder, via loss or doubling of the expression of an imprinted gene; or (2) expression of a recessive trait (eg, a mutation in a tumour suppressor gene) in a non-Mendelian fashion. The latter is mediated by reduction to homozygosity causing a recessive phenotype to appear, which is inherited in an initially heterozygous state from the parents. 
The list of conditions associated with CNNLOH/aUPD is continuously growing40–43 and this trend is likely to continue due to an increasing awareness and application of SNP based arrays with ultra-high resolution in analyses of normal and disease related samples. CNNLOH/aUPD cannot be detected by cytogenetic analyses or by standard array-CGH. However, allelic ratio values from SNP based arrays, such as Illumina beadchips, are sensitive tools for the detection of constitutional (non-mosaic) and mosaic forms of CNNLOH/aUPD.14 The detection of CNNLOH/aUPD should be discussed in the context of next generation, highly parallel sequencing, gradually revolutionising the field. This approach is neither straightforward (from the data analysis point of view) nor inexpensive for detection of CNNLOH/aUPD, especially for samples affected with low level mosaicism. Therefore SNP microarrays should remain the preferred approach for such analyses.

In addition to showing a high frequency of post-zygotic structural aberrations in normal cells, Forsberg et al1 also showed variable dynamics of cell clones affected with aberrations in different individuals, by studying 2–4 longitudinal samples collected many years apart from the same subject (figure 2). A more or less rapid relative increase in frequency of cells affected by a certain abnormality was observed in many cases and the rate of this increase varied between different subjects and different aberrations. Interestingly, in multiple subjects that were studied in longitudinal fashion, a decrease in the number of affected cells in the oldest samples was observed, which suggests a self-correcting process in the haematopoietic system. This decrease suggests that the initially expanding cell clones, possessing a higher proliferative potential, are not immortalised and follow the normal apoptotic programme. Furthermore, new blood samples from subjects that were studied longitudinally provided an opportunity for sorting blood cells into several sub-compartments, such as CD4 T cells, CD19 B cells, and granulocytes. In one illustrative subject (ULSAM-697), who is generally healthy, we described a >100 Mb CNNLOH/aUPD of chromosome 4 using four time points: 71, 82, 88, and 90 years (figure 2). This aberration was not detectable at the age of 71 years, reached ∼58% at the ages of 82 and 88 years, and decreased radically to ∼30% of cells at the age of 90 years. Sorting of cells at the age of 90 years showed that CD4 T cells and granulocytes were affected to a similar degree, as identified in DNA from unsorted blood at the same age. However, CD19 B cells were unexpectedly free from this aberration. Thus, both myeloid and lymphoid lineages were affected to a similar degree, with the notable exception of B lymphocytes. It should be stressed that aberrations of 4q are typical for myelodysplastic syndrome (MDS), but this individual does not have any symptoms of the disorder. 
In addition, considering the rapid decrease of the cell clone carrying the aberrant 4q between samplings at 88 and 90 years, it is likely that this subject should soon be free from aberrant cells, which emphasises the self-eliminating property of the system. Moreover, all three reports1–3 observed a frequent coexistence of two (or more) aberrations in the blood of a single person. Longitudinal analyses of subjects showing multiple aberrations revealed variable dynamics of changes for different aberrations over time, pointing to the coexistence of different cell clones in blood, each affected with a distinct aberration.1

Figure 2

The whole genome profiles in longitudinal analysis of 4 peripheral blood samples collected from subject ULSAM-697 at the ages of 71, 82, 88, and 90 years (panels A, B, C and D, respectively). This figure illustrates a clonal cell expansion containing a terminal CNNLOH/aUPD encompassing 103 Mb of the long arm of chromosome 4, with an increase and a decrease in the number of cells at different ages (data from ref. [1]). Each panel is composed of images from Illumina SNP beadchips showing the BAF-values, as CNNLOH/aUPD is not detectable using LRR data (see Fig. 1). The estimated percentage of cells displaying CNNLOH/aUPD on chromosome 4 is shown for each studied sample. This aberration was not detectable at the age of 71, reached approximately 58% at the ages of 82 and 88 years and decreased radically to approximately 30% of cells at the age of 90 years. This figure also displays the BAF-profiles for the whole genome from genotyping of sorted blood cells (CD19+ B lymphocytes, CD4+ T lymphocytes, and granulocytes) as well as skin fibroblasts collected at the age of 90 years (panels E, F, G and H, respectively). Sorting of blood cells at the age of 90 years showed that CD4+ T-cells and granulocytes were affected to a similar degree, as identified in DNA from unsorted blood at the same age. However, CD19+ B-cells were unexpectedly free from this aberration. Thus, both myeloid and lymphoid lineages were affected to a similar degree, with the notable exception of B-lymphocytes. Panels I and J show statistical analysis of data. Panel I shows comparisons of “BAF-value deviation from 0.5” for heterozygous probes only and within the aberrant region of 4q, derived from analysis displayed in panels A through D. Similar analysis is shown in panel J for data derived from panels E through H. The proportion of cells with the 4q aberration changes with time and between different types of cells. 
These changes are significantly different between all samplings (ANOVA p<0.001; Tukey's test for multiple comparisons).

The results on expanding-contracting, potentially pre-cancerous clones, which are subject to auto-correction,1 are in good agreement with data showing expansions of pre-leukaemic clones containing gene fusions specific to acute leukaemia described in newborns.44 Thus, throughout the lifetime, peripheral blood likely contains multiple aberrant expanding and contracting cell clones and these can persist in circulation for many years, if not decades. This issue requires further studies and one intriguing question in this context is: which are the cells that are giving rise to these clonal expansions? We can only speculate that these might be very early progenitors for multiple lineages of haematopoiesis or perhaps even haematopoietic stem cells (HSC). Another interesting and related question is: why do humans in the age window of 55–90 years so frequently develop post-zygotic aberrant cell clones, present at high frequency (5–95% of all nucleated cells) in peripheral blood? In other words, why are such clonal expansions present in blood at much lower frequencies below the age of 55 years? One plausible explanation is related to immuno-senescence and accumulation of random mutations with age. Immuno-senescence involves loss of cell diversity in elderly/old subjects, preferentially in B and T cell lineages.45–48 This loss of diversity of clones might be caused by depletion of the complexity in the pool of HSC, due to detrimental mutations forcing the affected cells into apoptosis/growth arrest. The stem cells remaining in the pool also accumulate mutations with age, but these mutations might, on the contrary, be promoting their proliferation. As such a process gradually progresses with age, a threshold effect is reached and the frequency of aberrant clones rises above the detection limit of array based analyses, which is ∼5% of all nucleated blood cells.14 ,49

The results presented by Forsberg et al,1 Laurie et al2 and Jacobs et al3 likely represent only ‘the tip of an iceberg’ and there are many arguments supporting this assumption. Perhaps the strongest argument is derived from the above discussed predictions of the number and consequences of mutations that we can expect to develop within a single human soma. The largest category of post-zygotic mutations is likely never detected, if they are detrimental and lead to apoptosis/growth arrest of the affected cell(s). This category of mutations is probably largely responsible for the development of age related loss of diversity of cells in the human immune system, characteristic for the immuno-senescence.45–48 Another category of undetected mutations is phenotypically neutral, not leading to a sufficient proliferative advantage of affected cells, over all the other nucleated cells in the peripheral blood. Genetic events in this category are beyond the reach of array based analyses, but could be studied using the next generation sequencing with a deep coverage. Furthermore, another argument is related to the fact that we have so far only studied blood, which is quite special, compared to solid tissues. Extrapolation on the level of post-zygotic mosaicism beyond blood DNA using similar resolution of analysis is currently difficult. In addition, blood is composed of numerous cell types with discrepancies in their longevity and their natural rate of replenishment, but blood DNA is routinely studied without cell sorting. Much lower levels of mosaicism could be detected by analysing well defined subsets of blood cells, especially for cell clones representing a minority of circulating cells. This is essential for analyses of human disorders where a certain subset of cells (from blood or elsewhere) can be suspected as being important for the development of particular phenotypes. 
Moreover, the SNP arrays used1–3 interrogated only on the order of 0.4–1 million nucleotides with an uneven distribution of data points. This has important implications for a likely high false-negative rate of mutation discovery, especially for structural rearrangements below 50 kb in size. Finally, balanced inversions and translocations would have escaped detection by our method. Thus, future studies should be directed towards better defined subpopulations of cells using a considerably higher resolution approach. Whole genome sequencing would definitely suffice with regard to the resolution of analysis. However, this method is still expensive and is not established to analyse all types of mutations, especially when structural variation is considered. A recent comparative study using different sequencing platforms of a single genome at high coverage illustrated this notion.50 The concordance rate between two platforms was low; 88% for calling of single nucleotide variants and only 26% for indels. In summary, in order to see more of the iceberg, we should address a number of points discussed above.
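To make the notion of a between-platform concordance rate concrete, the sketch below computes one common variant of it: the share of calls made by both platforms out of all calls made by either. This definition is an assumption, since the cited comparison may have used a different one, and the variant tuples are invented toy data rather than results from any study.

```python
def concordance(calls_a, calls_b):
    """Share of variant calls reported by both platforms, out of the union of calls.

    This intersection-over-union definition is an assumption made for
    illustration; other definitions of concordance exist.
    """
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        raise ValueError("no calls from either platform")
    return len(set_a & set_b) / len(union)


# Toy calls, keyed as (chromosome, position, reference allele, alternative allele).
platform_a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
platform_b = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 7, "T", "C")]

print(f"concordance: {concordance(platform_a, platform_b):.0%}")
```

Keying each call on position and both alleles means that two platforms reporting different alternative alleles at the same site count as discordant, which is one reason indel concordance tends to be sensitive to how calls are normalised.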

Phenotypic relevance of post-zygotic mosaicism

Reports on mosaic mutations causing Mendelian and non-Mendelian conditions are continuously accumulating. A few recent examples of conditions associated with post-zygotic variation are: Proteus syndrome,51 different vascular anomalies,52 Ollier disease/Maffucci syndrome/metaphyseal chondromatosis,53 ,54 CLOVES syndrome (Congenital, Lipomatous, Overgrowth, Vascular malformations, Epidermal nevi and Spinal/Skeletal anomalies and/or Scoliosis),55 and congenital dyskeratosis.56 Post-zygotic mosaicism can result in a milder phenotype, can cause reversion of disease phenotype, or can unmask an expression of a mutation that would otherwise be lethal to the embryo. It is likely that many instances of post-zygotic mosaicism are not clinically recognised since the patient may show a borderline, mild clinical phenotype due to a low proportion of cells carrying a mutation. Another reason underlying the ascertainment bias is that post-zygotic variation is primarily relevant for sporadic cases (de novo mutations) with no previous family history of a disease. The steadily growing body of data indicates that somatic mosaicism for pathogenic mutations affecting known disease genes should be seen as a rule applicable to the vast majority of disease related genes, rather than as an exception. As comprehensive reviews on this subject have been published,29–34 37–39 57–62 we will only discuss two well studied genes providing insights into the role of somatic mosaicism in the phenotype. Duchenne muscular dystrophy (DMD) is an X chromosome linked, lethal neuromuscular disorder, affecting one in 3500 liveborn males. The DMD gene shows interesting findings with regard to somatic mosaicism.63–65 Its mutation spectrum is atypical as up to 75% of DMD cases are due to structural rearrangements; that is, a deletion or duplication of one or more exons. 
This gene contains two mutational hot spots involving distal (exons 45–52) and proximal (exons 2–7) regions.66 There is a difference in the distribution of rearrangements within the gene in patients showing mosaicism versus non-mosaic cases. Deletions in patients showing somatic mosaicism are preferentially clustered around exon 2.67 ,68 This suggests that the mechanism behind generation of these structural rearrangements is different in mitosis versus meiosis. The third interesting aspect of the DMD gene is a reversion of disease phenotype in muscle fibres of DMD patients, via mitotic rearrangements restoring the reading frame and allowing some dystrophin expression to occur. In several cases, the reverting mutation appeared to be in the distal deletion hotspot, supporting the suggestion that this region is unstable. Somatic reversions have also been described for other diseases.31–33 ,37 ,56 ,69–71

Neurofibromatosis type 1 (NF1) is an inherited tumour syndrome caused by mutations in the NF1 gene on 17q.72–74 Approximately 5% of patients are affected by large (1.2–1.4 Mb) deletions removing NF1, along with other genes.75 ,76 Most of these large deletions are the result of non-allelic homologous recombination between segmental duplications flanking the NF1 gene. In the important study by Kehrer-Sawatzki et al,75 mosaicism for the NF1 gene deletions was detected in up to 40% of cases, when sporadic NF1 patients were specifically targeted for analysis of deletions using DNA from several tissues. Mosaic patients also lacked the cognitive defects and facial dysmorphology typically associated with NF1 microdeletions, suggesting a genotype–phenotype correlation. In patients with mosaicism, the proportion of cells with the deletion was 91–100% in peripheral leucocytes, but was much lower (51–80%) in buccal smears or peripheral skin fibroblasts. Detailed analysis of the deletion breakpoints revealed additional surprising results. In contrast to the typical NF1 deletion of 1.4 Mb (occurring between the major segmental duplications flanking the gene, also known as type 1 deletions), seven of the eight mosaic deletions were 1.2 Mb in size (known as type 2 deletions) and were the product of recombination between the SUZ12 gene and a highly similar pseudogene.75 ,77 Thus, type 1 NF1 microdeletions occur by intra-chromosomal recombination during meiosis, while the type 2 deletions are mediated by intra-chromosomal recombination during mitosis. This scenario is reminiscent of the above described findings for the DMD gene, pointing again to a different mechanism behind the generation of some structural rearrangements in meiosis and mitosis. The NF1 gene can also be somatically mutated in human glioblastoma multiforme and leukaemia.78 ,79

The three papers1–3 have pointed out the cancer related aspect of clonal cell expansions in the blood of elderly/old individuals. Laurie et al2 and Jacobs et al3 showed that individuals affected by post-zygotic aberrations have a considerably increased risk of hematological malignancies/cancers, with the relative risk increasing 10- and 35-fold, respectively. These numbers are higher by at least an order of magnitude, compared to the risk estimates from GWAS.4 The report by Jacobs et al3 (see figures 2 and 3 in their article3) compared cohorts of cancer-affected and cancer-free subjects. The vast majority, if not all, of the aberrations that were observed in the cancer-affected cohort were also seen in cancer-free subjects, although at lower frequency. A detailed inspection of the regions with aberrations is interesting when viewed in the context of the two most common hematological malignancies of the elderly, namely chronic lymphocytic leukaemia (CLL) and MDS. Many of the chromosomal aberrations uncovered in blood have previously been described in patients affected with these disorders, yet they also occur in cancer-free subjects, which suggests that these mutations are not cancer specific. Rather, they represent an early pre-cancerous change, possibly predisposing to the development of malignancy/cancer later in life, presumably after acquisition of additional mutations and further in vivo selection for clones with the highest proliferative potential.
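As a rough illustration of what a 10-fold relative risk means, the sketch below computes a relative risk and an approximate 95% confidence interval from a two-by-two cohort table. All counts are hypothetical and chosen only to produce a round number; they are not taken from Laurie et al2 or Jacobs et al3, and the function name `relative_risk` is likewise an assumption made for this example.

```python
from math import exp, log, sqrt


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (RR, ci_low, ci_high) using a 95% Wald interval on the log scale."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for cumulative-incidence data.
    se = sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    ci_low = exp(log(rr) - 1.96 * se)
    ci_high = exp(log(rr) + 1.96 * se)
    return rr, ci_low, ci_high


# HYPOTHETICAL counts: 20 of 400 subjects with a detectable aberrant clone
# develop a hematological cancer, versus 50 of 10000 subjects without one.
rr, ci_low, ci_high = relative_risk(20, 400, 50, 10000)
print(f"RR = {rr:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
```

With these invented counts the risk is 5% in carriers and 0.5% in non-carriers, so most carriers still remain cancer-free even at a 10-fold relative risk.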

It should be stressed that, considering the frequency of CLL and MDS in the general population, the majority of these discovered post-zygotic aberrations will not lead to a clinically manifested disease, reinforcing the issue of the self correcting haematopoietic system. A comparison of the total number of subjects affected with post-zygotic aberrations1–3 and the literature for CLL80–87 and MDS88–94 suggests that the number of mutations related to MDS is higher when compared with those relevant for CLL. The most commonly observed and MDS related changes are: 4q CNNLOH/aUPD (targeting the TET2 tumour suppressor gene)95; deletions of 5q and 5q-CNNLOH/aUPD; monosomy 7 and deletions of 7q (targeting the EZH2 gene); trisomy 8; deletions of 11q and 11q-CNNLOH/aUPD (targeting the CBL gene); monosomy 17, deletions of 17p and 17p-CNNLOH/aUPD; deletions of 20q; as well as trisomy 21. The corresponding list of aberrations related to CLL is: 11q deletions and 11q-CNNLOH/aUPD; trisomy 12; 13q deletions and 13q-CNNLOH/aUPD; monosomy 17, deletions of 17p and 17p-CNNLOH/aUPD as well as 22q deletions and 22q-CNNLOH/aUPD (possibly targeting the PRAME gene). This overrepresentation of MDS related aberrations may seem surprising since CLL is usually considered to be the more common malignancy of the elderly. However, this MDS biased portrait of post-zygotic aberrations is in agreement with studies showing that the aging of the human immune system is connected with the relative depletion of lymphoid precursors and an increase of the myeloid counterparts.

The human haematopoietic system undergoes a dramatic shift with age. This includes a reduced cellularity of the bone marrow,96 reduced lymphopoiesis,45 and a decreased complexity of T cells46 and B cells.47 Nevertheless, the frequency of HSC appears to be high in the elderly, although their developmental trajectories change from a lymphoid dominated pattern in the young to a more myeloid dominated pattern in the elderly.48 ,97 ,98 HSC from both the young and the elderly have the potential to generate lymphoid and myeloid lineages in culture. However, HSC from elderly individuals have a more myeloid biased differentiation potential than HSC from young subjects.48 In line with this, mutations in the TET2 gene, which are frequently found in patients with MDS, were observed in the blood of phenotypically normal humans with clonal haematopoiesis.95 Thus, considering the above literature, we would argue that the age dependent shift between lymphoid and myeloid lineages mirrors well the picture of MDS and CLL related aberrations in the peripheral blood of elderly/old humans.

One of the intriguing questions raised in the recent papers1–3 is: which other phenotypes, beyond hematological cancers and including non-cancer related ones, can be linked to clonal cell expansions in blood harbouring different aberrations? Our results provide one illustrative example of a non-cancer related hematological phenotype. One subject displayed a 20q deletion, which was barely detectable at the age of 71 years. The proportion of cells containing the 20q deletion was estimated to be ∼50% when he was 75 years old, and he had ∼36% aberrant cells at the age of 88 years. Between the samplings at 75 and 88 years, he was diagnosed with idiopathic thrombocytopenic purpura, which might be due to clonal expansion of cells with the 20q deletion and suppression of normal thrombocyte production. In line with this example, future studies aiming at correlations of phenotype with a better defined post-zygotic mutation profile should be informative.
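The course of the 20q deletion clone described above can be summarised with a small sketch that labels each interval between samplings as expanding or contracting. The values at 75 and 88 years are the approximate estimates given in the text. The value at 71 years is only a placeholder standing in for "barely detectable" and is not a measurement. The function name and the use of a linear average change per year are assumptions made for illustration.

```python
def classify_intervals(samples):
    """Label each interval between consecutive samplings of one clone.

    samples: list of (age_years, percent_aberrant_cells), sorted by age.
    Returns a list of (age_start, age_end, change_per_year, label), where
    change_per_year is the average change in percentage points per year.
    """
    intervals = []
    for (age0, pct0), (age1, pct1) in zip(samples, samples[1:]):
        rate = (pct1 - pct0) / (age1 - age0)
        if rate > 0:
            label = "expanding"
        elif rate < 0:
            label = "contracting"
        else:
            label = "stable"
        intervals.append((age0, age1, rate, label))
    return intervals


# ASSUMPTION: placeholder for "barely detectable"; not a reported value.
ASSUMED_BARELY_DETECTABLE_PCT = 5.0

samples = [
    (71, ASSUMED_BARELY_DETECTABLE_PCT),
    (75, 50.0),  # approximate estimate quoted in the text
    (88, 36.0),  # approximate estimate quoted in the text
]

for age0, age1, rate, label in classify_intervals(samples):
    print(f"{age0}-{age1} years: {label} ({rate:+.2f} percentage points/year)")
```

Two or three time points cannot resolve what happens between samplings, so the labels describe only the net change over each interval.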

Conclusions, open questions, challenges and opportunities

The three papers1–3 have raised a number of questions and challenges, but also point to opportunities in connection with future investigations of post-zygotic mutations. These studies suggest a likely and largely unexplored impact of post-zygotic variation on common human phenotypes, not necessarily restricted to cancer. Sporadic disorders, defined as a lack of similar cases among the closest relatives of an affected patient, are common in medicine. We therefore argue that studies of differences in the post-zygotic mutational profile of appropriate target cells, in comparison with other normal cells of the same patient, will be highly informative. The non-heritable causes of human disease have traditionally been ascribed to environmental factors. With few exceptions, however, such as smoking for lung cancer or alcohol for liver cirrhosis, specific identification of most of these factors has proven elusive for common multifactorial diseases and methodological breakthroughs likely to change this are nowhere in sight. Post-zygotic mutations are clearly not heritable, and cannot therefore explain the ‘missing heritability’. However, they might be a part of the non-heritable disease causality, which has, until now, been underestimated in importance and routinely ascribed to the environment. The new evidence discussed here strongly suggests that a sizeable part of the non-heritable causes of human disease can be ascribed to stochastic molecular events that are readily amenable to well established paradigms of analysis.

These recent results1–3 should also be discussed in the general context of aging, longevity and age associated diseases. Aging has been defined as a complex process of cellular senescence of adult tissues that results in compromised stress response, homeostatic imbalance, and elevated risk of disease.99 ,100 The dramatic rise of the human lifespan (by 20 years during the second half of the 20th century) calls for more research focused on healthy aging and age associated conditions. This life extending trend is expected to continue worldwide, with the average human lifespan rising another 10 years by the year 2050.101 By itself, aging is the largest risk factor for the majority of common human disorders.102 Studies of aging human cohorts, collected in a longitudinal fashion and using the approach described recently1–3 (ie, analysis of post-zygotic structural aberrations that accumulate during lifetime), may be fruitful for uncovering mutations that are causative for many common human disorders. It should be stressed that the results of Laurie et al2 and Jacobs et al3 indicate that CNV analysis of post-zygotic changes yields considerably stronger predictions of disease risk when compared with typical results from germline variants discovered in GWAS.4 This is a strong argument in favour of the extension of analyses targeting post-zygotic variation. Finally, a possible consequence of the accumulation of post-zygotic aberrations is that some of the clonal cell expansions might actually entail an increased lifespan for people affected with them, via enhanced function of the immune system, possibly stretching over many years of life. This issue should also be investigated in further detail.

The recent literature provides a rough ‘post-zygotic variation baseline’,1–3 defining what can be expected when the bulk genome derived from all cells present in the peripheral blood is scanned in young/middle aged and elderly/old subjects. However, this portrait of post-zygotic variation is not necessarily representative for all cell clones in circulation (see above, discussion about subject ULSAM-697) (figure 2). We should gain more insight into post-zygotic variation across various ages, when the blood is sorted into at least a few cellular sub-compartments. We would argue that such analyses will yield important information with regard to another hidden layer of post-zygotic variation, which might be useful for genotype–phenotype correlations in conditions related to dysfunctions of the haematopoietic system; for example, autoimmune or other chronic inflammatory conditions. Furthermore, it is equally important to assess the level of post-zygotic variation in at least a few other human tissues across different age groups. These should preferably represent at least one non-mesodermal lineage of embryonic development, as the most popular sources of DNA from different human tissues (blood and fibroblasts) are both of mesodermal origin. In conclusion, a major consequence of the recent results is that a profile of variation in a single human tissue collected at one time point cannot be used as a surrogate representing a faithful portrait of variation present in other tissues nor in the same tissue throughout lifetime. In line with this, future studies of genetic but not inherited mechanisms behind sporadic complex diseases should be directed towards an analysis of the cells, which are presumed to cause the phenotype under investigation. Such an approach should maximise the success rate for uncovering a truly pathogenic variation.

One of the strengths of the recent analyses1–3 is that the studied cells had not been manipulated in vitro, providing a representative snapshot of a dynamic system taken at a certain age. In this context, a concern should be raised regarding the use of lymphoblastoid cell lines (LCLs) as a source of DNA for similar studies. LCLs are Epstein–Barr virus transformed B lymphocytes and are usually cultured in vitro for a prolonged time. LCLs are polyclonal in the beginning, and then gradually become oligoclonal and monoclonal after prolonged culturing.103 ,104 Thus, these cultured cells might acquire a new genotype, which was not present in the original B lymphocytes that gave rise to the LCL. Indeed, a recent analysis of one parent–offspring trio performed in the context of the 1000 Genomes Project showed that the majority of de novo mutations present in the LCL of the offspring were neither present in the parents nor detectable in DNA derived from total peripheral blood of the offspring.105 Another independent study has recently confirmed this conclusion.106 Accordingly, these de novo mutations were likely artefacts induced by in vitro culturing. An alternative unfavourable scenario is that cultured LCLs may conceal post-zygotic mutations. This is because the variation studied via LCLs is representative of only a fraction of B lymphocytes, and the latter are a minority of all circulating cells in peripheral blood. Furthermore, it has been shown that cells affected by some chromosomal rearrangements are less efficiently cultured in vitro than normal euploid cells,107 ,108 which might lead to a selective removal of cells with a variant genotype. Thus, the use of LCLs in studies of genetic variation should be restricted.

Forsberg et al1 showed that the post-zygotic genome of normal blood is dynamic. Throughout lifetime, peripheral blood likely contains multiple aberrant expanding–contracting cell clones. The available data are still limited but suggest that such clones can persist in the circulation of elderly/old people for a decade or more. The currently available results provide a clear link between these aberrant expanding–contracting clones and hematological malignancies/cancers. However, the frequency of subjects affected with aberrant clones typical for MDS or CLL, for example, is considerably higher than the frequency of these diseases in the general population. Thus, not all subjects carrying the pre-cancerous clones will develop malignancy/cancer, and it is important to follow up this topic with a description of the causative factors promoting the development of these diseases. Furthermore, we envisage that the genotype–phenotype relationships based on the presence of specific aberrant cell clones (in blood and in other tissues) will be expanded to non-cancer related phenotypes. The medical literature provides many examples of diseases related to the haematopoietic system with a fluctuating disease course, with relapses or even self healing; for example, asthma, multiple sclerosis, Crohn's disease, and inflammatory bowel disease, to mention a few. It might be relevant to search for expanding–contracting cell clones with post-zygotic mutations in different cellular sub-compartments of blood in such patients. Furthermore, in order to exploit this line of research maximally, the human post-zygotic genomes of several tissues should be monitored in a longitudinal fashion, using samples collected at multiple time points throughout life. Such analyses will require modifications to the currently applied bio-banking procedures for sample collection from large population based cohorts and ethical approvals that justify such collections.

Acknowledgments

We thank Nick CP Cross, Maj Hulten, Richard Rosenquist Brandell, Eva Tiensuu Janson, Chiara Rasi, and Constantin Polychronakos for critical review of the manuscript and constructive comments. This work was supported by the Ellison Medical Foundation, Swedish Cancer Society, Swedish Research Council, the Science for Life Laboratory-Uppsala and Uppsala University.

References

Footnotes

  • Contributors All authors (LF, DA and JD) contributed to the conception and design, analysis and interpretation of data, review of the literature, and revising it critically for important intellectual content as well as the final approval of the version to be published. LF and JD wrote the manuscript.

  • Funding None.

  • Competing interests None.

  • Provenance and peer review Commissioned; internally peer reviewed.

  • Open Access This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/