Background Over 100 genes have been implicated in the aetiology of amyotrophic lateral sclerosis (ALS). A detailed understanding of their independent and cumulative contributions to disease burden may help guide various clinical and research efforts.
Methods Using targeted high-throughput sequencing, we characterised the variation of 10 Mendelian and 23 low penetrance/tentative ALS genes within a population-based cohort of 444 Irish ALS cases (50 fALS, 394 sALS) and 311 age-matched and geographically matched controls.
Results Known or potential high-penetrance ALS variants were identified within 17.1% of patients (38% of fALS, 14.5% of sALS). 12.8% carried variants of Mendelian disease genes (C9orf72 8.78%; SETX 2.48%; ALS2 1.58%; FUS 0.45%; TARDBP 0.45%; OPTN 0.23%; VCP 0.23%; ANG, SOD1, VAPB 0%), 4.7% carried variants of low penetrance/tentative ALS genes and 9.7% (30% of fALS, 7.1% of sALS) carried previously described ALS variants (C9orf72 8.78%; FUS 0.45%; TARDBP 0.45%). 1.6% of patients carried multiple known/potential disease variants, including all identified carriers of one established ALS variant, TARDBP:c.859G>A(p.[G287S]) (n=2/2 sALS; p<0.01). Comparison of our results with those from studies of other European populations revealed significant differences in the spectrum of disease variation (p=1.7×10⁻⁴).
Conclusions Up to 17% of Irish ALS cases may carry high-penetrance variants within the investigated genes. However, the precise nature of genetic susceptibility differs significantly from that reported within other European populations. Certain variants may not cause disease in isolation and concomitant analysis of disease genes may prove highly important.
- Genetic Epidemiology
- Motor Neurone Disease
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/
Introduction
Amyotrophic lateral sclerosis (ALS) is a terminal neurodegenerative disease characterised by the degeneration of upper and lower motor neurons. Lifetime risk is approximately 1/400 (ref. 1) and in most instances an underlying cause cannot be established. Nonetheless, 22 genes have been implicated in Mendelian forms of the condition while a further 82 have been associated with disease risk (http://alsod.iop.kcl.ac.uk).2
A detailed understanding of disease heterogeneity is important to facilitate appropriate stratification of subcohorts for clinical research purposes. However, the relative importance of identified ALS genes is largely unknown and to date the most comprehensive studies have analysed only six or seven disease-associated genes.3,4 Investigation of the cumulative effect of variation across distinct disease loci has also been limited, although a significant excess in the co-occurrence of putative disease variants among fALS patients has recently been demonstrated.5 Little is known about how the genetic aetiology of ALS varies across populations. Previous studies have suggested significant variability in the importance of specific disease genes,6,7 but these studies involved comparisons of selected patient cohorts, and no comparative study of population-based cohorts has yet been performed.
To establish the relative and cumulative frequencies of disease variants across 33 of the most well-established ALS-related genes, we analysed a population-based cohort of 444 Irish ALS cases and 311 age-matched and geographically matched controls by multiplexed targeted high-throughput sequencing. This represents the most extensive survey of ALS loci to date. To assess the importance of disease heterogeneity across populations, we compare our results with those reported by previous population-based studies of major disease genes. We also search for correlations in the co-occurrence of putative disease variants and investigate the importance of cumulative susceptibility across distinct disease loci.
Materials and methods
All participating patients were recruited between 1999 and 2011 through the ALS Register of the Republic of Ireland or the ALS Register of Northern Ireland.8 All patients were of Irish ancestry and met the revised El Escorial criteria for possible, probable or definite ALS. Patients with an identifiable family history of ALS among first, second or third degree relatives were classified as ‘familial’; all other patients were classified as ‘sporadic’. Controls were neurologically normal at the time of blood donation and included the spouses of attending patients and volunteers recruited at primary care offices across the country. Informed written consent was obtained from all participants and the study was approved by the research and ethics committee in Beaumont Hospital, Dublin.
Indexed paired-end Illumina sequencing libraries (ref. 10) were prepared for 444 cases and 311 controls. Libraries were enriched for the coding exons of target genes using custom SureSelect kits (Agilent, Santa Clara, California, USA) and resequenced at either TrinSeq (Dublin, Ireland) or GATC Biotech (Konstanz, Germany). Generated sequencing reads were aligned to the GRCh37 build of the human genome using Burrows-Wheeler Aligner (BWA) V.0.6.1 (ref. 11). Subsequent quality control, depth of coverage analyses, power analyses, variant calling and variant annotation were performed using SAMtools V.0.1.18 (ref. 12), the GATK V.2.1–2 (ref. 13), Picard V.1.60 (http://picard.sourceforge.net/), Variant Effect Predictor V.2.7 (ref. 14), Python V.2.7.3 (http://www.python.org/) and R V.2.14.1 (http://www.r-project.org/), along with the March 2012 release from the 1000 genomes project (ref. 15), the ESP6500 release from the NHLBI exome sequencing project (Exome Variant Server, NHLBI Exome Sequencing Project (ESP), Seattle, Washington, USA; URL: http://evs.gs.washington.edu/EVS/; accessed 18 July 2012) and Ensembl 69 (ref. 16). Further details on library preparation, target enrichment, sequencing and the analysis of sequence data are provided in the supporting materials and methods.
Unless otherwise indicated, all statistical analyses were conducted in R V.2.14.1.
Evaluation of high-penetrance disease models
One-tailed binomial tests were used to assess whether the frequencies of variant carriers within a series of control cohorts were higher than could be accounted for under high-penetrance disease models.17 These control cohorts included an Irish panel (internal controls), a European panel (internal controls, the ESP6500 European American cohort, the 1000 genomes European cohort) and a global panel (internal controls, the full ESP6500 cohort, the full 1000 genomes cohort). The expected carrier frequencies were taken as the product of patient carrier frequencies and the respective population risks for ALS (Irish 1/290; European 1/397; Global 1/397).1 High-penetrance disease models were rejected when the p value associated with any one of the three cohorts was <0.05. Carriers were defined as individuals homozygous for the variant allele for the evaluation of recessive disease models and as individuals homozygous or heterozygous for the variant allele for the evaluation of dominant disease models.
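The test described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names and the example variant (2 of 444 patients and 1 of 311 Irish controls carrying it) are hypothetical, and only the expected-frequency rule (patient carrier frequency multiplied by population risk) and the 0.05 threshold come from the text.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """One-tailed binomial p value: P(X >= k) for X ~ Binomial(n, p)."""
    # Computed as the complement of the lower tail, so only the first k
    # terms (small binomial coefficients) are ever evaluated.
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


def rejects_high_penetrance(patient_carriers, n_patients, lifetime_risk,
                            control_carriers, n_controls, alpha=0.05):
    """Return (p_value, rejected) for a single control cohort."""
    # Expected control carrier frequency under a high-penetrance model:
    # patient carrier frequency x population lifetime risk of ALS.
    expected = (patient_carriers / n_patients) * lifetime_risk
    p_value = binom_upper_tail(control_carriers, n_controls, expected)
    return p_value, p_value < alpha


# Hypothetical variant, Irish panel only (lifetime risk 1/290 as in the text).
p_value, rejected = rejects_high_penetrance(2, 444, 1 / 290, 1, 311)
print(f"p = {p_value:.4f}, high-penetrance model rejected: {rejected}")
```

In this illustration even a single control carrier is enough to reject the model, because the expected control carrier frequency is of the order of 10⁻⁵. In the study itself the same test was applied to the Irish, European and global panels, and a model was rejected if any of the three p values was <0.05.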
Analysis of variant co-occurrence
One-tailed binomial tests were used to explore whether the frequencies of cases carrying multiple splice site/non-synonymous ALS gene variants exceeded chance expectation. As per van Blitterswijk et al,5 the expected frequency of this occurrence was taken as (the number of cases carrying ≥1 variant/the total number of cases)×(the number of controls carrying ≥1 variant/the total number of controls). Variants were excluded from these analyses if the frequency of carriers among European/Global controls from the ESP6500 and 1000 genomes projects exceeded one of several potential critical values (see online supplementary tables S3 and S4 for details).
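As an illustration of the expectation defined above, the following Python sketch computes the expected frequency of multi-variant carriers and the corresponding one-tailed binomial p value. The function names and all counts are hypothetical placeholders rather than values from supplementary tables S3 and S4.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


def cooccurrence_test(cases_with_variant, n_cases,
                      controls_with_variant, n_controls,
                      cases_with_multiple):
    """Return (expected frequency, one-tailed p value) for multi-variant carriers."""
    # Expected frequency of carrying >=2 variants by chance:
    # (cases with >=1 variant / cases) x (controls with >=1 variant / controls).
    expected = (cases_with_variant / n_cases) * (controls_with_variant / n_controls)
    # Probability of observing at least this many multi-variant cases by chance.
    p_value = binom_upper_tail(cases_with_multiple, n_cases, expected)
    return expected, p_value


# Hypothetical counts: 60/444 cases and 30/311 controls carry >=1 variant,
# and 7 cases carry >=2 variants.
expected, p_value = cooccurrence_test(60, 444, 30, 311, 7)
print(f"expected frequency = {expected:.4f}, p = {p_value:.3f}")
```

With these placeholder counts roughly six multi-variant carriers would be expected by chance, so seven observed carriers would not indicate an excess.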
The frequency with which carriers of previously reported ALS variants also carried variants of additional Mendelian disease genes (3/43) was used to estimate the probability of observing multiple Mendelian disease gene variants among all identified carriers of a given ALS variant (ie, 33/33 C9orf72 hexanucleotide expansion carriers OR 2/2 TARDBP:c.859G>A(p.[G287S]) carriers OR 2/2 FUS:c.1574C>T(p.[P525L]) carriers).
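One plausible reading of this calculation is sketched below in Python. The per-carrier probability (3/43) and the carrier counts (33, 2 and 2) are taken from the text. Treating carriers as independent and combining the three alternatives as a union of independent events are assumptions made for illustration, since the exact procedure is not spelled out here.

```python
def prob_all_carriers(p_single, n_carriers):
    """Probability that every one of n carriers also carries an additional
    Mendelian disease gene variant, assuming carriers are independent."""
    return p_single ** n_carriers


# Frequency of an additional Mendelian gene variant among carriers of
# previously reported ALS variants (from the text).
p_single = 3 / 43

# Carrier counts per previously reported variant (from the text).
carrier_counts = {
    "C9orf72 expansion": 33,
    "TARDBP p.G287S": 2,
    "FUS p.P525L": 2,
}

per_variant = {name: prob_all_carriers(p_single, n)
               for name, n in carrier_counts.items()}

# Probability that the pattern holds for at least one of the three variants
# (union of events assumed to be independent).
p_none = 1.0
for p in per_variant.values():
    p_none *= 1 - p
p_any = 1 - p_none

print(f"P(all carriers of at least one variant) = {p_any:.4f}")
```

Under these assumptions the result falls just below 0.01, and it is driven almost entirely by the two variants that have only two carriers each.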
Analysis of population heterogeneity
The burden of putative disease variants across ANG, C9orf72, FUS, OPTN, SOD1 and TARDBP within the Irish and Italian ALS populations was compared first using c-alpha tests where singleton variants were collapsed into a single per locus count (ref. 18) and second using allele count-based Fisher exact tests. To avoid artificial inflation of population differences due to biases in missing genotype rates, c-alpha test permutations were performed at each variant site independently.
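The allele count-based comparison can be sketched with a two-sided Fisher exact test on a 2×2 table of variant and reference allele counts. The implementation below is a self-contained Python illustration rather than the authors' R code, the allele counts are hypothetical, and the c-alpha test is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are populations; columns are variant and reference allele counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x variant alleles in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(col1, row1)
    # Sum the probabilities of all tables no more likely than the observed
    # one (small relative tolerance guards against floating-point ties).
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical allele counts: 12 variant alleles among 888 chromosomes in one
# population versus 3 among 950 chromosomes in the other.
p_value = fisher_exact_two_sided(12, 888 - 12, 3, 950 - 3)
print(f"p = {p_value:.4f}")
```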
Single variant association testing
Variants were tested for association with case–control status using PLINK V.1.07 (ref. 19). Allelic, dominant and recessive disease models were tested using Fisher exact tests. Multiple testing correction was performed by the Westfall and Young permutation method. Synonymous variants were not considered in the correction of p values obtained from non-synonymous or splice site variants. Association tests were performed with and without sample filtering based on status for established disease mutations and a family history of ALS.
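The idea behind permutation-based multiple testing correction can be sketched as follows. This is a simplified single-step min-p variant of the Westfall and Young approach applied to dominant-model Fisher tests on toy data. It is not PLINK's implementation, and the function names, carrier flags and number of permutations are all hypothetical.

```python
import random
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1)
                        if prob(x) <= p_obs * (1 + 1e-9)))


def carrier_pvalues(genotypes, is_case):
    """Dominant-model p value per variant; genotypes[j][i] flags sample i as a carrier."""
    n_case = sum(is_case)
    n_control = len(is_case) - n_case
    pvals = []
    for flags in genotypes:
        case_carriers = sum(1 for f, c in zip(flags, is_case) if f and c)
        control_carriers = sum(1 for f, c in zip(flags, is_case) if f and not c)
        pvals.append(fisher_two_sided(case_carriers, n_case - case_carriers,
                                      control_carriers, n_control - control_carriers))
    return pvals


def minp_adjust(genotypes, is_case, n_perm=200, seed=0):
    """Single-step min-p adjusted p values from label permutations."""
    rng = random.Random(seed)
    observed = carrier_pvalues(genotypes, is_case)
    labels = list(is_case)
    hits = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(labels)  # permute case/control labels across samples
        min_p = min(carrier_pvalues(genotypes, labels))
        for j, p in enumerate(observed):
            if min_p <= p * (1 + 1e-9):
                hits[j] += 1
    return [(h + 1) / (n_perm + 1) for h in hits]


# Toy data: 20 cases followed by 20 controls, three variants.
is_case = [True] * 20 + [False] * 20
carrier_sets = ({0, 1, 2, 3, 4, 5}, {0, 20}, {21, 22})
genotypes = [[1 if i in carriers else 0 for i in range(40)]
             for carriers in carrier_sets]
adjusted = minp_adjust(genotypes, is_case)
print([round(p, 3) for p in adjusted])
```

Because each permutation keeps the genotype data intact and only shuffles the labels, the correlation between variants is preserved, which is the main advantage of this family of corrections over a Bonferroni adjustment.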
Results
Four hundred and forty-four Irish ALS patients (57.7% men; mean age of onset=61.7±12.0 years; 65.4% spinal onset, 31.8% bulbar onset, 2.8% generalised onset) and 311 age-matched and geographically matched controls (47.2% men; mean age at sampling=60.3±11.8 years) were included in the study. Fifty-three patients (11.9%) were recruited through the population-based Northern Irish ALS Register (Belfast, Northern Ireland) while the remaining 391 (88.1%) were recruited through the Irish ALS Register.8 Fifty patients (11.3%) were classified as ‘familial’ while 394 (88.7%) were classified as ‘sporadic’ according to recently published criteria.9 Twenty-four of the sporadic (6.1%) and 15 of the familial (30.0%) patients previously tested positive for pathogenic expansions of the C9orf72 hexanucleotide repeat.20
Study participants were screened for variants within the coding exons of 10 genes previously associated with Mendelian forms of ALS (ALS2, ANG, C9orf72, FUS, OPTN, SETX, SOD1, TARDBP, VAPB, VCP) and 23 low penetrance/tentative ALS genes (ATXN2, CHMP2B, DCTN1, DPP6, ELP3, FGGY, FIG4, GRN, HFE, IFNK, ITPR2, MAPT, MOB3B, NEFH, NIPA1, PARK7, PON1, PON2, PON3, PRPH, SIGMAR1, SPG11, UNC13A). At the time of target gene selection, the causative variant within the chromosome 9p linkage region had not been resolved and accordingly MOB3B and IFNK were included as tentative ALS genes.21 Given the large number of study participants and target exons involved, samples were resequenced using a multiplexed targeted high-throughput sequencing strategy.22 In total, 26.3×10⁶ sequencing reads mapping to target positions were generated. Ninety-nine per cent of target bases were covered by at least one sequencing read and each target position was covered by a mean of 27.3 sequencing reads per sample (see online supplementary table S1). Power to observe any variant with a patient allele frequency ≥0.5% was estimated to range from 0% to 98.8% across target positions with a median of 98.7% (IQR: 98.3–98.8%, figure 1A). A more detailed account of the distribution of variant detection power across target genes can be found in online supplementary figure S1. Four hundred and seventy-seven potential sequence variants were identified across target intervals (see online supplementary table S2). Ninety-five of these were designated as possible machine errors (see online supplementary materials and methods and table S2) and are not considered further.
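The 98.8% ceiling on detection power is close to what a simple sampling argument gives: the probability that at least one of the 888 chromosomes carried by 444 patients bears a variant with an allele frequency of 0.5%. The Python sketch below shows this upper bound only. It assumes perfect coverage and genotype calling, and it is not the per-position power calculation described in the supplementary methods.

```python
def detection_power(allele_frequency, n_individuals, ploidy=2):
    """Probability that at least one sampled chromosome carries the variant,
    assuming every chromosome is sequenced and called without error."""
    n_chromosomes = ploidy * n_individuals
    return 1 - (1 - allele_frequency) ** n_chromosomes


# 444 patients and a patient allele frequency of 0.5% (values from the text).
power = detection_power(0.005, 444)
print(f"Upper bound on detection power: {power:.1%}")
```

Positions with lower power presumably reflect reduced sequencing coverage, which this simplified bound does not model.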
To evaluate the accuracy of genotypes inferred across the remainder of sites, genotypes called for 567 samples across 85 variants (target intervals ±50 bp) were compared with genotypes previously ascertained using Illumina HumanHap 550,23 Illumina Human610-Quad (deCODE Genetics, Reykjavik, Iceland) and Illumina OmniExpressExome-8v1 (Atlas Biolabs, Berlin, Germany) single nucleotide polymorphism BeadChip assays. This comparison revealed a genotype concordance rate of 98.9% among sequencing and BeadChip calls following genotype quality control (figure 1B, see online supplementary materials and methods).
Based on the frequencies of carriers among internal controls and samples analysed by the NHLBI exome sequencing and 1000 genomes projects, all but 52 sequence variants were excluded as causing ALS with high penetrance (see the Methods section). Two of these represented previously described ALS variants: TARDBP:c.859G>A(p.[G287S]) (refs. 24–26) and FUS:c.1574C>T(p.[P525L]) (ref. 27). The remainder included 15 synonymous variants, 1 splice site variant (DCTN1:c.2887-2A>G), 20 missense variants classified as ‘deleterious’ by SIFT (ref. 28) or ‘possibly/probably damaging’ by PolyPhen (ref. 29) and 14 missense variants classified as ‘tolerated’ by SIFT and ‘benign’ by PolyPhen. For the purpose of this study, all but the 15 synonymous variants were regarded as potentially disease causing. Seventy-six patients (38.0% of fALS, 14.5% of sALS, 17.1% of combined) carried either one of these potential disease variants or a previously described ALS variant, with 57 (34.0% of fALS, 10.2% of sALS, 12.8% of combined) carrying variants of Mendelian disease genes (table 1; C9orf72 39; SETX 11; ALS2 7; FUS 2; TARDBP 2; OPTN 1; VCP 1; ANG, SOD1, VAPB 0), 21 (6.0% of fALS, 4.6% of sALS, 4.7% of combined) carrying variants of low penetrance/tentative ALS genes (table 2; SPG11 7; ELP3 3; CHMP2B 2; DCTN1 2; MAPT 2; DPP6 1; FGGY 1; HFE 1; ITPR2 1; PON2 1; UNC13A 1; ATXN2, FIG4, GRN, IFNK, MOB3B, NEFH, NIPA1, PARK7, PON1, PON3, PRPH, SIGMAR1 0) and 43 (30.0% of fALS, 7.1% of sALS, 9.7% of combined) carrying a known ALS variant (table 1; C9orf72 39; FUS 2; TARDBP 2).
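The combined carrier frequencies quoted above follow directly from the carrier counts and the cohort size of 444 patients. The short Python sketch below reproduces that arithmetic from the counts given in the text; the helper name `pct` and the dictionary labels are illustrative only.

```python
N_PATIENTS = 444  # total ALS cases in the cohort

# Carrier counts per Mendelian disease gene, as listed in the text.
gene_counts = {"C9orf72": 39, "SETX": 11, "ALS2": 7, "FUS": 2,
               "TARDBP": 2, "OPTN": 1, "VCP": 1}

# Carrier counts per variant class, as listed in the text.
group_counts = {
    "any known or potential variant": 76,
    "Mendelian disease genes": 57,
    "low penetrance/tentative genes": 21,
    "previously described variants": 43,
}


def pct(count, total=N_PATIENTS, digits=2):
    """Carrier frequency as a percentage of the cohort."""
    return round(100 * count / total, digits)


for gene, count in gene_counts.items():
    print(f"{gene}: {pct(count)}%")
for group, count in group_counts.items():
    print(f"{group}: {pct(count, digits=1)}%")
```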
Both patients carrying the TARDBP:c.859G>A(p.[G287S]) substitution were sporadic and presented with bulbar onset disease at 66/67 years of age. One patient remains alive at 51 months from disease onset, while the other died 49 months from disease onset. Both patients were cognitively intact. Both carriers of the FUS:c.1574C>T(p.[P525L]) substitution were also sporadic. They exhibited an exceptionally young age of onset (13/21 years) and rapid disease progression (disease duration: 11–17 months). One patient experienced spinal onset disease while the other experienced bulbar onset. Both were cognitively intact. A detailed account of the phenotype exhibited by C9orf72 repeat expansion carriers has been provided previously.20 Further details on the phenotypes of all patients determined to carry known or possible disease variants are listed in online supplementary table S5.
Co-occurrences of ALS gene variants
Analysis of the number of cases carrying multiple rare or low-frequency variants revealed no detectable excesses across either the Mendelian genes alone or the entire dataset (see online supplementary tables S3 and S4). Seven patients (4.0% of fALS, 1.3% of sALS, 1.6% of combined) carried two variants classified as known or potential ALS variants in the previous analysis (online supplementary table S5). In four of these individuals, both variants fell within Mendelian disease genes. Notably, these four individuals included both identified carriers of the TARDBP:c.859G>A(p.[G287S]) variant, who were observed to also carry either an ALS2:c.2566A>G(p.Thr856Ala) or an SETX:c.814C>G(p.His272Asp) substitution. The probability of such an observation for all carriers of any previously reported ALS variant was estimated to be less than 1%, suggesting that these co-occurrences may be pathologically meaningful. Other observed co-occurrences of putative disease variants included ALS2:c.2098A>G(p.Thr700Ala) and SETX:c.7682C>T(p.Ser2561Leu); the C9orf72 repeat expansion and CHMP2B:c.123G>T(p.Gln41His); the C9orf72 repeat expansion and SETX:c.2842C>A(p.Pro948Thr); the C9orf72 repeat expansion and SPG11:c.3680A>G(p.Lys1227Arg); and the DCTN1:c.2887-2A>G splice acceptor site variant and SPG11:c.1529G>A(p.Ser510Asn). Further details on these patients can be found in online supplementary table S5.
Variation in the frequency of ALS variants across populations
To assess the potential importance of genetic heterogeneity across populations, we compared the estimated frequencies of ANG, C9orf72, FUS, OPTN, SOD1 and TARDBP disease variants among Irish patients with those reported by population-based studies of Italian cohorts.3,4 The difference was statistically significant (combined p=1.7×10⁻⁴, table 3), supporting a correlation between genetic susceptibility and population of origin. Of the 32 variants analysed, only the C9orf72 repeat expansion and the FUS:c.1574C>T(p.[P525L]) substitution were observed among both Irish and Italian patients. The C9orf72 expansion was significantly more common among Irish patients (8.78% vs 4.39%, p=3.95×10⁻⁴) while SOD1 and TARDBP variants were significantly more common among Italian patients (SOD1: 2.00% vs 0.00%, p=3.8×10⁻³. TARDBP: 2.00% vs 0.45%, p=0.035). The overall frequencies of FUS and OPTN variants were similar (FUS: 0.30% vs 0.45%, p=0.61. OPTN: 0.20% vs 0.23%, p=1). ANG variants were identified only among Italian patients but the frequency difference was not significant (0.30% vs 0.00%, p=0.56).
Single variant association testing
Case–control association tests were performed under allelic, dominant and recessive disease models and under various sample and variant inclusion criteria; however, no significant associations with disease risk were observed.
Discussion
We have screened a population-based cohort of 444 Irish ALS patients and 311 age-matched and geographically matched controls for variants within the coding exons of 33 previously reported ALS genes. This represents the most extensive survey of known ALS loci to date and is the first to employ a multiplexed targeted next-generation sequencing strategy to efficiently analyse multiple ALS loci simultaneously. The resulting dataset exhibited high sensitivity in terms of predicted power to identify rare patient variants and high accuracy in terms of genotype assignment.
We found that up to 17.1% of Irish ALS cases (38.0% of fALS, 14.5% of sALS) may carry high-penetrance disease variants within the investigated genes. However, only 10 of the 33 genes analysed represent well-established Mendelian disease genes, and it is anticipated that many of the possible disease variants identified will not be ALS related. Additionally, it should be noted that variants of ALS2 and SPG11 were observed solely in heterozygous configurations, but that these genes have previously been associated with ALS only under recessive disease models.30,31 It is also worth noting that, as reported in previous studies,32 no disease variants could be identified within the coding sequence of the C9orf72 gene.
A total of 9.7% of the patients (30% of fALS, 7.1% of sALS) were found to carry previously reported ALS variants, with 8.78% carrying the C9orf72 repeat expansion, 0.45% carrying the FUS:c.1574C>T(p.[P525L]) substitution and 0.45% carrying the TARDBP:c.859G>A(p.[G287S]) substitution. The phenotype of FUS:c.1574C>T(p.[P525L]) carriers was consistent with that reported previously,27 with both patients exhibiting an exceptionally young age of onset and rapid disease progression. Conversely, we observed that carriers of the TARDBP:c.859G>A(p.[G287S]) variant exhibited a disease of comparatively late onset and slow progression. Review of TARDBP:c.859G>A(p.[G287S]) carriers reported across this study and previous publications24–26 revealed that disease onset ranged from 52 to 70 years of age while disease duration ranged from 49 to ≥93 months.
Despite prior evidence to support models of high disease penetrance, 62% of C9orf72 expansion carriers and all FUS:c.1574C>T(p.[P525L]) and TARDBP:c.859G>A(p.[G287S]) carriers were classified as sporadic. Modelling of the effects of penetrance and family size on the rate of familial disease has previously shown that inheritance of high-penetrance variants can occur in the absence of a detectable family history.33 Conversely, the presence of a family history does not necessarily imply the presence of a common disease aetiology9 and it is not always the case that disease variants are inherited. For example, the FUS:c.1574C>T(p.[P525L]) substitution has been reported to occur as a de novo event in multiple cases.27 Taken together, our results therefore support the contention that the distinction between familial and sporadic disease is of limited utility from both clinical and research perspectives.
Substantial differences have been described in the general spectrum of rare and low-frequency genetic variations across populations.15,34 A degree of population differentiation would therefore also be anticipated in the genetics of disease pathogenesis. Formal comparison of our results with those reported by studies of major disease genes in Italy confirmed that this is the case with ALS. We observed significant differences in the nature of genetic susceptibility across the two populations (p=1.7×10⁻⁴), finding that only two of the 32 variants identified among either Irish or Italian patients could be identified among both. One of these shared variants was the FUS:c.1574C>T(p.[P525L]) substitution, which is notable as limited population differentiation may have been predicted a priori, given the associated age of mortality and the importance of de novo occurrence. The other was the C9orf72 repeat expansion, which has been shown to occur with high frequency among various European populations.3,5,35 However, we observed that the expanded allele occurred with a significantly higher frequency in the Irish population than the Italian (Irish=8.78%, Italian=4.39%, p=3.95×10⁻⁴). Conversely, we observed that variants of SOD1 and TARDBP were significantly more common among Italian patients than Irish (SOD1: Irish=0.00%, Italian=2.00%, p=3.8×10⁻³. TARDBP: Irish=0.45%, Italian=2.00%, p=0.035). The absence of SOD1 variants in the Irish population is particularly striking, as the gene is believed to make significant contributions to disease burden across Scandinavia (9.6%),36 the USA (7.5%),37 Germany (12%, familial cases only),38 Italy (2.1%),3 France (56%, familial cases only)39 and Korea (3.9%).7
A recent analysis of ALS patients from the Netherlands revealed that mutations of multiple ALS-associated genes could be identified in 9/57 families (p=1.57×10⁻⁷),5 suggesting that oligogenic susceptibility may play an important role in ALS aetiology. In the current study, we searched for excesses in the co-occurrence of rare and low-frequency variations across a wider panel of disease genes. Our findings did not reveal any significant deviations from chance expectation, even when the analysis was restricted to genes analysed in the Dutch study (data not shown). However, we did observe that 1.6% of patients (4.0% of fALS, 1.3% of sALS) carried multiple variants classified as known or potential high-penetrance ALS variants and that this included all identified carriers of one previously reported ALS variant (p<0.01). This variant occurred within the TARDBP gene (TARDBP:c.859G>A(p.[G287S])), which is noteworthy as the strongest evidence for oligogenic-based disease in the Dutch analysis also related to a TARDBP variant (c.1055A>G(p.[N352S])). TARDBP:c.859G>A(p.[G287S]) has previously been identified among patients from Italy,24 France26 and the UK25 but has not, to our knowledge, yet been reported in healthy controls. As studies that have previously reported the variant analysed the TARDBP gene in isolation, it is not known whether carriers identified elsewhere also carry additional disease variants. The potential relevance of oligogenic susceptibility in ALS aetiology means that the exploration of oligogenic models may become increasingly important in disease gene mapping and studies of genotype–phenotype correlations. It also means that known disease genes should be studied together rather than in isolation and that patients testing positive for disease variants at one locus should not be excluded from subsequent analysis of additional disease loci.
No associations were established between any of the identified variants and case–control status. However, power to evaluate the pathogenicity of low-frequency variants was limited and further investigation of patients and matched controls by future studies may reveal disease-relevant effects. We have therefore provided a complete account of the variation identified across both cases and controls (see online supplementary table S2).
Strong correlations have been observed between patient phenotypes and individual disease variants,20,27 and it may be the case that other clinical features such as the frequency of cognitive impairment, the burden of disability and drug response also vary across populations. While the clinical phenotype within European populations is broadly similar, our data suggest that findings from individually characterised ancestral populations should be extrapolated to other populations with caution. A deeper understanding of disease heterogeneity across populations will require further analysis of representative patient cohorts, and is likely to be of benefit in cohort stratification for future clinical trials.
In conclusion, we have used targeted high-throughput sequencing to conduct an extensive population-based survey of ALS gene variant frequencies. We found that 17.1% of Irish cases may carry high-penetrance disease variants within the investigated genes, with previously established disease variants accounting for up to 9.7%. We have also found that the C9orf72 hexanucleotide repeat expansion represents the most common of these variants. Our study was limited by the exclusion of more recently reported disease genes like SQSTM1 (ref. 40) and UBQLN2 (ref. 41) and by the absence of any functional analyses of putative disease variants. However, we identified significant differences in the frequencies of disease variants between Irish and other European populations, demonstrating that distinct patient populations cannot always be treated as homogeneous. Finally, we also uncovered evidence that supports the potential relevance of oligogenic susceptibility in ALS aetiology and suggests that the TARDBP:c.859G>A(p.[G287S]) variant may not cause disease in isolation.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors and that the last two authors should be regarded as joint last authors.
Contributors KPK: study design. Preparation of genomic sequencing libraries and target enrichment. Alignment of sequence data and related quality control. Variant calling and related quality control. All statistical analyses, drafting of manuscript. RLM: study design. Preparation of genomic sequencing libraries and target enrichment. Alignment of sequence data and related quality control, drafting of manuscript. SB: study design. Investigation of patient family histories. ME: cognitive testing of patients. MH: data management and retrieval from the Irish ALS Register. EMK: next-generation sequencing and related quality control at TrinSeq. PC: next-generation sequencing and related quality control at TrinSeq. DWM: next-generation sequencing and related quality control at TrinSeq. CGD: provision of DNA and clinical details relating to patients from Northern Ireland. DGB: study design and supervision, editing of manuscript. OH: principal investigator, study design and supervision, patient collection and phenotyping, director of the ALS Register, drafting and editing of manuscript.
Funding This work was supported by the Health Seventh Framework Programme (FP7/2007–2013) under grant agreement n° 259867, the Irish Health Research Board and the charities Trinity Foundation and Research Motor Neurone. Next-generation sequencing was performed in TrinSeq (http://www.medicine.tcd.ie/sequencing), a core facility funded by Science Foundation Ireland (SFI) under Grant No. [SFI/07/RFP/GEN/F327/EC07] with support from the Trinity Centre for High Performance Computing.
Competing interests None.
Ethics approval Research and Ethics committee in Beaumont Hospital, Dublin.
Provenance and peer review Not commissioned; externally peer reviewed.