Introduction

Rolandic epilepsy (RE) (MIM 117100) is a neurodevelopmental disorder, affecting 0.2% of the population. It is characterized by classic focal seizures that recapitulate the functional anatomy of the vocal tract, beginning with guttural sounds at the larynx, sensorimotor symptoms then progressing up to the tongue, mouth and face, culminating with speech arrest. Seizures most often occur in sleep shortly before awakening. The disorder occurs more often in boys than in girls (3:2) and is diagnosed in one in five of all children with newly diagnosed epilepsy.1 All patients exhibit the defining electroencephalographic (EEG) abnormality of centrotemporal sharp waves (CTS). The onset of seizures in childhood (3–12 years)2 is frequently preceded by a constellation of developmental deficits including speech disorder, reading disability and attention impairment. These deficits have been noted to cluster in family members of RE patients who do not have epilepsy.3, 4 None of these abnormalities are associated with major cerebral malformations visible on routine MRI.5 The seizures and the EEG abnormality of CTS spontaneously remit at adolescence, although the prognosis for developmental deficits is less clear. There is no known involvement of organs outside the nervous system.

RE belongs to a family of idiopathic epilepsies of childhood with focal sharp waves, some of which are characterized by more severe and varied types of seizures (atypical benign partial epilepsy or ABPE, MIM 604827), variable locations (benign occipital epilepsy, MIM 132090), acquired receptive aphasia (Landau–Kleffner syndrome, MIM 245570) and developmental regression (continuous spikes in slow-wave sleep). CTS are common in children (2–4%),6 have equal gender distribution and have been observed with increased frequency in developmental disorders, including speech dyspraxia,7 attention deficit hyperactivity disorder (ADHD)8 and developmental coordination disorder (DCD),9 suggesting that the EEG trait of CTS is not specific to epilepsy, but possibly a marker for an underlying subtle but more widespread abnormality of neurodevelopment.10

Despite the strong clustering of developmental disorders in RE families, RE itself has a low sibling risk of ∼10%.11 Several rare, phenotypically distinct Mendelian RE variants have been reported12, 13, 14, 15 but the common form appears to have complex genetic inheritance. However, segregation analysis suggests that CTS in the common form of RE is inherited as an autosomal dominant trait.16 CTS were reported to link to 15q14 in a candidate gene study of families multiplex for RE and ABPE, but this locus has not been replicated and no genome-wide screen for CTS has been previously attempted.17 Understanding the mechanism of CTS could provide insight into the variety of common neurodevelopmental disorders in which CTS are observed. We therefore set out to genetically map the CTS trait in RE families.

We conducted genome-wide linkage analysis of the CTS trait in 38 US families singly ascertained through an RE proband. In 11 of the families, one additional sibling was known to carry the CTS trait, but the CTS status of individuals younger than 4 years or older than 16 years was unknown because of its age-limited expression. The maximum two-point and multipoint LOD scores for CTS were observed at 11p13. We designated a 13-cM linkage region, encompassing the area in which LOD scores exceeded 2.0, as our region of interest for fine mapping. We then tested for association of CTS with SNP markers distributed across genes in this region. We initially used a ‘discovery’ data set that included 68 cases and 187 controls group matched for ancestry and gender; 38 of these cases were included in the original linkage screen. In addition to case–control analysis, we used family-based analysis to guard against the potential for positive confounding due to population stratification. We took a pure likelihood approach to the statistical analysis of linkage and association18, 19, 20, 21 that is explained in Methods. We then typed additional SNPs around genes that showed compelling evidence of association in the preliminary analysis. In a second, independent, ‘replication’ case–control data set, we typed a subset of the SNPs in our region of interest. The replication set included 40 RE cases and 120 controls from Western Canada; the two data sets were then jointly analyzed.

Methods

Subjects

Informed consent was obtained from all participants using procedures approved by institutional review boards at each of the clinical research centers collecting human subjects. The general methodology for the study has been detailed elsewhere.3 Briefly, cases with classic RE and their families were recruited for a genetic study from eight pediatric neurology centers in the northeastern United States (see Acknowledgements for referring physicians). Ascertainment was through the proband, with no other family member required to be affected with RE. All cases were centrally evaluated by a pediatric neurologist, as well as by one other study physician. Cases were enrolled if they met stringent eligibility criteria for RE, in accordance with the definition of the International League Against Epilepsy22 including (i) at least one witnessed seizure with typical features: nocturnal, simple partial seizures affecting one side of the body or on alternate sides; (ii) oro-facial-pharyngeal sensorimotor symptoms, with speech arrest and hypersalivation; (iii) age of onset between 3 and 12 years; (iv) no previous epilepsy type; (v) normal global developmental milestones; (vi) normal neurological examination; (vii) at least one interictal EEG with CTS and normal background, verified by two independent and blinded readers;16 and (viii) neuroimaging read by two independent and blinded board-certified neuroradiologists that excluded an alternative structural, inflammatory or metabolic cause for the seizures.5 Thus, cases with unwitnessed episodes or with only secondary generalized seizures were excluded, even if the EEG was typical. Siblings between the ages of 4 and 15 years underwent sleep-deprived EEGs to assess their CTS status;16 EEGs were then evaluated blind to identity by two independent experts.

Cases had their first seizure at a median age of 8 years (range 3–12); most had fewer than 10 lifetime seizures; over a third had at least one secondary generalized seizure, but only two had a history of convulsive status epilepticus; and two-thirds had been treated with antiepileptic drugs. Table 1 shows the seizure characteristics of the cases. Cases were 60% male and 76% of European ancestry (see Supplementary Information, Supplementary Table 1). Details of EEG and imaging findings have been reported earlier.5, 16 Affectedness data and DNA were collected from all potentially informative and consenting relatives of the proband. In most cases, this included at least both parents and all siblings over the age of 3 years.

Table 1 Clinical descriptors of RE cases

One hundred and eighty-seven controls were recruited from the same geographic locations as the cases and were group matched for gender and ancestry (see Supplementary Information, Supplementary Table 1). Each potential control was screened for personal and family history of neuropsychiatric and developmental disorders: DNA from individuals with a history of seizures was excluded from the control panel. The lifetime CTS status of controls was unknown because of the trait's age-limited expression, but its prevalence among controls was assumed to match that of the general population (2–4%);6 because some controls may therefore carry CTS, any association observed in case–control analysis should be conservative. This sample of independent cases and controls is referred to throughout as the discovery data set.

Forty cases and 120 controls were recruited from Calgary, Canada, according to the same eligibility criteria as in the discovery data set (see Supplementary Information, Supplementary Table 1), as a replication sample. The cases were 56% male and 83% of European ancestry, with median age of seizure onset at 7 years. Controls were also 56% male and 86% of European ancestry. Information regarding personal and family history of neuropsychiatric and developmental disorders was collected as above for possible exclusion from case–control analysis. The Calgary sample is referred to as the replication data set.

DNA collection

DNA was collected either by peripheral venous blood draw into 10 ml K-EDTA tubes (Fisher Scientific) or by salivary sample in ORAGENE (DNA Genotek, Ottawa) flasks.

STR genotyping

One hundred and ninety-four individuals from 38 RE families were genotyped using the deCODE 4 cM STR marker panel. This panel contains approximately 1200 highly polymorphic STR markers. Amplified fragments were electrophoresed using ABI 3700 and ABI 3730 DNA analyzers with CEPH family DNA used as control. Alleles were checked for consistency with Hardy–Weinberg equilibrium and non-paternity. Errors were reconciled by resampling or by excluding genotypes.

Linkage analyses

Two-point and multipoint heterogeneity LOD scores were calculated in all 38 families combined. We used the MMLS approach to parametric linkage analysis,23 which recommends a robust set of parameter values and an analytical approach.24, 25 Briefly, one calculates LOD scores under both dominant and recessive modes of inheritance, specifying a dominant gene frequency of 0.01 and a recessive gene frequency of 0.14, a sporadic rate of 0.0002 and a penetrance of 0.50. In regions providing evidence for linkage, we then maximized over a grid of penetrance values from 0 to 1.0 by 0.05 increments. Marker allele frequencies were calculated from the data set. We followed up markers with two-point LOD scores greater than 2.0 with multipoint analysis using Genehunter,26 again using MMLS but maximizing over penetrance and computing heterogeneity LOD scores. We used a sex-averaged map because the observed multipoint LOD scores should be conservative in the presence of linkage, if indeed there are male–female map differences.27 Simulation results confirmed that differential male–female map distance has little effect on localization of the maximum LOD score (data not shown). Separate analyses were conducted in the European and non-European ancestral subgroups.
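As a rough illustration of how a heterogeneity LOD combines family-specific evidence, the sketch below uses the standard admixture formulation, in which a proportion α of families is assumed to be linked. It is not the Genehunter implementation: the function names and the per-family LOD values are hypothetical, and each family's LOD is assumed to have been computed already at a single fixed map position.

```python
import math

def hlod(family_lods, alpha):
    """Admixture heterogeneity LOD for a given proportion alpha of linked families."""
    # Each family contributes log10(alpha * 10**LOD_i + (1 - alpha)).
    return sum(math.log10(alpha * 10 ** lod + (1 - alpha)) for lod in family_lods)

def maximize_hlod(family_lods, steps=100):
    """Grid search over alpha in [0, 1]; returns (alpha_hat, maximum HLOD)."""
    best_alpha, best_value = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for i in range(steps + 1):
        alpha = i / steps
        value = hlod(family_lods, alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value

# Hypothetical per-family LOD scores (illustrative only, not study data).
all_supporting = [0.5, 0.3, 0.8, 0.2]
mixed = [1.2, 0.8, -1.5, -2.0]

print(maximize_hlod(all_supporting))
print(maximize_hlod(mixed))
```

When every family supports linkage, the maximum falls at α=1 and the HLOD equals the summed (homogeneity) LOD, which is the pattern expected when there is no evidence of heterogeneity. When some families contribute negative LOD scores, the maximizing α drops below 1 and the HLOD exceeds the homogeneity LOD.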

SNP markers

We first typed polymorphic SNP markers in the 11p13 linkage region, delimited by a LOD score of 1.0 on either side of the multipoint linkage peak. Thirty-six markers were distributed predominantly within known genes using Tagger, implemented in Haploview,28 with r2=0.8; eight additional SNPs were typed in the region of ELP4 and PAX6 where there was evidence of association. The 44 SNPs (Table 2) were placed in and between ESTs and genes annotated in Ensembl Release 46, from downstream to upstream (see Figure 2): DCDC5, DCDC1, DPH4, IMMP1L, ELP4 and PAX6 between 30 819 214 and 31 780 205 bp (NCBI Build 36). In the replication data set, we typed a subset of 30 SNPs spanning 31 252 249–31 772 472 bp.

Table 2 SNPs genotyped in this study
Figure 2

Cochran–Armitage trend test of case–control association for CTS at the 11p13 locus in the discovery (New York) data set. The Bonferroni critical value line is displayed, with a significance criterion of 0.05/44 corresponding to the 44 SNPs evaluated in the discovery set.

SNP genotyping

Genotyping was performed on the Nanogen platform at deCODE Genetics (Iceland). SNPs were analyzed by end-point scatter plot analysis utilizing the ABI 7900HT Sequence Detection System. Sixty-eight cases, parents of 38 cases and 187 controls were successfully typed from the discovery set; all 40 cases and 120 controls were typed from the replication set. Only one SNP, rs10835810, had >5% missingness (30% missing rate, similar in cases and controls), and only rs2863231 was out of Hardy–Weinberg equilibrium in controls at the 0.001 level. All except two SNPs (rs1223118 and rs288458) had a minor allele frequency >0.15 (Table 2). For resequencing methods see Supplementary Information.
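The text does not state which test was used to assess Hardy–Weinberg equilibrium, so the following is only a minimal sketch of the textbook one-degree-of-freedom chi-square test for a biallelic SNP. The function name and the genotype counts are hypothetical, and the sketch assumes both alleles are present in the sample.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions for one SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical control genotype counts (not study data).
chi2, p_value = hwe_chi_square(25, 50, 25)
print(chi2, p_value, p_value < 0.001)
```

A SNP would be flagged at the 0.001 level only when the returned P-value falls below that threshold; counts in exact Hardy–Weinberg proportions, as in the example, give a statistic of zero.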

Association analysis

Pure likelihood vs frequentist analysis

We conducted a pure likelihood analysis of the SNP data18, 19 as well as calculating standard frequentist P-values for comparison. The two methods, in theory, provide the same ordering of importance for SNPs. However, they have different significance thresholds, different sample size requirements and different approaches to the adjustment for multiple hypothesis testing. We used pure likelihood analysis to determine our SNPs of interest for follow-up. We provide P-values for those unfamiliar with pure likelihood analysis, for comparison only. A pure likelihood display of the data provides a more visually informative understanding than standard plots of −log10(P-value) against position in kb. Moreover, a pure likelihood analysis is particularly well suited for joint analysis of multistage designs,29 largely due to how pure likelihood analyses adjust for type I error inflation due to multiple hypothesis testing. Adjustments for multiple SNP tests are accomplished by following up signals from the first stage with additional samples analyzed in a joint analysis.29 This is in contrast to standard P-value analysis approaches that require adjustment of P-values, for example by Bonferroni or FDR correction.30 For a discussion on pure likelihood multiple test adjustments, see Supplementary Information and reference 29.

Frequentist methods used

We calculated a Cochran–Armitage test for trend in the case–control sample. We used a transmission disequilibrium test as implemented in FBAT31 in the subset of trios to ensure that any signal we found through case–control analysis was not itself due to population stratification. For multilocus analysis, we used multiple logistic regression of main effects and two-way interactions, coding the genotypes as −1, 0 and 1, with interaction being the product of the genotypes.
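For readers unfamiliar with the trend test, the sketch below shows one common formulation of the Cochran–Armitage statistic with additive genotype scores (0, 1, 2), together with the Bonferroni threshold for 44 SNPs. It is an illustration only, not the software used in the study; the function name and the genotype counts are hypothetical.

```python
import math

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Trend test on genotype counts ordered by copies of the test allele.

    Returns the chi-square statistic (1 df) and its P-value.
    """
    n = [r + s for r, s in zip(cases, controls)]  # genotype column totals
    R, S = sum(cases), sum(controls)
    N = R + S
    t_r = sum(t * r for t, r in zip(scores, cases))
    t_n = sum(t * k for t, k in zip(scores, n))
    t2_n = sum(t * t * k for t, k in zip(scores, n))
    numerator = N * (N * t_r - R * t_n) ** 2
    denominator = R * S * (N * t2_n - t_n ** 2)
    chi2 = numerator / denominator
    # Upper-tail probability of a chi-square variable with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical genotype counts (0, 1, 2 copies of the allele); not study data.
chi2, p_value = cochran_armitage(cases=(10, 20, 30), controls=(30, 20, 10))
bonferroni_threshold = 0.05 / 44  # 44 SNPs tested in the discovery set
print(chi2, p_value, p_value < bonferroni_threshold)
```

With these made-up counts the statistic is 20.0, well beyond the Bonferroni threshold of roughly 0.0011; identical genotype proportions in cases and controls give a statistic of zero.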

Pure likelihood methods used

In a pure likelihood analysis we report observed likelihood ratios (LRs) as well as provide figures of likelihood intervals (LIs) for the odds ratio (OR), by base-pair position. For example, a 1/32 LI is defined as the set of OR values where the standardized likelihood function (divided by the likelihood evaluated at the maximum likelihood estimator) is greater than 1/32.18 LIs are analogous to confidence intervals in that they are comprised of all parameter values that are supported by the data. However, LIs do not require a long-run frequency interpretation, rather they reflect the evidence about the OR provided by the given data set. The pure likelihood analysis implemented assumes an additive disease model. We used profile likelihoods32 to construct the LRs and assess association at each SNP. We used LOD evidence of strength 1.5 as a criterion from the observed LRs to define an SNP of interest.
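To make the idea of a likelihood interval concrete, the sketch below computes a profile likelihood for the OR from a simple 2×2 table of allele counts and reads off the 1/k LI on a grid. This is a deliberate simplification: the analysis described here assumes an additive genotype model, whereas the sketch uses a binomial allele-count model, and all function names and counts are hypothetical.

```python
import math

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def loglik(b, log_or, x0, n0, x1, n1):
    # b is the log-odds of the allele in controls; cases have b + log_or.
    return (x0 * log_sigmoid(b) + (n0 - x0) * log_sigmoid(-b)
            + x1 * log_sigmoid(b + log_or)
            + (n1 - x1) * log_sigmoid(-(b + log_or)))

def profile_loglik(log_or, x0, n0, x1, n1):
    # Maximize over the nuisance parameter b by ternary search;
    # the log-likelihood is concave in b.
    lo, hi = -20.0, 20.0
    for _ in range(100):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loglik(m1, log_or, x0, n0, x1, n1) < loglik(m2, log_or, x0, n0, x1, n1):
            lo = m1
        else:
            hi = m2
    return loglik((lo + hi) / 2, log_or, x0, n0, x1, n1)

def likelihood_interval(x0, n0, x1, n1, k=32, half_width=3.0, step=0.02):
    """Grid approximation to the 1/k LI; returns (lower, mle, upper) for the OR.

    x0 of n0 control alleles and x1 of n1 case alleles carry the test allele.
    """
    log_mle = math.log(x1 * (n0 - x0) / (x0 * (n1 - x1)))
    top = profile_loglik(log_mle, x0, n0, x1, n1)
    steps = round(half_width / step)
    grid = [log_mle + i * step for i in range(-steps, steps + 1)]
    inside = [g for g in grid
              if profile_loglik(g, x0, n0, x1, n1) - top >= -math.log(k)]
    return math.exp(inside[0]), math.exp(log_mle), math.exp(inside[-1])

def lod_against_null(x0, n0, x1, n1):
    """log10 of the maximum LR comparing the best-supported OR with OR = 1."""
    log_mle = math.log(x1 * (n0 - x0) / (x0 * (n1 - x1)))
    top = profile_loglik(log_mle, x0, n0, x1, n1)
    return (top - profile_loglik(0.0, x0, n0, x1, n1)) / math.log(10)

# Hypothetical allele counts: 80 of 200 control alleles, 60 of 100 case alleles.
print(likelihood_interval(80, 200, 60, 100, k=32))
print(lod_against_null(80, 200, 60, 100))
```

For these made-up counts the maximum likelihood estimate of the OR is 2.25, the 1/32 LI lies entirely above OR=1, and the LOD against OR=1 falls between 2 and 3, so the SNP would meet the LOD 1.5 criterion for an SNP of interest.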

Results

CTS links to markers at 11p13

Only markers on chromosome 11 yielded two-point LOD scores exceeding 3.0. Markers in the region of chromosomal band 11p13 provided strong and compelling evidence for linkage to CTS. Marker D11S4102 yielded a two-point LOD score of 4.01, and seven other markers in the immediate region also exhibited LOD scores exceeding 2. Both European and non-European ancestry families contributed proportionally to the LOD score. The markers on chromosome 11 generally maximized at unequal male–female recombination fractions, because the male–female recombination map differs substantially in this region. For example, at D11S4102, the recombination rate for females is 1.70 cM/Mb, whereas for males it is 0.48 cM/Mb. Two-point LOD score maximization in this region of 11p most often occurred at 95% penetrance. Although single markers on chromosomes 5, 9, 10, 12 and 16 provided two-point LOD scores >2.0, the flanking marker information was not generally compelling. We did not observe significant evidence of linkage at markers previously reported for CTS at 15q1417 (D15S165 – maximum LOD score 0.1381), nor for a rare recessive variant of RE at 16p12–11.213 (D16S3068 – maximum LOD score 0.2959), nor for X-linked rolandic seizures and cognitive deficit (MIM 300643)14 (DXS8020 – maximum LOD score 0.39). Similarly, we did not find evidence of linkage to 11p13 in an autosomal dominant variant of RE with speech dyspraxia and cognitive impairment.15

Figure 1 shows the heterogeneity (‘HLOD’) and homogeneity (‘LOD’) linkage results observed in the multipoint analysis of chromosome 11, for a dominant mode of inheritance with 50% penetrance. This analysis model resulted in the highest multipoint LOD scores: 4.30 at marker D11S914 (7.4 cM from the two-point maximum). There was no suggestion of heterogeneity (α̂=1) in the region of linkage. The region bounded by LOD scores >2.0 spans from 43.17 to 56.88 cM, with D11S914 located at 46.7 cM,33 and includes the following annotated genes: DCDC5, DCDC1, DPH4, IMMP1L, ELP4 and PAX6.

Figure 1

Multipoint LOD and heterogeneity LOD (HLOD) scores for CTS on chromosome 11: maximum HLOD=4.3 at D11S914 under a dominant mode of inheritance with 50% penetrance.

Association of CTS with SNPs in ELP4

We typed a total of 44 SNPs across the linkage region in 68 cases and 187 controls (discovery set). Here, we conducted a pure likelihood analysis as well as computing standard Cochran–Armitage trend test P-values for comparison. The pure likelihood analysis is particularly well suited to a joint analysis of discovery and replication samples,29 and has been noted to be particularly appropriate for genetic data.20, 34 The pure likelihood analysis plots OR on the y axis vs base-pair position on the x axis. Evidence for association at a given SNP is determined by calculating the LR; whether a calculated LR provides strong association evidence is interpreted by LOD score benchmarks: for example, a LOD>1.5 (equivalent to an LR>32) is interpreted as reasonably strong association evidence. We found no evidence of association with SNPs in DCDC5, DCDC1, DPH4, IMMP1L or PAX6 as indicated by Figure 2 and with gray LIs on Figure 3a. The two long gray lines extending off the plot indicate lack of information, mainly due to a low minor allele frequency. However, we did find significant evidence of association with SNPs in ELP4 with both the Cochran–Armitage trend test and the pure likelihood analysis. Most notably, the SNPs of interest identified by the likelihood analysis were rs964112 in intron 9 (P=0.0008, significant after Bonferroni correction), rs11031434 in intron 6 (P=0.003) and rs986527 (P=0.001) in intron 5 (see Figures 3a and b, Table 3 for summary statistics) with estimated ORs 1.80–2.04 at these markers. We ensured that all SNPs that had an r2>0.8 with rs964112 were genotyped, but none were identified as functionally significant. In the family-based P-value analysis using FBAT, only SNPs in ELP4 provided evidence of association, with the smallest P-values observed at rs986527 (P=0.06) and rs1232182 (P=0.04) with 27 and 28 informative families, respectively. These results argue against population stratification as a positive confounder for the observed ELP4 association.

Figure 3

(a and b) Pure likelihood plot of association evidence in discovery set (a, top) and in joint analysis of data sets (b, bottom). This pure likelihood analysis plots odds ratio (OR) on the y axis and base-pair position on the x axis. Each vertical line represents a likelihood interval (LI) for the OR at a given SNP. The OR=1 line is plotted as a solid black horizontal line, for reference. LIs in color are denoted as SNPs of interest, whereas a gray line indicates that the SNP is not of interest because the 1/32 LI for that SNP covers the OR=1 line. The small horizontal tick on each LI is the maximum likelihood estimator for the OR. The portion of the colored LI that covers the OR=1 horizontal line indicates the strength of the association information at that SNP. In particular, if the navy blue portion is above the OR=1 line while the yellow portion of the LI covers the OR=1 line, then the LOD evidence at that SNP is between 1.5 and 2 (ie, the 1/32 LI does not include the OR=1 value, but the 1/100 LI does); similarly, if both the yellow and navy blue portions are above the OR=1 line but the turquoise portion covers the line, then the LOD evidence is between 2 and 3 (ie, the 1/100 LI does not include OR=1 as a plausible value but the 1/1000 LI does). The further the colored line is above the OR=1 line, the stronger the association evidence. The max LR for each SNP in color is also provided as text in the plot, providing evidence not only of whether the LOD evidence is between 2 and 3, but also the exact value of the max LR.

Table 3 Single-SNP association results: pure likelihood and frequentist analyses at SNPs of interest in ELP4; P-values are unadjusted

In the pure likelihood joint analysis of discovery and replication samples, the replication sample confirms that SNPs in ELP4 are highly associated with CTS (Figure 3b). Here the association evidence for all three SNPs of interest from the discovery set has increased after combination with the replication data set. The maximum LR at rs964112 is now 589.75 (formerly 156.95 in the discovery set), which is evidence equivalent to observing an LOD score of 2.77, and at rs986527 the maximum LR=628.85 (LOD equivalent of 2.80). The estimated ORs represent a twofold increase in risk of CTS. The ORs, 1/32 LIs, maximum LRs and trend test-unadjusted P-values from the discovery and joint analyses are displayed in Table 3. It should be noted that, when analyzed on its own in a standard P-value analysis, the replication sample provided strong evidence of association in ELP4, with rs2104246 significant after Bonferroni correction (unadjusted P=0.0006). We have reported analysis of combined ancestry data, although the results are qualitatively similar when restricted to European ancestry data. The substantial increase in maximum LR from joint analysis of the two data sets provides compelling evidence that the ELP4 variants, specifically rs986527 and rs964112, are indeed associated with CTS in RE families.
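The LOD equivalents quoted above are simply base-10 logarithms of the maximum LRs, and the 1/32, 1/100 and 1/1000 benchmarks correspond to LODs of about 1.5, 2 and 3. A short check, with hypothetical function names, is sketched below.

```python
import math

def lod_from_lr(lr):
    """Convert a likelihood ratio to its LOD equivalent (log base 10)."""
    return math.log10(lr)

def evidence_band(lr):
    """Place a maximum LR in the benchmark bands used for the LI plots."""
    if lr < 32:
        return "LOD below about 1.5"
    if lr < 100:
        return "LOD between about 1.5 and 2"
    if lr < 1000:
        return "LOD between 2 and 3"
    return "LOD of 3 or more"

# Maximum LRs reported in the text for the discovery and joint analyses.
for label, lr in [("rs964112, discovery", 156.95),
                  ("rs964112, joint", 589.75),
                  ("rs986527, joint", 628.85)]:
    print(label, round(lod_from_lr(lr), 2), evidence_band(lr))
```

Both joint-analysis LRs fall in the band between LOD 2 and LOD 3, matching the reported LOD equivalents of 2.77 and 2.80.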

Multi-SNP analysis

We used multiple logistic regression for multi-SNP analysis.35 The SNPs of interest are in high LD with each other, which indicates that it is less likely we are detecting multiple independent variants in the region of ELP4. Multiple logistic regression analysis indicated that rs964112 was the best predictor of CTS, with no other SNP main effects or two-way interactions significant in the model; in the absence of rs964112, rs986527 played a similar predictive role. These SNPs were almost completely correlated.

Resequencing coding regions of ELP4

We resequenced the coding portions, exon–intron boundaries and 5′ upstream region of the ELP4 gene in 40 RE probands from the discovery set. The 274 kb ELP4 gene is transcribed into a 1584 bp mRNA consisting of 12 exons, a 35 bp 5′-UTR and a 257 bp 3′-UTR. Alternative transcripts have been reported that include or exclude the last two exons. Primers were designed for direct sequencing of each of these 12 exons including some adjacent intronic sequence, as well as the putative promoter region; a list of these primers is included in Supplementary Table 2. The same primers were used for PCR and sequencing reactions. After alignment, all homozygous and heterozygous variants within the sequenced region were noted.

Three previously reported SNP variants were found in these 40 individuals: rs2295748 in the vicinity of the promoter; rs2273943 within intron 5 located 127 bases upstream of exon 6; and rs10767903, located within exon 10. The genotypes and allele frequencies for these SNPs in these individuals were compared with those available through dbSNP. The minor allele for rs2295748 was slightly less common in the 40 RE cases (0.22) than in any of the AFD or CEPH populations, whereas the minor allele for rs2273943 occurred in these cases at approximately the same frequency (0.24) as in the Caucasian and Chinese CEPH populations. Frequency information was not available for comparison for rs10767903 so we typed 85 controls at this SNP. The T allele at rs10767903 is predicted to abolish an adjacent splice donor enhancer site that would result in skipping of alternative exons 10 and 11. Out of the 36 RE probands that we were able to type at this synonymous polymorphism, 34 carried the T allele (21 TT, 13 CT, 2 CC), whereas controls exhibited a similar genotypic distribution: 42 TT, 34 CT, 9 CC.
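The genotype counts reported for rs10767903 can be turned into allele and carrier frequencies with simple arithmetic, which makes the similarity between probands and controls easier to see. The helper name below is hypothetical; the counts are those given in the text.

```python
def allele_summary(tt, ct, cc):
    """Return (T allele frequency, proportion of T carriers) from genotype counts."""
    n = tt + ct + cc
    t_freq = (2 * tt + ct) / (2 * n)  # each TT carries two T alleles, each CT one
    carriers = (tt + ct) / n
    return t_freq, carriers

probands = allele_summary(tt=21, ct=13, cc=2)   # 36 typed RE probands
controls = allele_summary(tt=42, ct=34, cc=9)   # 85 typed controls

print("probands: T frequency %.3f, carriers %.3f" % probands)
print("controls: T frequency %.3f, carriers %.3f" % controls)
```

The T allele frequency is about 0.76 in probands and 0.69 in controls, and about 94% of probands and 89% of controls carry at least one T allele.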

Discussion

Taken together, our results suggest that ELP4 is associated with the pathogenesis of RE and has a strong effect on risk for CTS in RE families. This locus appears to be distinct from those discovered in rare Mendelian RE variants. The precise mutation that is presumably in linkage disequilibrium with the associated SNPs in ELP4 remains to be determined. However, the data presented here suggest that the presumed mutation lies either in the non-coding regions of ELP4, or else possibly just beyond the gene. This finding represents the first susceptibility gene identified for a common idiopathic focal epilepsy and the first step in unlocking the complex genetics of RE and related childhood epilepsies. It is also the first reported disease association with ELP4 in humans and offers possible insights into the etiology and kinship of associated developmental cognitive and behavioral disorders.

There are several reasons why these results are unlikely to be spurious. The localization of ELP4 was conducted through genome-wide linkage analysis: only one area of the genome at 11p13 showed strong and compelling evidence for linkage to CTS. Under that linkage peak, fine mapping evidence unambiguously pointed to the association of CTS with SNP markers in ELP4. SNPs in ELP4 were associated with increased risk of CTS in both discovery and replication data sets, with evidence for association of the same SNPs in each data set. Furthermore, not only the same SNPs but also the same alleles were associated with increased risk of CTS in both data sets. Interestingly, we found no evidence of locus or allelic heterogeneity based on ancestry in either linkage or association analyses. In addition, the association in the discovery set was consistent using FBAT, mitigating concerns about positive confounding due to population stratification. Thus, linkage and replicated association data offer compelling and consistent evidence for the role of ELP4 in susceptibility to CTS. These results await independent replication.

The mapping of CTS to ELP4 suggests that the common form of RE and rare variants of RE are genetically heterogeneous. Our data revealed little or no evidence of linkage to recessive (MIM 608105)13 or X-linked (MIM 300643)14 variants of RE; nor did a rare autosomal dominant form of RE with speech dyspraxia and cognitive impairment show linkage to 11p13.15 Thus, it seems that loci in Mendelian variants of RE may represent ‘private’ mutations. An earlier candidate gene study of CTS ascertained through northern European pedigrees multiplex for RE and ABPE did not test for linkage to chromosome 11.17 Instead, linkage was reported to the EJM2 locus (MIM 604827), which may reflect genetic (locus) heterogeneity, or alternatively could be explained by shared susceptibility to myoclonic seizures, which feature in both juvenile myoclonic epilepsy and ABPE syndromes. There were no patients with myoclonic seizures in our samples.

This is the first reported disease association of ELP4. ELP4 is one of the six subunits (ELP1–ELP6) of Elongator36 that has both nuclear and cytoplasmic localization and two distinct but incompletely characterized roles in eukaryotic cells:37 in transcription38 and in tRNA modification.39 Elongator plays a key role in transcription of several genes that regulate the actin cytoskeleton, cell motility and migration.40 These functions are crucial in the nervous system for nerve cell growth cone motility, axon outgrowth and guidance, neuritogenesis and neuronal migration during development. Intriguingly, another Elongator subunit mutation has been implicated in human neurological disease. Riley–Day syndrome (MIM 223900) is an autosomal recessive, sensory and autonomic neuropathy, with EEG abnormalities and epilepsy.41 Riley–Day syndrome is caused by mutation in a splice site of the hELP1 (or IKAP) gene, which causes tissue-specific exon skipping and expression of a truncated mRNA transcript,42, 43 with highest ratios of mutant transcripts in the brain.44 hELP1 mutations abrogate Elongator function, not just hELP1 expression,40 because Elongator function is dependent on the integrity of all its subunits. Cells in Riley–Day patients have reduced motility, which can be rescued by wild-type hELP1.40 ELP4 mutations might also partially abrogate Elongator function in the central nervous system through its effect on multiple cell motility and actin cytoskeleton genes and/or proteins during development. Such a mechanism could plausibly explain the breadth of subtle developmental disorders that are associated with the CTS trait,10 ranging from speech disorder7 and DCD9 to attention problems and ADHD.4, 8 Proof of this hypothesis will require the genetic investigation of large cohorts of carefully phenotyped individuals and detailed functional characterization of ELP4.

Although we found association with SNPs across ELP4, regression analysis indicated that the spread of association evidence could be explained by linkage disequilibrium around rs964112 in intron 9, LD that stretches to IMMP1L and the 3′-end of ELP4, but not to PAX6. Subsequent resequencing of the coding, boundary and promoter regions revealed no enrichment of ELP4 exonic polymorphisms among probands. Exclusion of the coding sequences suggests that the genetic effector may lie in the non-coding regions of ELP4. It is less likely that the causative mutation lies in a distant gene beyond IMMP1L upstream or PAX6 downstream, because there is a drop-off in linkage disequilibrium at subjacent markers. Interestingly, the non-coding regions between ELP4 exon 9 and exon 12 are large (over 130 kb), and contain long range, tissue-specific, cis-regulatory elements for PAX6.45 PAX6 remains a candidate gene of interest because of its highly conserved, developmental regulatory role in the formation of the telencephalon.45 Intron 9 of the canonical splice variant contains a recently inserted pseudogene that is included in the ELP4 mRNA to produce a splice variant, with alternative exons 10 and 11, that is only found in higher primates. The alternative exon 10 contains multiple sequences that are consensus-binding sites for splicing enhancer-binding proteins. Although there are multiple transcripts of ELP4 that contain alternatively spliced exons and encode different sized proteins, the functional and evolutionary significance of most is presently unclear. Expression analyses and resequencing of the non-coding regions may help to reveal the molecular mechanism of seizure susceptibility at this locus.

Substantiating ELP4 as a risk locus for CTS is the first step in assembling the complex genetic model of RE. Additional genetic factors, though, may need to be invoked to explain the occurrence of seizures and reading disability in RE. For example, although CTS is common in children,6 only an estimated 10% of children with the trait manifest clinical seizures.11 At the same time, there is no evidence for an environmental contribution to RE. Thus, although CTS is mandatory for the definition of RE, additional genetic factors, which likely act in combination with the ELP4 locus to cause the classic focal seizures of RE, remain to be elucidated. In summary, we report strong, replicated association between ELP4 variants and the CTS trait. We hypothesize that an as-yet unidentified non-coding mutation exists that is in linkage disequilibrium with SNPs in ELP4 intron 9, and that this mutation impairs brain-specific Elongator function during brain development, possibly mediated through interaction with genes and proteins in cell migration and actin cytoskeleton pathways.