Introduction

Cerebellar ataxia, mental retardation and dysequilibrium syndrome (CAMRQ) is a rare and genetically heterogeneous autosomal recessive disorder characterized by mental retardation, cerebellar ataxia and dysarthric speech with or without quadrupedal gait.1, 2, 3, 4, 5, 6, 7, 8 Multiple consanguineous families have been reported with autosomal recessive inheritance of the condition. The first locus was mapped to a 7.1-Mb region on chromosome 17p13 and a missense mutation was reported on WDR81 (WD repeat domain 81; CAMRQ2; MIM: 610185; also referred to as Uner Tan syndrome).1, 2, 7 Linkage mapping followed by candidate gene sequencing also led to the identification of mutations in very low-density lipoprotein receptor (CAMRQ1; MIM: 224050)3, 4, 5 and carbonic anhydrase VIII (CAMRQ3; MIM: 613227).6

In another consanguineous family (Family C)3, 9 from Turkey, the involvement of VLDLR, WDR81 and CA8 genes were excluded, and four shared-homozygous regions on chromosomes 13, 19 and 20 were uncovered by homozygosity mapping. To identify the culprit gene, we utilized targeted next-generation sequencing of all homozygous regions and evaluated all co-segregated variants using functional and structural predictions and population screening. We report herein that a recessive missense mutation in ATP8A2, encoding ATPase, aminophospholipid transporter, class I, type 8A, member 2, is associated with the phenotype in Family C. In an independent study, a de novo t(10;13) balanced translocation disrupting the coding sequence of ATP8A2 on 13q12 was observed in a patient with severe mental retardation and major hypotonia, raising the possibility that haploinsufficiency of this gene could be implicated in neurodevelopmental phenotypes.10 On the basis of these observations, we suggest that ATP8A2 could be critically important in the development of the nervous system.

Subjects and methods

Patients

The consanguineous family analyzed in this study has four members affected by mental retardation, mild cerebellar and cerebral atrophy and truncal ataxia (Figure 1). The index case was a 27-year-old man exhibiting total inability to walk (05-993). Briefly, patients share the following clinical features: truncal ataxia with/without quadrupedal gait, mental retardation and dysarthric speech. MRI results revealed mild atrophy of cerebral cortex, corpus callosum and inferior cerebellum. Clinical description of Family C was published elsewhere.3, 9 The only affected female in the family could not be included in the study, as her parents did not give consent for DNA analysis. Case 05-993 recently died secondary to a respiratory infection. The study was approved by the institutional review boards at the Baskent and Cukurova Universities (decision KA07/47, 02.04.2007 and 21/3, 08.11.2005, respectively). Written informed consent was obtained from all participants or their parents before the study.

Figure 1
figure 1

Pedigree of Family C with haplotype structure of the disease interval on chromosome 13q12. Haplotype segregating with the disease is boxed. ATP8A2 c.1128 C>G mutation is bold. Please note that the DNA of one affected individual is not available for the study.

Homozygosity mapping analysis

Participants’ DNA from peripheral blood samples were genotyped using 10 K Affymetrix SNP chips. Experiments were performed according to the manufacturer’s instructions (Affymetrix, Santa Clara, CA, USA). DNA of two affected individuals (05-994 and 05-996) was genotyped using Illumina Human610-Quad BeadChip according to manufacturer’s recommendations (Illumina, Inc., San Diego, CA, USA). The image data were normalized and the genotypes were called using data analysis software (Bead Studio, Illumina). Homozygosity mapping analysis was performed using HomozygosityMapper software.11 Homozygosity was ruled out for the previously reported loci. Markers D13S787, D13S1243, D13S742, D13S283, D13S1294 and D13S221 were used to test homozygosity for the most likely candidate locus, chromosome 13q12. Haplotype analysis was carried out by hand.

Mutation analysis

A total of 16 711 445 base long unique probes were designed to target homozygous regions (Supplementary Table 1) using a custom-designed Nimblegen Human Sequence Capture HD2 microarray (Roche NimbleGen, Madison, WI, USA). DNA sample from an affected individual (05-996) was captured using 3 μg input DNA. Captured DNA sample was sequenced with Illumina Genome Analyzer IIx. Illumina sequence data were mapped to reference genome (hg18) using Maq12 and single-nucleotide variants were determined with Samtools.13 To determine indels, data were mapped with BWA14 and analyzed with Samtools. Variant coordinates were converted to hg19 before publication by liftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver). Coverage calculations of coding regions were done with mpileup module of Samtools13 and intersectBED command of BEDTools.15 Novel variants were determined based on SNPs reported in dbSNP database and further analyzed in 1000 genome data sets (http://www.1000genomes.org), NHLBI Exome Sequencing Project (http://evs.gs.washington.edu/EVS/, data release ESP5400) and exome sequencing data of 2400 individuals with non-neurological disorders generated at Yale University. Common variants were excluded if minor allele frequency was lower than 0.1%. Novel variants were confirmed by Sanger sequencing. Segregation analysis of the variants in the pedigree and its presence in healthy population were carried out using allele-specific PCR analysis (Supplementary Table 2). Racial distribution of control group was 100% Caucasian, including 22% from southeastern Turkey.

Bioinformatics analysis

DNA and protein sequences were obtained from ENSEMBL database.16 SIFT,17 PolyPhen218 and MutationTaster19 tools were used to predict causative variants. Genomic evolutionary rate profiling (GERP) and phylogenetic P-value (phyloP) conservation scores for each variant were extracted seperately from the UCSC Genome Browser allHg19RS_BW track20 and phyloP46wayall track,21 respectively. Functional and transmembrane domains of the ATP8A2 protein were predicted using Pfam database22 and TmPred prediction tool (www.ch.embnet.org/software/TMPRED_form.html), respectively. Homology searches were performed with CLCMain Workbench (CLC Bio, Aarhus, Denmark) using appropriate modules. CLCMain Workbench also generates phylogenetic tree using UPGMA algorithm that is evaluated by bootstrap analysis. Possible effects of the variant on protein secondary structure were predicted using PSIPRED server.23 Published microarray data sets of E9.5, E11.5 and E13.5 mouse brain tissue (GSE8091)24 were obtained from the GEO database (http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi) and analyzed with GeneSpring GX V11.1 software (Agilent Technologies, Santa Clara, CA, USA). Differentially expressed genes within day groups were filtered (one-way ANOVA test Bonferroni-corrected P<0.001) and genes that correlated with Atp8a2 (R=0.95–1.0) were functionally annotated using DAVID tools.25 Primers used in this study were designed with Primer326 software and are listed in Supplementary Table 2.

Quantitative real-time RT-PCR

First-strand cDNAs were prepared from human RNA samples (Clontech, Mountain View, CA, USA: 636567 (corpus callosum); Agilent: 540007 (cerebellum), 540117 (frontal cortex), 540137 (occipital cortex), 540157 (fetal brain), 540053 (brain stem), 540005 (total brain), 540143 (parietal cortex), 540135 (striatum)) using RevertAid First Strand cDNA Synthesis kit with random hexamer primers (Fermentas, now Thermo Fisher Scientific, Waltham, MA, USA; K1622) after DNaseI (Fermentas EN0521) digestion. Real-time RT-PCR was performed using IQ SYBR Green Supermix according to standard protocols (BioRad, Hercules, CA, USA; 170-8882). Ct values were normalized to GAPDH as an internal control. The data were analyzed using the Pfaffl method.27

Results

We identified four common homozygous regions in two affected individuals (05-994 and 05-996) using Ilumina Human610-Quad BeadChip. Targeted next-generation sequencing of all homozygous regions (Supplementary Figure 1 and Supplementary Table 1) was carried out using DNA of one affected individual (05-996). This region was enriched 629-fold in the capture experiment. In total, 48.62 million single-end 75 bp reads were obtained and 29.2% of the reads mapped to the targeted regions. This in turn provided a mean coverage depth of 62.96-fold across the targeted homozygosity intervals with 97.41% of the targeted bases being covered by at least four reads (Supplementary Table 3). Next, the constitutive exons in the homozygous intervals were analyzed and 99.51% of the protein coding regions was found to be covered by at least four reads. When the genes encoding for the constitutive exons in the low- or zero-coverage regions were analyzed, they either do not have cerebellar expression or do not display a phenotype compatible with cerebellar involvement in mouse knockouts (Supplementary Table 4). On the basis of these results, we find it highly unlikely that a causative mutation is missed.

A total of 14 103 homozygous variants (13 394 single-nucleotide variants and 709 indels) were detected by next-generation sequencing. Of these, 13 528 variants were reported by dbSNP132. Remaining 575 novel variants were classified by genomic context: protein altering or flanking splice junctions (n=11), coding synonymous (n=4), 5′-UTR (n=44), 3′-UTR (n=30), intronic (n=224) and intergenic (n=262). Of the 11 protein-altering variants, four were excluded based on the comparison for novelty with 1000 genomes data, NHLBI Exome Sequencing Project and the exome sequence data of 2400 individuals with non-neurological diseases. The remaining seven variants in the coding regions of homozygous blocks were verified by Sanger sequencing and four of them were excluded by segregation analysis (Supplementary Figures 2–3). Two missense variants (ATP8A2 p.I376M and APBA3 p.A97T) and a 3-bp in-frame deletion (PCP2 p.E6del) were consistent with the recessive inheritance of the disease allele in Family C (Table 1, Figure 1 and Supplementary Figure 2).

Table 1 Novel coding variants identified by targeted next-generation sequencing of 05-996

APBA3 p.A97T variant was excluded based on the conservation considerations and prediction analyses. Four of 20 species sequenced have threonine (T) at the orthologous site (Supplementary Figure 4), suggesting that this variant would be a polymorphism and not damaging to humans. A negative GERP score (−4.11) for the mutated nucleotide suggests that this site is probably evolving neutrally.20 PhyloP score of the variant (−0.308) suggests a faster evolution than expected for this site.21 Furthermore, the variant was predicted as ‘tolerated’ by SIFT17 (SIFT score, 0.16), ‘benign’ by PolyPhen218 (PSIC score difference, 0.0) and ‘polymorphism’ by MutationTaster19 (P-value, 0.999) (Table 1).

PCP2 p.E6del was excluded based on population screening. In 360 healthy chromosomes, four heterozygous individuals were identified (Supplementary Figure 5), yielding an expected homozygote frequency of approximately 1 in 8000. The region containing the mutation is not conserved among species, and the deletion was predicted as ‘polymorphism’ by MutationTaster19 (P-value, 0.717; Table 1 and Supplementary Figure 6).

The remaining variant at chr13:26128001 (hg19; c.1128 C>G) is located in exon 12 of ATP8A2 (ENSG00000132932, ENST00000381655) and results in an isoleucine (I) to methionine (M) substitution at residue 376. The mutation co-segregated with the disease in Family C (Figure 1) lies in the C-terminal-predicted transmembrane site of the E1 E2 ATPase domain (Figure 2a) and is highly conserved across species (Figure 2b and Supplementary Figure 7). Screening of 1210 control chromosomes, including 300 individuals from the same geographic region as Family C, excluded presence of the variant in this control population. SIFT,17 PolyPhen218 and MutationTaster19 tools predicted the ATP8A2 p.I376M as a causative mutation (scores: 0.0, 1.0 and 0.955, respectively). Consequences of the amino acid change in protein structure were evaluated by comparing the predicted secondary structures of wild-type and mutant protein sequences. The wild-type protein is predicted to contain 27 β-strands and 32 α-helices. I376 residue is located at the N terminus of the 11th α-helix. The mutation enlarges the 11th and 12th α-helices and creates an additional α-helix at residue 401 (Figure 2c).

Figure 2
figure 2

Graphical representation of the predicted functional and structural elements of ATP8A2 protein. (a) ATP8A2 is composed of an E1 E2 ATPase domain and a haloacid dehalogenase-like hydrolase (HAD) domain. Ten transmembrane domains were predicted by TMPRED. The mutation lies in the transmembrane region of C-terminal end of E1 E2 ATPase domain (dot). (b) Multiple amino acid sequence alignments show the sequence homology of ATP8A2 protein in vertebrates. I376 residue is indicated with a box. (c) Graphical representation of secondary structural elements as predicted by PSIPRED. The predicted elements (Pred) are indicated above the amino acid (AA) sequences (straight lines: coils; cylinders: helices; arrows: strands). The mutation is predicted to alter the secondary structure of the protein. Transmembrane region is represented within the Pred graphs of wild-type (WT) and mutant (Mut) proteins. EC, extracellular; IC, intracellular.

The status of ATP8A2 was evaluated in a cohort of 750 patients with structural cortical malformations or degenerative neurological disorders, and the underlying genetic cause is still unknown. Whole-genome genotyping data generated by Illumina Human 370 Duo or 610K Quad BeadChips is available for this cohort. None of the patients were found to harbor a homozygous interval (≥2.5 cM) surrounding the ATP8A2 locus. Exome sequencing of the same group did not reveal any mutations, including compound heterozygous substitutions, in ATP8A2.

The transmembrane protein, ATP8A2, consists of four protein-coding isoforms. The longest isoform (ENST00000381655) contains 37 exons and encodes a 112 kDa protein. The protein is highly expressed in newborn and embryonic tissues, with strongest expression in mouse heart, brain and testis.10, 28 RT-PCR analysis revealed similar expression in different regions of the human brain.10 To evaluate the possible involvement of ATP8A2 in motor functions, we examined its expression profile in different human brain regions by quantitative real-time RT-PCR. Human ATP8A2 is expressed in all brain regions with the highest level of expression in cerebellum (Figure 3). ATP8A2 expression in the patients cannot be evaluated, as the gene is not expressed in lymphocytes.

Figure 3
figure 3

Expression pattern of ATP8A2 in nine different regions of human brain. Real-time RT-PCR analysis showed that ATP8A2 is expressed in all regions of the brain with the highest levels in the cerebellum.

To further investigate the role of ATP8A2 in brain development, we examined the expression profiles of early embryonic mouse brain (GSE8091)24 and identified genes with significantly correlated expression profiles (R>0.95, n=218) with that of ATP8A2. Functional clustering analysis suggested that positively correlated genes were enriched for those involved in neuron differentiation, cell, and neuron projection morphogenesis and axonogenesis (Bonferroni-corrected P-values: 2.1E-3, 2.7E-3, 4.5E-3 and 1.5E-2 respectively). ATP8A2 is co-expressed with doublecortin responsible for lissencephaly and WDR81 associated with CAMRQ2,7 suggesting that these genes could represent similar developmental pathways.

Discussion

CAMRQ is a rare genetically heterogeneous cerebellar ataxia with mental retardation and dysarthric speech, with or without quadrupedal gait. Since the first mapping of the gene locus on chromosome 17p13, two additional loci on chromosomes 9p24 and 8q12 have been reported, and causative mutations have been identified in VLDLR, CA8 and WDR81.2, 3, 6, 7 Here we present the identification of a fourth gene locus in a consanguineous family of two affected siblings and an affected nephew. Using whole-genome homozygosity mapping followed by targeted next-generation sequencing, several missense variants were observed. Filtering the variants by co-segregation analysis, population screening, protein conservation and disease gene prediction approaches revealed a novel missense variant in ATP8A2 (c.1128 C>G; p.I376M) that segregates with the phenotype. The mutation is located inside a transmembrane domain and is predicted to change secondary structure of the protein.

ATP8A2 belongs to the P4-ATPases subfamily of P-type ATPases, which are involved in the transport of aminophospholipids. Biochemical studies have shown that P4-ATPases determine the curvature of the phospholipid bilayer by flipping aminophospholipids from the exoplasmic to the cytoplasmic leaflet.29, 30 ATPases have been implicated in human diseases such as ATP10C in Angelman syndrome,31 ATP8B1 in hearing loss32 and hereditary cholestasis,33 and ATP8A2 in a severe neurological phenotype.10

ATP8A2 is involved in the transport of aminophospholipids toward the cytoplasmic leaflet in brain cells, retinal photoreceptors and testis.34 In humans, ATP8A2 is mainly expressed in brain tissues, with highest levels in cerebellum, as well as in retina and testis.10 Cerebellum is a crucial regulatory organ for motor coordination and this expression pattern is consistent with CAMRQ. The fact that CAMRQ-associated genes have retinal expression34, 35 raises the possibility that eye abnormalities may be an additional clinical feature of the phenotype. Strabismus has been observed in almost all affected individuals in all the families reported thus far.1, 2, 3, 4, 5, 6, 7, 8 In addition, homozygous WDR81 mutation carriers display downbeat nystagmus, temporal disk pallor and macular atrophy.36 However, retinopathy is not a feature of WDR81-, VLDLR- and CA8-associated CAMRQ.6, 36 With respect to ATP8A2, further information is not available, as Family C declined neuro-ophthalmological investigations.

Documentation of a de-novo-balanced translocation leading to ATP8A2 haploinsufficiency10 brings into attention the clinical findings of carriers in Family C. Whereas 05-992 and 05-995 did not show neurological abnormalities, the t(10;13) de-novo-balanced translocation carrier presented with a severe neurological phenotype that partially overlaps with the phenotype of the affected members of Family C. The possibility of a chimeric protein was ruled out, leaving haploinsufficiency of ATP8A2 as the most likely explanation for the phenotype. This suggests that ATP8A2 mutations represent yet another example of clinical heterogeneity in the context of genomic understanding of complex traits in humans and demonstrates fundamental features of genomic analysis of human traits such as variable expression, allelic heterogeneity and genotype–phenotype correlations. Other examples include CRYBB1 in congenital cataract,37 COLL11A2 in Zweymuller Weissenbacher syndrome38 and MYBPC1 in arthrogryposis.39

These findings suggest that ATP8A2 could be critical for the developmental processes of central nervous system, and alterations of this gene may lead to severe neurological phenotypes.