Introduction

Although a diagnosis of an autism spectrum disorder (ASD) requires that symptoms be present from infancy or early childhood, the disorder may not be detected until later for several reasons: a well-structured support system, compensation for limitations through high intelligence, the presence of more subtle autistic symptoms, and confusion with or overshadowing by another psychiatric disorder (Kan et al. 2008; Wing and Potter 2002, see also www.dsm5.org). Partly due to increasing knowledge of milder forms of autism and more awareness that autistic conditions can be found in individuals of high ability, ASDs are starting to become more widely recognized in adults (Brugha et al. 2009; Fombonne 2005; Kan et al. 2008; Wing and Potter 2002). In clinical practice, we notice a growing demand for diagnostic procedures concerning ASD in adults. However, there is no established diagnostic tradition for ASD in older individuals. It is very challenging to disentangle social and communicative problems associated with ASD from the often complicated clinical picture in adulthood, especially when developmental information is unavailable. Standardized instruments are needed that can facilitate the diagnostic process. Poor self-referential cognition, present in many individuals with ASD, may hamper self-report measures of autistic symptoms (Johnson et al. 2009; Lombardo et al. 2007). Therefore, observation of the individual during social interaction is important in addition to information about difficulties experienced in daily life.

The Autism Diagnostic Observation Schedule (ADOS, Lord et al. 2000) is a standardized instrument that assesses social interaction, communication, and imagination during a semi-structured interaction with an examiner. The ADOS includes four modules suited for individuals with different developmental and language levels, ranging from children with no expressive language to older and verbally more capable individuals. The psychometric properties of modules 1–3 are well-studied and present the ADOS as a reliable and valid instrument to assess the presence of ASD in children (de Bildt et al. 2004; Gray et al. 2008; Lord et al. 2000; Noterdaeme et al. 2000; Papanikolaou et al. 2009). Module 4 was developed for adolescents and adults with fluent speech. In the original paper on the ADOS, Lord et al. (2000) included module 4 administrations for adolescents and young adults with autism (AD, n = 16), PDD-NOS (n = 16) and with various other diagnoses (n = 15). Their results indicate that, after training as described in the manual (Lord et al. 1999), ADOS module 4 can be used effectively to distinguish between autism spectrum and non-spectrum, and to a lesser degree between AD and PDD-NOS. Thus far, no further specific studies into the value of module 4 have been reported. When establishing a diagnosis, clinicians need to rule out specific conditions that can cause similar symptoms. Because the control group in the study by Lord et al. (2000) was relatively small and very diverse with respect to diagnosis, it is still unclear to what extent ADOS module 4 can support such differential diagnostics.

One disorder that shares symptoms with autism is schizophrenia. Kanner (1943) even borrowed the term autism from Eugen Bleuler, who used it to describe withdrawal from contact with the outside world in adults with schizophrenia (1911). Although autism and schizophrenia have different developmental trajectories, cross-sectionally their clinical presentations overlap (Frith and Happé 2005; Goldstein et al. 2002; Volkmar and Cohen 1991). Especially individuals with schizophrenia and negative symptoms show many of the same social deficits as adults with autism (Frith and Happé 2005). Autism also shares features with psychopathy, a personality disorder which partly overlaps with antisocial personality disorder (APD). Besides poor behavioral control and a disregard for the rights of other people, individuals with psychopathy have deficits in the emotional and interpersonal domain, such as insensitivity or lack of empathy towards other people. Impairments in empathy are also central to ASD, characterized by a cognitive impairment to take the perspective of other people (Baron-Cohen and Wheelwright 2004; Gillberg 1992). Rogers et al. (2006) indicate that there could be a subgroup of people with ASD that have additional callous-unemotional traits reminiscent of psychopathy. Others report that some individuals with ASD may seem cold and heartless, because they are unaware of how their behavior affects other people, which could lead to a diagnosis of APD or psychopathy by mistake (Bartels and Bruinsma 2008; Howlin 2000; Kohn et al. 1998). Especially in forensic settings, it is important to differentiate ASD from psychopathy, because they require different approaches. It should be noted, however, that unlike in psychopathy there is little evidence of any excess of crimes among people with autism (Howlin 2000).

The current study will examine the psychometric properties of ADOS module 4 by including relatively homogeneous non-autistic groups: a group of adult males with schizophrenia and marked negative symptoms, males with psychopathy, and typically developing males. Analyses will center on the original ADOS algorithm (Lord et al. 2000), based on the operationalization of the DSM-IV and ICD-10 criteria for autistic disorder (American Psychiatric Association 1994; World Health Organization 1993), but will also include some preliminary analyses based on revised algorithms for the ADOS. In line with proposals for the revision of the DSM (www.dsm5.org), the revised algorithms of the ADOS for modules 1-3 synthesize the items from the original social interaction and communication domains into the new domain Social Affect (SA, Gotham et al. 2007). This new notion of communicative and social behaviors as a single set of symptoms is supported by recent studies showing that non-verbal communication and social items load onto the same factor (Constantino et al. 2004; Lord et al. 1999; Robertson et al. 1999; van Lang et al. 2006). In addition, the revised algorithms include restricted and repetitive behaviors (RRB) as opposed to the original algorithm. Although the narrow time frame of the ADOS might not provide adequate opportunity to measure these behaviors (Lord et al. 2000), they seem to make an independent contribution to diagnostic stability (de Bildt et al. 2009; Lord et al. 2006). While adults with ASD may have a slightly different behavioral phenotype compared to children (Gotham et al. 2007), the core difficulties persist in adulthood (Seltzer et al. 2004; Shattuck et al. 2007). Therefore, it is of interest to explore the utility of this promising new metric in our adult population.

Methods

Participants

Thirty-two adult males with an ASD were recruited via local mental health organizations (mainly through the specialized Autism Team North Netherlands of Lentis, Groningen, the Netherlands), and through mailing lists for high-functioning individuals with ASD. Six individuals with ASD were recruited from a local forensic clinic (FPC Dr. S. van Mesdag, Groningen, the Netherlands). The participants were considered to be high-functioning by their clinicians and none had an IQ score below 70. All participants were diagnosed with an ASD by a clinical psychologist or psychiatrist according to DSM-IV-TR criteria (n = 8 AD, n = 17 AS, n = 13 PDD-NOS), based on review of developmental history, current daily functioning, and observation. For this study, the ASD group will be investigated as one diagnostic entity along a continuous dimension of severity for two reasons. First, it is proposed for the near future that distinctions will no longer be made among different types of autism in clinical practice, because they have proven to be “inconsistent over time and place, and to be associated more with severity, language level, and intelligence than specific features” (www.dsm5.org). Individuals with autism and PDD-NOS have also shown qualitatively similar behavioral patterns on the ADOS with varying degrees of severity (Lord et al. 2000). Second, investigating the subtypes would lead to overly small subgroups.

Eighteen adult males with schizophrenia and predominantly negative symptomatology, mainly outpatients, were selected by a specialized local mental health organization (Psychosencluster, GGZ Drenthe, Assen, the Netherlands). Diagnosis was confirmed by a structured clinical interview, the Dutch version of the Schedules for Clinical Assessment in Neuropsychiatry developed by the WHO (SCAN 2.1, Giel and Nienhuis 1996). Current symptomatology was assessed by the Positive and Negative Syndrome Scale (PANSS, Kay et al. 1987).

The psychopathy group consisted of 16 males recruited from two forensic psychiatric clinics (FPC Dr. S. van Mesdag and FPC Veldzicht). As part of the standard clinical procedure, these individuals were assessed with the Psychopathy Checklist Revised (PCL-R), an instrument widely used for the diagnosis of psychopathy (e.g. Hare 1991). Two diagnosticians obtained consensus on this instrument after separately scoring the items using file information extended with, if necessary, a semi-structured interview.

The typically developing group consisted of 21 males, who were interviewed to verify that first-degree relatives did not have an ASD or a history of psychosis. Age and IQ were matched with those of the participants with ASD who also took part in the neuroimaging part of the study (n = 21). There were no significant differences between the groups in terms of age and IQ. For an overview of the group characteristics see Table 1.

Table 1 Group characteristics

Measures and Procedure

Administration of the ADOS was part of the standard procedure of two large neuroimaging studies into the neural basis of empathy conducted in the Social Brain Laboratory (www.bcn-nic.nl/socialbrain.html). All participants gave written informed consent. The studies were approved by the Institutional Review Board of the University Medical Center Groningen (METc). The administration of ADOS module 4 included all standard activities and the optional daily living items to obtain relevant background information. The interviews were administered and scored by trained and certified psychologists. In total, five raters participated in the project including two certified ADOS trainers (AdB, SH). To ensure that agreement between raters remained at the high level required by the ADOS, we discussed (fragments of) videotapes in group meetings held every two months. The interviews were scored for consensus from videotape by changing pairs of raters, which included the examiner in the vast majority of cases. In contrast to the second rater, the examiner was not blind to clinical diagnosis. The consensus scores were determined on the basis of the video-recording through a discussion in which the judgment of each rater was weighted equally. We only made an exception to this procedure when there was major disagreement (0 vs. 2) for the items B1 (Eye Contact) and B2 (Facial Expressions). In those cases, we gave priority to the examiner’s opinion, because we anticipated that these items might be difficult to judge from videotape alone. Due to the high quality of the video-recordings, there was major disagreement in only two out of 93 administrations for eye contact, while for facial expressions such disagreement never occurred. Therefore, it is unlikely that the examiner’s previous knowledge influenced the consensus scores.

Design and Analysis

Algorithms

In this paper, we will use the terms “original algorithm” when referring to the standard algorithm (Lord et al. 2000) and “revised algorithm” when referring to the application of the revised algorithm based on Gotham et al. (2007). To reach a classification of AD or ASD on the original algorithm of the ADOS, an individual needs to meet thresholds for the communication domain (COM), the social interaction domain (SOC), and for the summation of these two domains (COMSOC), but not for the restricted and repetitive behaviors domain (RRB, Lord et al. 1999). For the revised algorithm, classification is based on solely thresholding the SARRB domain, which combines social, communication, and restricted behavior items. Since algorithm items across modules 3 and 4 are comparable and our sample size does not permit independent factor analyses in order to establish specific algorithm items, we applied the revised algorithm for module 3 to our group of high-functioning adults to calculate domain scores and a total score. In line with the explanation on the original algorithm, scores of 3 were converted to 2, and all scores other than 0–3 were treated as 0.
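The score conversion described above can be illustrated with a minimal sketch. The function names and the example item codes are hypothetical and serve only to show the recoding rule; they are not part of the ADOS materials or of the study data.

```python
def recode_item(raw_score):
    """Map a raw item code onto the 0-2 range used in the algorithm."""
    if raw_score == 3:
        return 2          # scores of 3 are converted to 2
    if raw_score in (0, 1, 2):
        return raw_score  # scores of 0-2 are kept as they are
    return 0              # any code outside 0-3 is treated as 0


def domain_score(raw_scores):
    """Sum the recoded item scores belonging to one domain."""
    return sum(recode_item(score) for score in raw_scores)


# Hypothetical raw codes for the items of one domain (not real data).
example_raw = [0, 1, 3, 8, 2]
print(domain_score(example_raw))  # 0 + 1 + 2 + 0 + 2 = 5
```

The resulting domain score is what would then be compared with the relevant threshold.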

Interrater Agreement

Interrater agreement was assessed on the original algorithm at the level of ADOS classification, domains, and items. Agreement between raters at the level of diagnostic classification (AD, ASD, nonspectrum) was calculated through Cohen’s weighted kappa in addition to the percentage of agreement. Cohen’s kappa takes into account the agreement that can occur by chance between two raters and is therefore more stringent than the mere calculation of the percentage of times the raters’ scores lead to the same ADOS classification. Interrater agreement on the domains and the total score was calculated by means of intraclass correlations (ICC). ICC scores represent correlations across pairs of raters and are higher the more consistent the scores across two different raters are. ICC scores, internal consistency and correlations could not be reliably calculated for the RRB domain, because variance was too limited: for four out of the five items fewer than five subjects scored different from zero. To assess interrater agreement for separate items, we used mean linearly weighted Cohen’s kappas in line with Lord et al. (2000). Cohen’s linearly weighted kappa takes into account the agreement between two raters occurring by chance and considers the difference between a score of zero and one to be smaller than the difference between zero and two. Item B3 was ignored because its score depends on items A9, B1 and B2. In addition, only items were included for which more than five subjects scored different from zero (excluding nine items: A1, A3, A5, D1, D2, D3, D5, E1, E2).
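As an illustration of how linear weighting treats a 0 versus 1 disagreement more leniently than a 0 versus 2 disagreement, the statistic for one pair of raters can be sketched as follows. This is a minimal sketch of the standard formula, not the software used in the study; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def linear_weighted_kappa(rater_a, rater_b, categories=(0, 1, 2)):
    """Cohen's kappa with linear agreement weights for two raters."""
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    def weight(i, j):
        # 1 for identical scores, decreasing linearly with the distance.
        return 1.0 - abs(i - j) / (k - 1)

    pairs = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Weighted agreement actually observed between the two raters.
    observed = sum(weight(i, j) * c / n for (i, j), c in pairs.items())
    # Weighted agreement expected by chance from the marginal distributions.
    expected = sum(
        weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in range(k) for j in range(k)
    )
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: one disagreement of one point versus one of two points.
first = [0, 0, 1, 1, 2, 2]
print(round(linear_weighted_kappa(first, [0, 1, 1, 1, 2, 2]), 3))  # 0.8
print(round(linear_weighted_kappa(first, [0, 2, 1, 1, 2, 2]), 3))  # 0.625
```

In both hypothetical comparisons the raters disagree on a single case, yet the two-point disagreement lowers kappa more than the one-point disagreement.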

Internal Consistency

To measure the internal consistency of the original and the revised domains, we applied Cronbach’s alpha. This statistic increases as the intercorrelations among test items within a domain increase.
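For reference, Cronbach’s alpha for a domain with k items can be computed from the item variances and the variance of the summed score. The sketch below uses the standard formula with sample variances; the function name and the item scores are hypothetical and do not reproduce the study data.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding the scores of all participants."""
    k = len(item_scores)
    # Total domain score per participant.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores of four participants on three items of one domain.
items = [
    [0, 1, 2, 1],
    [0, 1, 2, 2],
    [1, 1, 2, 0],
]
print(round(cronbach_alpha(items), 2))
```

When the items rise and fall together across participants, the variance of the total score grows relative to the summed item variances and alpha approaches 1.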

Comparison of Domain Means

We used an ANOVA for each scale of both algorithms with fixed factor group, followed up by Tukey’s HSD post-hoc comparisons. We performed one-tailed Mann–Whitney tests to examine whether the forensic ASD group scored higher than the psychopathy group. To compare group differences at item level, we performed a MANOVA with fixed factor group on all items except the previously mentioned nine items that had limited variance and item B3. Post-hoc tests were performed for those items that showed a significant group effect.

Criterion-Related Validity

Here, criterion-related validity refers to the degree to which the outcome on the ADOS instrument is in agreement with the clinical diagnosis of having ASD or not. We used logistic regression to measure the success of both algorithms in predicting whether a participant received a diagnosis of ASD in clinical practice. Because ADOS classification is based on COM and SOC for the original algorithm and on the combined SARRB domain for the revised algorithm, we used these domains as predictors in two separate analyses. Logistic regression provides information on the sensitivity and specificity for the fixed cut-off point used in clinical practice. Receiver Operating Characteristic (ROC) curves provide information on the sensitivity and specificity of all other possible scores. In addition, ROC analysis provides an Area under the Curve statistic (AuC), which represents the overall level of agreement between criterion (i.e. clinical diagnosis of ASD) and instrument (i.e. ADOS). The higher the AuC, the higher the probability that a randomly chosen participant with ASD will have a higher score on the instrument than a randomly chosen participant without ASD.
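The relation between a cut-off, sensitivity, specificity, and the AuC can be made concrete with a small sketch. It assumes that a score at or above the cut-off counts as a positive ADOS classification and computes the AuC directly from its probabilistic interpretation, with ties counted as one half. The function names and the scores are hypothetical, not study data.

```python
def sensitivity_specificity(scores, has_asd, cutoff):
    """Treat score >= cutoff as positive and compare with the clinical diagnosis."""
    tp = sum(s >= cutoff and d for s, d in zip(scores, has_asd))
    fn = sum(s < cutoff and d for s, d in zip(scores, has_asd))
    tn = sum(s < cutoff and not d for s, d in zip(scores, has_asd))
    fp = sum(s >= cutoff and not d for s, d in zip(scores, has_asd))
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, has_asd):
    """Probability that a random ASD case outscores a random non-ASD case."""
    pos = [s for s, d in zip(scores, has_asd) if d]
    neg = [s for s, d in zip(scores, has_asd) if not d]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical total scores; True marks a clinical ASD diagnosis.
scores = [9, 7, 5, 10, 3, 6, 8, 2]
has_asd = [True, True, True, True, False, False, False, False]

print(sensitivity_specificity(scores, has_asd, cutoff=7))  # (0.75, 0.75)
print(sensitivity_specificity(scores, has_asd, cutoff=5))  # (1.0, 0.5)
print(auc(scores, has_asd))                                # 0.8125
```

Lowering the cut-off in this toy example raises sensitivity at the cost of specificity, while the AuC summarizes performance over all possible cut-offs.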

Correlations with Participant Characteristics

To investigate the relationship of domain scores with participant characteristics, we calculated bivariate correlations for the patient groups between domain scores and age, IQ, and scores on the negative scale of the PANSS (schizophrenia only).

Results

Interrater Agreement

Interrater agreement at the level of ADOS classification was 81.7% with Cohen’s adjusted weighted kappa 0.66, which corresponds to good or substantial agreement (Landis and Koch 1977). When merging the ADOS classifications AD and ASD (based on the proposed criteria for DSM-5) the agreement increased to 89.2% with kappa 0.73. Intraclass correlations (ICC, Table 2) show high interrater agreement on SOC and COMSOC, and good agreement on COM. Mean agreement across the items was 81.7% with mean weighted kappa 0.66. Weighted kappas exceeded 0.60 for 14 out of the 21 items with the remainder exceeding 0.50.

Table 2 Intraclass correlations for interrater agreement

Internal Consistency

For the original algorithm, the internal consistency is high for SOC (Cronbach’s α = 0.84), but rather low for COM (α = 0.52). This indicates that the items of the COM domain do not intercorrelate well in our population. Item A4 (Stereotyped Language) performed the worst and its deletion from COM increased alpha to an acceptable level (α = 0.60). The reorganization of communication and social interaction items in the SA domain of the revised algorithm creates a consistent domain (α = 0.87).

Comparison of Domain Means

Original Algorithm

All three domains and the total score showed a significant difference between the groups (Table 3). Tukey post-hoc comparisons show that for COM, SOC, and COMSOC, the ASD group scores significantly higher compared to the psychopathy group and the control group, but not compared to the schizophrenia group. The schizophrenia group scored significantly higher than the control group on COM, and higher than both the psychopathy and the control group on SOC and COMSOC. For RRB, the ASD group scored significantly higher than the control group, while there was a trend compared to the psychopathy group (p = .06). The forensic subgroup with ASD (n = 6) scored higher than the group with psychopathy on all domains (data not shown).

Table 3 Summary statistics based on the original and revised algorithms

Revised Algorithm

Both domains and the total score showed a significant difference between the groups (Table 3). Tukey post-hoc comparisons indicated that the ASD group scored significantly higher compared to the psychopathy group and the control group on SA, and there was a trend in comparison to the schizophrenia group (p = .06). The schizophrenia group scored significantly higher than the control group. For RRB, the ASD group again scored significantly higher than the psychopathy and control groups, but there was no significant difference with the schizophrenia group. For the total SARRB score, the ASD group scored significantly higher than the psychopathy, the control group, and the schizophrenia group, making it the only score for which the ASD group significantly differs from the schizophrenia group. The forensic subgroup with ASD (n = 6) scored higher than the group with psychopathy on all domains (data not shown).

Group Comparison at Item Level

The multivariate test showed that there was a significant main effect of group, F(66,210) = 1.688, p < .005. Results for the univariate tests are visually presented in Figure 1.

Fig. 1 Between-group comparisons at item level. Post-hoc comparisons of the ASD group versus the other three groups at item level (S schizophrenia, P psychopathy, TD typical development). Dark grey boxes filled with *** represent a statistically significant difference at p < .001. Middle grey boxes filled with ** represent a statistically significant difference at p < .01. Light grey boxes filled with * represent a statistically significant difference at p < .05. Unfilled light grey boxes represent a statistical trend (p < .1). In all these cases, the mean of the ASD group was higher compared to the respective group.

Only four out of 22 items did not differ significantly between the groups. The majority of the remaining items showed an (almost) significant difference between the ASD group and the psychopathy and control groups, but not between the ASD group and the schizophrenia group. On some of these items the schizophrenia group also scored significantly higher than the psychopathy and/or control group: B2 (Facial Expressions), B6 (Empathy/Comments on Others’ Emotions), and B7 (Insight). Only three items distinguished the ASD from the schizophrenia group: A4 (Stereotyped Language), B10 (Quality of Social Response), and B12 (Overall Quality of Rapport). In addition, there was a trend for the ASD group to score higher than the schizophrenia group on item B11 (Amount of Reciprocal Social Communication, p = .07). Individuals with psychopathy scored comparably to the control group.

Criterion-Related Validity

The ADOS correctly classified 74.2% of the cases in our sample as having ASD or not (based on the clinical diagnosis assigned). Logistic regression analysis showed that SOC (p < .005) but not COM (p = .27) made a significant contribution in predicting whether a participant in our sample had a clinical diagnosis in the autism spectrum or not (Table 4). The SARRB domain significantly contributed to prediction (Table 4, p < .005). The odds ratios presented in Table 4 indicate that each one-point increase on SOC or SARRB raises the odds that the individual has received a clinical diagnosis of ASD by 38 and 33%, respectively.
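Because an odds ratio acts on the odds rather than directly on the probability, the effect of one extra point on the probability of a clinical diagnosis depends on the starting probability. A minimal sketch, assuming the odds ratio of 1.38 implied by the 38% figure for SOC; the function name and the starting probability of 0.50 are illustrative and not taken from the study.

```python
def updated_probability(prob, odds_ratio, points=1):
    """Probability after raising the predictor by `points`, given an odds ratio per point."""
    odds = prob / (1 - prob)                  # convert probability to odds
    new_odds = odds * odds_ratio ** points    # each point multiplies the odds
    return new_odds / (1 + new_odds)          # convert back to a probability


# Starting from an assumed probability of 0.50 (odds of 1), one extra point
# with an odds ratio of 1.38 gives odds of 1.38.
print(round(updated_probability(0.50, 1.38), 3))  # 0.58
```

In this example a 38% increase in the odds corresponds to a rise in probability from 0.50 to about 0.58, not to 0.69.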

Table 4 Logistic regression analyses for criterion-related validity

ROC curves for the original and revised algorithms resulted in AuC values of .812 and .796, respectively (1 = perfect agreement). This indicates that in general the ADOS scores quite adequately predicted whether someone had a clinical diagnosis of ASD or not. Application of the standard cut-off for autism spectrum on the original algorithm (i.e. 7) gives only moderate sensitivity (0.61) but good specificity (0.82) in our sample. Lowering the threshold to 6 increases the sensitivity (0.68) and keeps the specificity at the same level (0.82). Lowering the threshold to 5 increases the sensitivity further (0.79), but it decreases the specificity (0.73). For the revised algorithm, a cut-off of 5 seems optimal in the current population with adequate sensitivity (0.71) and specificity (0.82).

Correlations with Participant Characteristics

There were no significant correlations between the domain scores and IQ or age for the groups with ASD, schizophrenia, or psychopathy (data not shown). In the group with schizophrenia, the presence of negative symptoms as measured by the PANSS correlated positively with SOC (r = 0.59, p < .05) but not COM (r = 0.12). The PANSS also correlated positively with SA (r = 0.66, p < .005). Thus, the more negative symptomatology an individual with schizophrenia had, the higher his scores on the ADOS. PANSS scores correlated in particular with items that are similar to negative symptoms, such as (flat) facial expressions (B2, r = 0.59, p < .05), (lack of) shared enjoyment (B4, r = 0.81, p < .01), (lack of) asking the examiner for information (A6, r = 0.66, p < .01), and (difficulty with) communication of own emotions (B5, r = 0.53, p < .05).

Discussion

Systematic instruments are needed that can facilitate the complicated diagnostic process concerning ASD in adults. The current study is the first that examined the psychometric properties of ADOS module 4 in an independent sample of high-functioning adult males with an established clinical ASD diagnosis compared to meaningful and relatively homogeneous clinical and non-clinical groups. Our findings show that ADOS module 4 is a reliable instrument. At all levels (i.e. classification, domains and items) raters obtained substantial agreement. In addition, ADOS module 4 has good general criterion-related validity. It is able to correctly classify the majority of individuals and higher scores on the ADOS predict a higher probability of having a clinical ASD diagnosis. The high Areas under the Curve are further indications that ADOS scores can predict whether an individual actually has an ASD. Furthermore, group comparisons between the ASD and other groups show that the ADOS is valuable in differentiating between ASD, and psychopathy and typical development. The distinction between psychopathy and ASD even holds when only taking into account forensic individuals with ASD (although the group size was rather small to perform such an analysis). The finding that ASD and psychopathy are so well-discriminated by means of ADOS scores is promising for forensic psychiatric settings.

Another finding is the similarity between ASD and schizophrenia with respect to ADOS scores. Clearly, individuals with schizophrenia and marked negative symptoms show behavior that is very similar to ASD (Frith and Happé 2005). Some patients with schizophrenia even have autistic-like symptoms that covary with negative symptoms (Sheitman et al. 2004). In line with these data, we show that the degree of negative symptomatology correlates significantly with ADOS scores, in particular with items resembling negative symptoms, such as (lack of) directed facial expressions and shared enjoyment. This resemblance makes it difficult for an observational instrument such as the ADOS to differentiate these groups on that behavior (see Reaven et al. 2008 for a similar finding in children with childhood-onset schizophrenia). The findings underscore previous recommendations of using a comprehensive assessment that incorporates information on daily functioning and early development with direct observation to reach a clinical diagnosis (Lord et al. 1999). Nevertheless, four items did show a difference between these groups: individuals with ASD use more stereotyped language, less reciprocal social communication, and display qualitatively poorer social responses and overall rapport. This suggests that core social items and stereotyped language discriminate individuals with ASD from those with schizophrenia.

Although findings are preliminary, the revised SARRB domain, which combines social, communication and repetitive behavior items, seems promising in this and other respects. It not only discriminates ASD from all other groups including schizophrenia, but also has high internal consistency, and does well in identifying ASDs: each additional point on this domain raises the odds of a clinical ASD diagnosis by 33%. Another positive indication for the revised algorithm is the confirmation that stereotyped language fits better with the RRB factor than with the original communication domain. Notwithstanding the caution needed when interpreting ASDs in adults in exactly the same way as in children, the revised algorithm as developed for modules 1–3 seems promising for module 4 as well. More research is needed in a larger sample containing individuals with more severe autistic symptoms and lower levels of daily functioning to further investigate the revised algorithm.

A marked finding is the limited role of the original communication domain in the identification of ASDs in this sample. Despite group differences between ASD and psychopathy/typical development, the communication domain does not predict a clinical ASD diagnosis. Combined with its low internal consistency, the communication domain as such does not seem to add to the validity of ADOS module 4 in the current sample. However, when communication items are incorporated in the revised algorithm, a consistent scale (SA) emerges that is valuable in the diagnostic procedure for ASD. Similarly, although restricted and repetitive behaviors were rare in our ASD sample, their contribution to SARRB supports the distinction of ASD from schizophrenia. The relatively short duration of the ADOS interview naturally could have played a role in the paucity of RRBs (Lord et al. 1999). However, combining these two findings also fits the general clinical picture: in adolescents and adults with ASD there is a greater prevalence of impairment in non-verbal communication and social reciprocity than in verbal communication or repetitive behaviors and stereotyped interests (Shattuck et al. 2007). In fact, repetitive behaviors decline most strongly with age (Seltzer et al. 2003). Apart from ageing, individuals in our sample might have had relatively more intact verbal skills from the outset as they were all considered to be high-functioning. Stereotyped language, however, does differentiate the ASD group from all other groups in our sample. This may be typical of our high-functioning group, because idiosyncratic language and language complexity are positively associated (Volden and Lord 1991). Cultural differences in the use of gestures might also have played a role. Typically developing adults in our sample, for instance, used few emotional and only occasional descriptive gestures themselves.

The sensitivity in our sample was rather low (0.61), which means that not every individual with a clinical diagnosis of ASD obtained a concurrent classification on the ADOS. It is probable that the characteristics of our group played a role in this. Our sample consisted of high-functioning individuals that signed up for an extensive research project. They are probably situated at the milder end of the spectrum and some might have been able to (partly) compensate some behavior due to their high intelligence. Resulting relatively low scores make it difficult for the ADOS to identify these individuals. Our findings resemble the outcomes in ADOS modules 1–3, in which lower sensitivity (SE) was found for distinctions involving children with milder ASDs (module 3 by Lord et al. 2000, SE = 0.80, versus later studies: de Bildt et al. 2009, SE = 0.64; Gotham et al. 2008, SE = 0.49; Gotham et al. 2007, SE = 0.68). The high specificity (0.82), on the other hand, means that a positive ASD classification on the ADOS is a very strong indication for a clinician to consider diagnosing ASD. Sensitivity and specificity are tightly linked and the aim of the assessment determines which one is most important. High specificity is essential when one needs to be certain that the individuals selected actually have an ASD, for instance in autism research. High specificity can, however, lead to underinclusiveness. When the aim of the assessment is to screen for ASD, high sensitivity is crucial in order not to miss any potential case. For this purpose, lower thresholds could be considered at the expense of specificity. To prevent overinclusiveness, developmental history and current daily functioning should then be carefully reviewed. As this study included only a specific ASD group and specific control groups, further research is needed to establish the optimal cut-off points on the ADOS module 4.

This study has a number of limitations that should be taken into account when interpreting the results. First, compared to studies on the psychometric properties of modules 1–3 (de Bildt et al. 2009; Gotham et al. 2007, 2008; Oosterling et al. 2010), our study has a small sample size (n = 93). However, it is the first study examining module 4 in an adult population with ASD compared to specific and meaningful groups. Second, we are focused on high-functioning adult males with ASD, which means results cannot be generalized to the entire ASD population. Future studies on module 4 should comprise a larger sample, including individuals with lower levels of daily functioning, since the high-functioning character of our sample may have influenced the results. On the other hand, exactly these individuals are not always recognized during childhood. Therefore, increasing knowledge on module 4 seems most important for individuals showing milder autistic symptoms. In this light, it will also be important to include a group of high-functioning adult females, who run the risk of being undiagnosed because they might be especially good at compensating their behavior (Attwood 1999; In ‘t Velt-Simon Thomas and Mol 2008). Third, no standardized measures were available for the clinical diagnosis of ASD, which characterizes current practice in adult psychiatry. However, the normal clinical procedure included review of developmental history and current functioning and observation. In addition, most participants with ASD were recruited through a specialized centre. Fourth, we did use standardized measures to diagnose schizophrenia, but not to review early developmental history in this group. Therefore, we cannot eliminate the possibility that ASD was present before the onset of schizophrenia. However, this possibility is minimized by the fact that these individuals were extensively tested in a specialized psychosis centre and selected for this study by experienced clinicians. 

The control groups in the current sample were comparatively homogeneous and were chosen to challenge the ADOS by comparing ASDs with other psychiatric groups with social deficits. For the investigation of the value of the ADOS in differential diagnostics, examining different subtypes of schizophrenia and other diagnostic groups will be of great relevance as well (e.g. anxiety disorder, depression, ADHD, and OCD).

In summary, the ADOS module 4 is a reliable instrument that has good predictive value for ASD. It can adequately discriminate ASD from psychopathy and typical development in an adult population. With respect to schizophrenia, discrimination is more difficult due to behavioral overlap. These groups are, however, different on some core items. Although ADOS module 4 fails to classify ASD in a significant proportion of our higher functioning and more mildly affected ASD group, its ASD classification is a strong lead for a clinician to at least consider an ASD diagnosis. Explorative analyses of the revised algorithm indicate that a revision (in line with modules 1–3 and developments in criteria for ASD) could be beneficial for discriminating ASD from schizophrenia.