Abstract
In the last two decades predictive testing programs have become available for various hereditary diseases, often accompanied by followup studies on the psychological effects of test outcomes. The aim of this systematic literature review is to describe and evaluate the statistical methods that were used in these followup studies. A literature search revealed 40 longitudinal quantitative studies that met the selection criteria for the review. Fifteen studies (38%) applied adequate statistical methods. The majority, 25 studies, applied less suitable statistical techniques. Nine studies (23%) did not report on dropout rate, and 18 studies provided no characteristics of the dropouts. Thirteen out of 22 studies that should have provided data on missing values, actually reported on the missing values. It is concluded that many studies could have yielded more and better results if more appropriate methodology had been used.
 dropout
 genetic testing
 psychological adjustment
 statistic methodology
Fifteen years ago, predictive genetic testing became available for hereditary diseases with onset later in life. Since then, quantitative and qualitative psychological followup studies have shown that predictive testing offers several benefits. Although most psychological research on genetic testing has focused on the tests for Huntington’s disease (HD) and the BRCA1 and BRCA2 mutations, clinical experience has provided evidence that the findings can be extrapolated to other diseases with similar inheritance patterns. Research included the characteristics of tested individuals, why they make certain choices, their understanding of the information conveyed by the test, and how they adjust after the test results. Generally, people are able to make informed decisions and they can cope with the test results. However, little is known about those who did not apply for predictive testing, or those who withdrew from the followup studies.^{1} The latter individuals who drop out of studies may bias the findings. Obviously, careful statistical modelling is required to provide reliable conclusions, but also to make optimal use of the data. This has led to growing interest in the methodological quality of the studies. Although sophisticated software enables almost everyone to carry out the most exotic of statistical and epidemiological analyses, the methodological implications are too often incompletely understood.^{2} This review aims to investigate the statistical methods used in the studies of psychological effects of DNA testing for genetic diseases.
METHODS
Search methods and inclusion criteria
First, the reviews of Broadstock et al,^{3} Meiser and Dunn,^{4} and Duisterhof et al^{5} provided insight into the studies that have been published. In addition, we used the databases MEDLINE and PsycLIT from 1988 onwards. The key words used for the searches were: Huntington* disease, HBOC, BRCA*, FAP, familial adenomatous polyposis, HNPCC, colon cancer, SCA, or spinocerebellar ataxia.
Followup studies encompass the prediction of psychological adjustment or the course of adjustment over time. Firstly, studies are included if they analyse quantified measures statistically. Secondly, studies should have investigated the psychological effects of a genetic test outcome for hereditary diseases. Thirdly, studies should have a longitudinal, not a cross-sectional or retrospective design.
Categorisation strategy for longitudinal research methods
This review categorises and discusses articles according to the longitudinal methods that are used. Methods that could be valuable for longitudinal research such as structural equation models were not observed and hence are not discussed. For a more complete overview of longitudinal research we refer to Bijleveld et al.^{6} Statistical methods can be classified as less adequate for two reasons: first, when the measurement level of the variable does not correspond with the level that is required for the method that is used; second, when the study includes more than three waves, but the method used is only suitable for two waves at most. In this review we distinguished five issues of interest: (1) the measurement level of the variables, (2) the number of measurement moments (waves), (3) dropout handling, (4) missing value handling, and (5) the groups and subgroups that are studied. For classifying the studies we considered the measurement level (continuous or discrete) and the number of waves (two wave or multiple wave studies).
A longitudinal effect study must contain a pretest baseline measure, and one or more posttest measures. Depending on the number of waves, several effects can emerge over the course of time. With two waves, a linear effect can be found. With multiple waves a quadratic effect can be found, which implies that the linear effect between baseline and first posttest followup is not continued at a later followup. In studies with more than three waves, cubic effects can be found, analogous to a third power polynomial. We will categorise firstly according to measurement level: continuous and discrete; and secondly according to longitudinal capabilities: higher order time effects, only linear time effects, and no time effects.
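The relation between the number of waves and the time effects that can be detected may be illustrated with a minimal Python sketch. It assumes equally spaced waves and uses hypothetical group means; the function names are ours and are not taken from any of the reviewed studies.

```python
def highest_time_effect(n_waves):
    """Name the highest-order polynomial time trend estimable from n_waves."""
    if n_waves < 2:
        raise ValueError("a longitudinal effect needs at least two waves")
    names = {1: "linear", 2: "quadratic", 3: "cubic"}
    order = n_waves - 1  # a polynomial of order k needs k + 1 time points
    return names.get(order, f"order {order}")


def finite_differences(means, order):
    """Difference a series of equally spaced wave means `order` times."""
    values = list(means)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


# Hypothetical distress means: a rise after disclosure, then a return to baseline.
wave_means = [10.0, 14.0, 10.0]
print(highest_time_effect(len(wave_means)))  # quadratic
print(finite_differences(wave_means, 1))     # [4.0, -4.0]
print(finite_differences(wave_means, 2))     # [-8.0]
```

A non-zero second difference indicates that the change between baseline and first followup is not continued at the later followup, which is the quadratic effect described above.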
Dropout analyses and reporting
Dropout, caused by subjects who do not return for followup measurements, is a serious problem in virtually all longitudinal research. Dropout can invalidate the findings of a study when dropouts have characteristics or psychological outcomes different from the persons who remain in the study.
Missing value handling and reporting
Related to the dropout problem, but different in nature, is the handling of missing measurement points. Dropouts are subjects who participated at the start of the study, but do not return for followup after a certain time point, for instance after the disclosure of a test result. We refer to missing values when subjects did not complete questionnaires at a certain time point, but did return for followup at other time points. It should be noted that this problem emerges only in longitudinal studies with multiple waves.
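The distinction between dropout and missing values can be made explicit in a short sketch. The record layout (one entry per wave, baseline first, `None` for an uncompleted questionnaire) and the function name are assumptions made for illustration only.

```python
def classify_pattern(responses):
    """Classify one participant's wave-by-wave record.

    `responses` holds one entry per wave, baseline first; None marks a wave
    at which no questionnaire was completed.
    """
    observed = [r is not None for r in responses]
    if not observed or not observed[0]:
        return "no baseline"
    if all(observed):
        return "complete"
    last_seen = max(i for i, seen in enumerate(observed) if seen)
    if all(observed[: last_seen + 1]):
        # Every wave up to the last one attended is present: the
        # participant simply did not return afterwards.
        return "dropout"
    # A gap followed by a later return.
    return "missing values"


print(classify_pattern([12, 15, 14, 13]))      # complete
print(classify_pattern([12, 15, None, None]))  # dropout
print(classify_pattern([12, None, 14, 13]))    # missing values
```

With only two waves the third category cannot occur, because a participant who misses the single followup is by definition a dropout.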
RESULTS
The search was carried out in November 2003. The first search resulted in 41 840 references. This search was then combined with the keywords: genetic* and psychol*, which decreased the number of references to 585. From the abstracts it was concluded that 86 of these could be useful for this study.
Forty-six articles did not meet our inclusion criteria because:

fourteen were qualitative studies. Although some of these studies used quantitative measures, the data were not statistically analysed and thus the studies were classified as qualitative for our purpose.^{7–}^{20}

ten studied psychological effects, but not of a genetic test outcome.^{21–}^{30}

thirteen studies were not longitudinal but cross-sectional or retrospective.^{31–}^{43}

three did not investigate psychological outcomes of genetic testing.^{44–}^{46}

six were reviews on effects of (genetic) testing.^{3–}^{5,}^{47–}^{49}
Studies that could be classified in more than one category were counted under the first applicable exclusion criterion. Forty articles met the inclusion criteria (table 1).^{1,}^{50–}^{87}
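As a simple check, the selection counts reported above can be reconciled in a few lines. The category labels are our own shorthand for the five exclusion reasons; the numbers are those given in the text.

```python
# Counts as reported in the text above.
shortlisted = 86
excluded = {
    "qualitative": 14,
    "no genetic test outcome": 10,
    "not longitudinal": 13,
    "no psychological outcome": 3,
    "review": 6,
}

total_excluded = sum(excluded.values())
included = shortlisted - total_excluded
print(total_excluded, included)  # 46 40
```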
Categorisation of methods
Methods for continuous outcome variables and three or more waves
General linear mixed modelling (GLMM) is also referred to as random effect modelling, random coefficient regression modelling, mixed modelling, or multilevel regression analysis. This method, used in five studies,^{1,}^{60,}^{82,}^{84,}^{85} has three advantages: (1) incomplete cases can be analysed, (2) different time spans for individuals between the waves can be handled, and (3) the method allows control for confounding variables, for example age or the number of children, by entering these as covariates into the model equation. Missing data that are dependent on the observed outcome variables and other observed characteristics can be dealt with by including these in the analysis.^{88,}^{89}
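For orientation, one common way of writing such a model is given below. This is a generic random intercept and random slope formulation, not the specification used in any of the five studies, and all symbols are our own: $y_{ij}$ is the outcome of person $i$ at wave $j$, $t_{ij}$ the time since baseline, $g_i$ the test outcome, and $x_i$ a covariate such as age.

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2
       + \beta_3 g_i + \beta_4 g_i t_{ij} + \beta_5 x_i
       + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i})^\top \sim N(\mathbf{0}, \boldsymbol{\Sigma}),
\quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Because $t_{ij}$ is indexed by person as well as by wave, time spans may differ between individuals, and a person contributes to the likelihood with whatever waves were observed, which corresponds to advantages (1) and (2) above.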
Repeated measures analysis of variance
Eight studies used repeated measures analysis of variance on continuous data with multiple waves.^{51,}^{52,}^{54,}^{64,}^{65,}^{75,}^{81,}^{86} Repeated measures analysis of variance has three main advantages: (1) it is relatively easy to perform, (2) several outcome variables can be analysed simultaneously, and (3) confounding variables can be included as covariates. The disadvantage, that only complete cases can be analysed, can be reduced by imputing the missing values of incomplete cases. A second, but less serious problem is that the time spans between the waves must be equal for each participant.
Five studies did not report on quadratic or higher order time effects, which suggests that they have not made optimal use of repeated measures analysis.^{51,}^{54,}^{75,}^{81,}^{86} Three studies have used missing value imputation,^{51,}^{65,}^{81} which is discussed in the section on missing data. One study^{90} used SPSS MANOVA on discrete variables, which must be considered as less adequate.
Methods for continuous outcome variables and two waves
Repeated measures analysis of variance
Three studies^{55,}^{77,}^{80} used this highly appropriate method for analyses on two waves.
Regression analysis, (multiple) linear regression analysis, sequential or hierarchical regression analysis
The followup outcome variable is defined as the dependent variable in the regression equation. The baseline scores, DNA test outcome, and other variables (gender, age, education) are defined as independent variables. Two studies^{58,}^{67} used this method adequately on two waves. Four studies^{61,}^{62,}^{69,}^{73} did not include the baseline measure in the analysis, which is less adequate when there are baseline differences. One of these^{62} did not include the baseline measure because of missing baseline data. Four studies^{56,}^{71,}^{79,}^{83} used this analysis for multiple waves, which is less adequate, because only two by two comparisons can be made.
Analysis of variance or covariance of change scores
In principle this method is the same as regression analysis. Two studies^{76,}^{83} used it to analyse multiple waves, which is not optimal. One of these^{83} did not use the baseline score as a covariate, and differences were analysed at baseline in separate t-tests.
Logistic regression
By using the DNA test outcome as the dependent variable, logistic regression can be used for analysing continuous outcome variables. Baseline, followup scores, and confounding variables can be entered as independent variables. This method is unconventional since the role of the determinant (DNA test outcome) and the outcome (psychological test) are interchanged. The advantage of this method is that very few requirements are posed on the independent variables. Two studies^{68,}^{87} used logistic regression in this way for their multiple wave study.
Paired samples t-test
With two time points, the paired samples t-test yields the same solution as a repeated measures ANOVA. However, no comparisons for change in course of time between groups can be made. One study^{72} used it for two time points, and they analysed the differences between groups separately. It is better to use one integrative method, so that interaction effects can be revealed. Two studies^{56,}^{71} used this technique less adequately for multiple waves and for more than one group.
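The equivalence for two time points can be verified numerically: in a single group, the F statistic of a repeated measures ANOVA equals the square of the paired t statistic. The sketch below uses hypothetical scores for five participants, and the function names are ours.

```python
from math import isclose, sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired samples t statistic for two waves."""
    diffs = [b - a for a, b in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def rm_anova_f_two_waves(pre, post):
    """F statistic for the time effect in a one-group, two-wave design."""
    n = len(pre)
    grand = mean(pre + post)
    ss_time = n * ((mean(pre) - grand) ** 2 + (mean(post) - grand) ** 2)
    ss_subjects = 2 * sum(((a + b) / 2 - grand) ** 2 for a, b in zip(pre, post))
    ss_total = sum((x - grand) ** 2 for x in pre + post)
    ss_error = ss_total - ss_time - ss_subjects
    # df for time is 1, df for error is n - 1
    return ss_time / (ss_error / (n - 1))


# Hypothetical baseline and followup scores.
pre = [12.0, 15.0, 9.0, 20.0, 14.0]
post = [14.0, 15.0, 13.0, 19.0, 18.0]

t = paired_t(pre, post)
f = rm_anova_f_two_waves(pre, post)
print(isclose(f, t ** 2, rel_tol=1e-9))  # True
```

The sketch covers one group only. The advantage of the ANOVA framework lies in the group by time interaction, which a paired t-test cannot provide.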
Methods for continuous outcome variables that are not longitudinal
t-test for independent samples
Brandt et al^{50} used this method for their study comprising eight waves. Probably because the sample size was relatively small compared to the number of waves, repeated measures analysis of variance would have yielded invalid results. As more advanced methods were not common in 1989, there is no reason to object to this method. Lawson et al^{57} used this test to analyse baseline characteristics of persons who had an adverse event after the prediagnostic test for HD.
Methods for discrete outcome variables and three or more waves
Friedman’s test for ordinal data
This test is regarded as the nonparametric equivalent of a one sample repeated measures design. It neither reveals differences between groups, nor can it reveal quadratic, cubic, or higher order time effects. If a significant time effect within a group is revealed, pairwise comparisons between waves must be analysed separately for significance. Two studies used this procedure^{70,}^{78} for their continuous data. This could be adequate if the variables could not be successfully transformed to normality. Neither study reported on the distribution of the variables.
Methods for discrete outcome variables and two waves
Logistic regression
Logistic regression is an appropriate method of analysis when the outcome variable is dichotomous. One study^{66} dichotomised the continuous outcome variable and performed logistic regression analysis, which is less efficient. The authors also report that measures were taken at three time points, but they barely touched on the third wave in the results section.
Wilcoxon signed ranks test in combination with Wilcoxon rank sum test
These are nonparametric equivalents to a paired samples t-test and a t-test for independent samples. One study^{74} used these tests for one group and two waves, on continuous variables that were not normally distributed. This is a reasonable alternative when variables cannot be transformed to normality, though no interactions can be analysed.
Methods for discrete outcome variables that are not longitudinal
Kruskal-Wallis H test
This is a nonparametric equivalent to one-way ANOVA. One study^{59} seems to have used this test to compare three groups with respect to differences between the followup measure and baseline. Analyses for each of the three followup measurements were performed separately.
Mann-Whitney U test
This test is equivalent to a Kruskal-Wallis H test, but restricted to comparing only two groups. One study^{53} used this test on continuous data for multiple waves because of a small sample size. Carriers were compared with noncarriers at each time point separately. Another study^{57} used this test to analyse baseline characteristics of persons who had an adverse event after the prediagnostic test for HD.
Fisher’s exact test
One study^{63} used Fisher’s exact test for analysing the difference between the number of people who had an increase and those who had a decrease since baseline with regard to certain outcome variables. When continuous variables are treated in this way, much information may be lost.
Dropout analysis and reporting
In this review we differentiate between individuals lost to followup (that is, dropouts) and those for whom data are incomplete as a consequence of missing time points. Incompleteness of data within questionnaires because participants did not answer all questions is not discussed here. In general, questionnaire manuals provide rules for handling this problem. Moreover, this is not a specific issue of longitudinal designs.
We divided the studies into four groups: (1) baseline differences analysed and found, (2) baseline differences analysed but not found, (3) dropout rate reported, but no analysis for differences reported, and (4) no mention of dropout (table 2):

thirteen studies reported differences between dropouts and participants who returned for followup questionnaires.^{1,}^{53,}^{58,}^{63,}^{65,}^{66,}^{68,}^{78,}^{83–}^{86,}^{90}

ten studies reported that differences between dropouts and persons participating in followup were analysed, but no differences were found.^{51,}^{60,}^{64,}^{71,}^{75–}^{77,}^{81,}^{82,}^{87}

eight studies reported the dropout rate, but did not analyse possible characteristics of dropouts.^{50,}^{54,}^{56,}^{59,}^{61,}^{67,}^{70,}^{72}

nine studies reported neither on dropout analyses, nor on dropout rate.^{52,}^{55,}^{62,}^{69,}^{73,}^{74,}^{79} One study^{57} seemed to claim that there were no dropouts at all, though from the text it can be inferred that there must have been between one and 18 dropouts. Another study^{80} reported that unfortunately no baseline records of dropouts were kept.
Missing value handling and reporting
Twenty-eight studies included three or more waves, which made these studies vulnerable to missing values. Five studies used GLMM for analysis, and one study performed no longitudinal analysis.^{57} From the remaining 22 studies information is needed about how they dealt with missing values. Three studies imputed missing values before performing repeated measures analysis of variance. One^{81} used single regression imputation, which is considered inferior to multiple imputation,^{88,}^{91} one^{65} used mean substitution, which is generally insufficient, and one^{51} did not report which method they used. In 10 studies participants with missing time points were excluded from the analyses^{54,}^{56,}^{66,}^{68,}^{70,}^{71,}^{75,}^{78,}^{86,}^{90} and nine studies did not report on the handling of missing time points at all.^{50,}^{52,}^{53,}^{59,}^{64,}^{76,}^{79,}^{83,}^{87}
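One reason why mean substitution is generally regarded as insufficient is that it leaves the mean unchanged but artificially reduces the variability of the data, so that standard errors become too small. A minimal sketch with hypothetical scores (`None` marks a missing time point; the function name is ours):

```python
from statistics import mean, stdev


def mean_substitute(values):
    """Replace each missing value (None) by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical scores at one wave; two participants missed this wave.
scores = [8.0, 12.0, None, 20.0, None, 16.0]
observed = [v for v in scores if v is not None]
filled = mean_substitute(scores)

print(mean(observed) == mean(filled))    # True: the mean is preserved
print(stdev(filled) < stdev(observed))   # True: the spread is understated
```

Multiple imputation avoids this by drawing several plausible values for each missing point and carrying the resulting uncertainty into the analysis.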
The groups and subgroups that are studied
In 35 studies carriers were compared to noncarriers. Several of these studies also included other groups: people with an uninformative test outcome,^{50,}^{51,}^{57,}^{82,}^{83,}^{85} people who refrained from testing,^{57,}^{58,}^{66} partners,^{1,}^{64,}^{81} parents of individuals tested for FAP,^{85} and people who had had a spinocerebellar attack.^{59} One study compared unaffected Li-Fraumeni, unaffected BRCA1 tested individuals and women who were carriers of BRCA1 mutations.^{71} One study^{68} compared high and average risk groups of BRCA1/2 mutation negatives. One study^{72} compared parents with children tested for the MEN2 gene: all children positive, all negative, and mixed. Two studies included only one group.^{70,}^{73}
Accuracy of reporting
Some studies reported in an incomplete or unclear fashion. Sometimes the size of the study group and inclusion criteria remained unclear, or no actual p values were reported. In other studies the presented p values were different from values that could be calculated from the tables. Sometimes, the number of participants inferred from df or χ^{2} values was not in accordance with the reported number of participants.
DISCUSSION
The aim of this review was to describe the methodology and statistics of psychological followup studies on effects of predictive genetic testing. Fifteen studies were found to have applied more or less adequate statistical methods. The majority of the studies, however, applied statistical techniques that were less suitable or less efficient for the data that were available to the researchers. We evaluated studies on five issues: (1) the measurement level of the variables, (2) the number of waves, (3) dropout handling, (4) missing value handling, and (5) the groups and subgroups that were studied.
The measurement level of the variables
Generally, most studies used variables with a continuous measurement level. Many statistical methods that are appropriate for this measurement level require a normal distribution. If a variable is not normally distributed, an attempt can be made to transform it to normality.^{92,}^{93} If transformation is not successful, a nonparametric test should be used, as is prescribed for discrete variables. Generally, parametric tests are more efficient than nonparametric tests.^{94} For this reason it is recommended that a parametric test is used whenever permitted. Thirty-one studies used a parametric test on continuous data, seven studies used a nonparametric test on continuous data, one study used a parametric test on discrete data, and one study did not perform any longitudinal analysis.
Dropout analyses and reporting
Eighteen studies gave no evidence of having performed any analysis on the characteristics of dropouts. Such analyses should cover the outcome variables and all biographical measures that have been assessed. We favour the suggestion of Moher^{95} who reports that a flow diagram should be provided with the number of participants in any condition and any moment, and that reasons for these numbers should be given. Only two studies in this review actually provided such a flow chart.^{58,}^{90}
Sample size
An important characteristic of a study is the number of participants. The costs and efforts needed to conduct a large sample study will be higher than a small sample study. Obviously, a study with a large sample will reveal more (significant) effects. Although sample size issues are not within the scope of this review, it should be noted that a minimum sample size is needed to perform quantitative analyses. Our scope is to review the effectiveness and correctness of the methodology used, which is an issue independent of the sample size.
A model for study design and analysis
The type of study for examining the effects of testing for late onset genetic diseases is a longitudinal design in which one or more posttest measures are compared with a pretest measure. Admittedly, there is not one ideal study design as it depends on the resources the researcher has access to, the subjects who can and want to participate, and the questions that are to be answered. We provide some recommendations. The study should include a baseline measure, an intervention such as a genetic test outcome, and followup measures. Generally it is concluded from previous research^{5} that test results have a large impact directly after the test, but measures stabilise some time after baseline measurement. Timman et al^{1} suggested that the genetic test outcome does have long term effects. For this reason it is recommended that a study be continued over several years. For an initial study report, for example when baseline and the first followup are undertaken, analysis can be done with repeated measures analysis of variance. For reports on subsequent followups, GLMM is recommended. Although GLMM can handle incomplete cases, all efforts should be made to avoid missing values and retain as many participants in the study as possible.
In many articles no rationale was given for the analytical approach used. Sometimes the method used for the analyses was not clearly described, and we had to infer the analysis from the reported results. In some cases this may have meant that a different procedure was used than we inferred.
The use of an inadequate method can result in incorrect conclusions, but more often it can result in a failure to find a significant effect. To determine whether studies would have produced different findings if a more adequate method had been used, one needs to reanalyse the data. In a number of circumstances we performed the method described on our own data, and we compared the results of the various analyses. It is beyond the scope of this article to report on this extensively. Often the use of a nonparametric test where a parametric test could be used does not lead to a dramatic loss of power efficiency. Mostly the power efficiency of a nonparametric test is about 95% compared to an F test when conditions for this test are met. In some circumstances, however, for example when distributions are dichotomised before analysing, it can drop dramatically to 63%.^{94}
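The two percentages are close to well known asymptotic relative efficiencies under normality: 3/π for rank tests of the Wilcoxon and Mann-Whitney type compared with the t-test, and 2/π for a test based on a split at the median. Whether the cited source derives its figures in exactly this way is an assumption on our part; the sketch below only shows the arithmetic.

```python
from math import pi

# Asymptotic relative efficiency (ARE) versus the t-test when the data
# are normally distributed.
are_rank_test = 3 / pi      # Wilcoxon / Mann-Whitney type rank tests
are_median_split = 2 / pi   # outcome dichotomised at the median (sign test)

print(round(are_rank_test, 3))      # 0.955
print(round(are_median_split, 3))   # 0.637

# Sample size needed to match a parametric analysis of 100 participants.
print(round(100 / are_rank_test))     # 105
print(round(100 / are_median_split))  # 157
```

Read this way, dichotomising a continuous outcome costs roughly as much information as discarding a third of the participants.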
In this review, we found a number of studies that used sound methods and reported their findings and dropout handling in an excellent way. Our purpose is to present these methods and ways of reporting to all researchers in the field of psychological effects of genetic testing. Using an inadequate technique can cause loss of information, for example when a technique excludes incomplete cases. If more up to date and sophisticated techniques are included in one’s statistical package, these should be used. In hindsight it can be said that some of our own new findings were already present in our previous data, but the statistical packages available to us at the time did not reveal these. We hope that this study can be of help to other researchers for finding more, and better founded results.
Acknowledgments
We thank Vicky Kraver for correcting our use of the English language.
REFERENCES
Footnotes

Conflict of interest: none declared.