Introduction Between 0.02% and 0.04% of articles are retracted. We aim to: (a) describe the reasons for retraction of genetics articles and the time elapsed between the publication of an article and that of the retraction notice because of research misconduct (ie, fabrication, falsification, plagiarism); and (b) compare all these variables between retracted medical genetics (MG) and non-medical genetics (NMG) articles.
Methods All retracted genetics articles published between 1970 and 2018 were retrieved from the Retraction Watch database. The reasons for retraction were fabrication/falsification, plagiarism, duplication, unreliability, and authorship issues. Articles subject to investigation by company/institution, journal, US Office of Research Integrity or third party were also retrieved.
Results 1582 retracted genetics articles (MG, n=690; NMG, n=892) were identified. Research misconduct and duplication were involved in 33% and 24% of retracted papers, respectively; 37% were subject to investigation. Only 0.8% of articles involved both fabrication/falsification and plagiarism. In this century the incidence of both plagiarism and duplication increased statistically significantly in retracted genetics articles; conversely, fabrication/falsification was significantly reduced. Time to retraction due to scientific misconduct was statistically significantly shorter in the period 2006–2018 compared with 1970–2000. Fabrication/falsification was statistically significantly more common in NMG (28%) than in MG (19%) articles. MG articles were significantly more frequently investigated (45%) than NMG articles (31%). Time to retraction of articles due to fabrication/falsification was significantly shorter for MG (mean 4.7 years) than for NMG (mean 6.4 years) articles; no differences for plagiarism (mean 2.3 years) were found. The USA (mainly NMG articles) and China (mainly MG articles) accounted for the largest number of retracted articles.
Conclusion Genetics is a discipline with a high article retraction rate (estimated retraction rate 0.15%). Fabrication/falsification and plagiarism were almost mutually exclusive reasons for article retraction. Retracted MG articles were more frequently subject to investigation than NMG articles. Retracted articles due to fabrication/falsification required 2.0–2.8 times longer to retract than when plagiarism was involved.
- retraction notices
- non-medical genetics
- medical genetics
- research misconduct
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Journal editors are responsible for maintaining the integrity of published articles. To this end, in addition to ensuring the standards of the editorial process before publication, they can publish corrections, expressions of concern and retractions of published articles. The latter concerns findings that result in loss of confidence in the study findings, in the conduct of the study or in the publication process of the manuscript. In biomedical and life sciences publications, retracted articles accounted for 0.02% of all published research articles indexed by PubMed as of May 2012.1 Research misconduct (ie, fabrication, falsification, plagiarism)2 accounted for 53%, whereas duplication and error were responsible for 14% and 21%, respectively; the USA was the country with the highest number of retracted papers due to research misconduct.1
In 2010 the Retraction Watch blog started to comment on retraction notices.3 A database was initiated to host all the information regarding retractions, corrections and expressions of concern. The Retraction Watch database (RWdb) has been used for research for some time, but it was only opened to the public in October 2018. As of 31 December 2018, the RWdb had hosted 18 649 retracted articles from all disciplines.4 It is considered to be the largest and most comprehensive database of retracted articles.5
Although genetics has experienced an unprecedented development in this century, it has a long history of scientific research. The RWdb has hosted retraction articles on genetics since 1977, starting with four retracted articles by RJ Gullis and CE Rowe—for yielding no reproducible results4—published in 1975. Genetics is the subject of research of any life science discipline. Along with biology, genetics could be the discipline that involves the greatest variety of researchers—with the implication of widely differing ethical approaches to research integrity. To the best of our knowledge, no research has been conducted that specifically aims to describe the main features of retracted genetics articles. Since genetics encompasses both medical-related disciplines and non-medical-related disciplines such as anthropology, biology or botany, a distinction between features of retracted medical genetics (MG) and retracted non-medical genetics (NMG) articles was warranted. The aims of this study were: (a) to describe the reasons for retraction of genetics articles and the time elapsed between the publication of the original article and that of the retraction notice (or time to retraction) because of research misconduct (ie, fabrication, falsification, plagiarism); and (b) to compare all these variables and the time needed for retraction of those cases of research misconduct investigated by the US Office of Research Integrity (ORI), between retracted MG and NMG articles.
Material and methods
On 14–16 January 2019 a search was conducted on the RWdb for articles published between 1 January 1970 and 31 December 2018. Boolean operators used were ‘OR’, ‘AND’ and ‘AND NOT’. The following descriptors were applied. Nature of notice: ‘retraction’. Article types: ‘case report’, ‘clinical study’, ‘commentary/editorial’, ‘letter’, ‘meta-analysis’, ‘research article’, and ‘review article’, using the Boolean operator ‘OR’. This strategy allowed the exclusion of other types of article such as, for example, ‘conference/abstract/paper’, ‘dissertation/thesis’, and ‘government publication’. Subjects: ‘genetics’; MG was searched as ‘genetics AND medicine’, whereas NMG was searched as ‘genetics AND NOT medicine’. Within the 31 subjects/specialties covered by ‘medicine’, the following were excluded: complementary medicine, dentistry and nursing. Then, as predefined by the RWdb, the following ‘reasons for retraction’ were searched: fabrication/falsification, plagiarism, duplication, unreliability, grave issues with authorship, results not reproducible, contamination of cell lines/materials/reagents, conflicts of interest, fake peer review, and error by journal/publisher. Of these ‘reasons for retraction’, only those containing a minimum of 50 cases in both datasets (MG and NMG) were considered, to allow for establishing reasonable comparisons. We also searched for those cases that were subject to civil or criminal proceedings, that provided limited or no information or that were subject to investigation (by ‘company/institution’, ‘journal/publisher’, ‘ORI’ or ‘third party’).
The following information was retrieved from each retracted article: date of publication of the original article and that of the retraction notice, PubMed ID (or DOI) of the original paper, and country (or countries) of origin of the investigating teams, with special interest in the seven countries (China, Germany, India, Japan, South Korea, UK, USA) with the largest number of biomedical and life sciences retracted papers1. All data were retrieved and checked by RDR; a random sample (by means of a random integer generator; https://www.random.org/) of 25% of all the data was checked for consistency by CA, and no discrepancies were found.
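The 25% consistency check described above can be illustrated with a short sketch. It is not the authors' procedure: it swaps in Python's pseudo-random generator for the random.org service they used, and it assumes the records are simply numbered from 1 to the total number of retracted articles. The function name, the seed and the rounding-up choice are hypothetical.

```python
import math
import random


def draw_check_sample(n_records, fraction=0.25, seed=2019):
    """Return a sorted random subset of 1-based record numbers to re-check."""
    rng = random.Random(seed)             # fixed seed so the draw is repeatable
    k = math.ceil(n_records * fraction)   # round up so at least 25% is covered
    return sorted(rng.sample(range(1, n_records + 1), k))


# 1582 is the number of retracted genetics articles reported in this study.
sample = draw_check_sample(1582)
print(len(sample))
```

Sampling without replacement ensures that no record is checked twice, so the second reviewer sees a full quarter of the distinct records.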
Retracted genetics articles
Descriptive statistics of the reasons for retraction and investigation, and time to retraction of research misconduct retracted articles, were calculated. To ascertain if there was any temporal trend in the reasons for retraction and in the time to retraction, logistic and linear regression analyses were performed by dividing the total study period into five time periods: 1970–2000, 2001–2005, 2006–2010, 2011–2015, and 2016–2018. This allowed comparisons to be made between the last four periods of time considered and the first one (1970–2000), which was used as a reference. In the logistic regression analysis, results obtained for the last period (2016–2018) should be taken with caution because the time for retraction was too short in comparison with those of the three previous time periods considered; similarly, in the linear regression analysis, results obtained for some time periods should be viewed with caution since the number of cases was too small in comparison with the other time periods considered. All tests were two-sided; p<0.05 was considered to be statistically significant.
Retracted MG and NMG articles
The descriptive statistics of the reasons for retraction and investigation, and time to retraction of research misconduct articles of both MG and NMG, were calculated. Odds ratios were calculated in each of the five study time periods (1970–2000, 2001–2005, 2006–2010, 2011–2015, 2016–2018). The χ² test was used to ascertain whether the percentages of the reasons for retraction and investigation differed between the two datasets. A Bonferroni correction was applied for multiple testing. All tests were two-sided; p<0.004 was considered statistically significant.
The Mann-Whitney U test was used to compare time to retraction of MG and NMG articles where fabrication/falsification or plagiarism was involved, over the five time periods. A Bonferroni correction was applied for multiple testing. All tests were two-sided; p<0.0125 was considered statistically significant.
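As a brief illustration of how Bonferroni-corrected thresholds such as those above arise, the sketch below divides a nominal α of 0.05 by a number of comparisons. The text does not state how many comparisons were made, so the counts used here are assumptions: 4 reproduces 0.0125 exactly, and about 12 gives a value close to 0.004.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed numbers of comparisons; these are not stated in the article.
threshold_12 = bonferroni_threshold(0.05, 12)
threshold_4 = bonferroni_threshold(0.05, 4)

print(round(threshold_12, 4))  # 0.0042
print(threshold_4)             # 0.0125
```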
Finally, the descriptive statistics of time to retraction of investigations conducted by the ORI for both retracted MG and NMG articles were calculated and a comparison between them was performed using the Mann-Whitney U test. The test was two-sided; p<0.05 was considered statistically significant.
Statistical analyses were performed using R 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria).
Results
Retracted genetics articles
Following our search strategy (online supplementary information-1), there were 1584 retracted articles on genetics: 692 (44%) on MG, of which two were excluded for being on complementary medicine and dentistry, and 892 (56%) on NMG. Table 1 shows all the reasons for retraction and investigation for all retracted genetics articles (online supplementary information-2). Research misconduct and duplication were involved in 33.2% (525/1582) and 24.3% of retracted articles, respectively. Thirty-seven per cent (587/1582) of retracted genetics articles were subject to investigation. Of the 10 reasons for retraction, only five met the minimum number of 50 article retractions in both datasets that allowed for inclusion in the analysis. Among the seven countries assessed, and related to the five reasons for retraction and investigation, the USA (n=526) and China (n=509) were the origin of the authors of the majority of retracted articles; South Korea (n=64) and the UK (n=69) were the countries with the lowest number of retracted papers.
Temporal trends by means of logistic regression analysis showed that fabrication/falsification was a statistically significantly (p≤0.007) less frequent reason for retraction in 2011–2018 than in the reference period (1970–2000) (online supplementary information-3). Conversely, plagiarism and duplication were statistically significantly (p between 0.024 and <0.001) more frequent reasons for retraction during 2006–2018 and 2001–2018 than in 1970–2000, respectively. Temporal trends analysis showed that when fabrication/falsification or plagiarism was involved the time to retraction was statistically significantly (p<0.001) shorter in 2006–2018 and 2001–2018, respectively, than in the reference period (1970–2000), with shorter times as the 21st century progressed.
Retracted medical genetics and non-medical genetics articles (online supplementary information-4)
Table 2A (online supplementary information-5) shows the five reasons for retraction and investigation of all retracted MG and NMG articles and the number of cases that originated in the seven countries of interest, divided into six time periods. Between 5% (NMG) and 7% (MG) of retraction notices provided limited or no information. Research misconduct (ie, fabrication/falsification, plagiarism) was the reason for retraction in 28.9% and 36.5% of MG and NMG articles, respectively. Only one (out of 199) MG and three (out of 326) NMG retracted articles had both fabrication/falsification and plagiarism issues; thus, these two reasons for retraction were (almost) mutually exclusive. Fabrication/falsification was statistically significantly more frequently involved in NMG (27.9%) than in MG (18.6%) retracted articles. Conversely, MG retracted articles (45.2%) were statistically significantly more frequently investigated than NMG retracted reports (30.8%). There were statistically significantly more investigations conducted on retracted MG than on NMG articles in 2011–2015. China led the ranking in MG and the USA in NMG articles (table 2B).
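The comparison of investigation frequencies can be sketched as a 2×2 χ² test. The counts below are not taken from the tables: they are reconstructed from the rounded percentages reported above (45.2% of 690 MG and 30.8% of 892 NMG articles), which is an assumption, although they do sum to the 587 investigated articles reported for the whole dataset. The sketch uses an uncorrected Pearson χ² with 1 degree of freedom; the authors' exact settings are not stated, and R's chisq.test applies Yates' continuity correction to 2×2 tables by default, so the statistic need not match theirs.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]] (1 df)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts reconstructed from rounded percentages (assumption, see lead-in).
mg_total, nmg_total = 690, 892
mg_inv = round(0.452 * mg_total)    # investigated MG articles
nmg_inv = round(0.308 * nmg_total)  # investigated NMG articles

chi2, p_value = chi_square_2x2(mg_inv, mg_total - mg_inv,
                               nmg_inv, nmg_total - nmg_inv)
odds_ratio = (mg_inv * (nmg_total - nmg_inv)) / ((mg_total - mg_inv) * nmg_inv)

print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
print("significant at p<0.004:", p_value < 0.004)
```

Under these reconstructed counts the p value falls well below the Bonferroni-corrected threshold of 0.004, in line with the difference reported in the text.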
Time to retraction when research misconduct was involved is shown in table 3. With fabrication/falsification, time to retraction was statistically significantly (p=0.001) longer in NMG (mean 6.4; median 5.4 years) than in MG (mean 4.7; median 4.0 years) retracted articles for the whole study period (1970–2018). The time needed for retraction when plagiarism was involved was very similar between MG and NMG (both with means of 2.3 and medians of 1.3 years), and no statistically significant differences were found in any time period assessed except in 2016–2018. The USA was the leading country for authors of both retracted MG (n=133) and NMG (n=59) articles for fabrication/falsification, with China leading retracted MG (n=36) and NMG (n=34) articles due to plagiarism. There were several authors with a high number of retractions that affected the overall number of retractions in specific periods of time. Thus, Fazlul H Sarkar with 11 retractions and Jin Q Cheng with eight retractions represented 22% of all 86 retractions in NMG in 2006–2010; in this period Naoki Mori with eight retractions represented 15% of all retracted MG papers.
Of 207 investigations conducted by ORI on articles published during the whole study period, 62 (30%) were on genetics, with a similar percentage of retracted MG (n=29; 4.2%, 29/690) and NMG (n=33; 3.7%, 33/892) articles; the time to retraction (the time elapsed between the publication of the original article and that of the retraction notice) was not statistically significantly different (means of 4.6 and 5.1 years, medians 2.7 and 5.0, for MG and NMG, respectively). Fabrication/falsification was involved in all ORI investigated retracted articles, except in one NMG article that had unreliable results. Company/institution investigation was conducted in 17 (58.6%) of MG and 27 (81.8%) of NMG articles that were also subject to ORI investigations. Fabrication/falsification of data, data and images, and images were involved in 30 (48.4%), 17 (27.4%), and nine (14.5%) retracted articles, respectively. Duplication of images was also involved in eight retracted articles (four MG, four NMG). All 62 retracted articles were by authors from the USA; only three MG and eight NMG articles had authors from six other countries.
Finally, table 4 shows the seven cases that were subject to civil or criminal proceedings in four different countries. Time elapsed from publication of the article to publication of the retraction notice ranged from 1.7 to 13.6 years.
Discussion
The RWdb listed 1582 retracted papers on genetics. Between 1996 and 2017 there were some 975 000 worldwide genetics articles published6; in this same period, the RWdb listed 1476 retracted papers on genetics. Hence, we can estimate that the retraction rate for genetics articles was around 0.15%, almost four times higher than the current rate found on the RWdb for all disciplines (0.04%)5 and almost eight times higher than that found for PubMed articles (0.02%).1 Therefore, we should assume that genetics is a discipline with a high retraction rate.
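The estimate above can be reproduced with a few lines of arithmetic. The sketch uses only the figures quoted in the text (1476 retracted papers among some 975 000 genetics articles published in 1996–2017, and the comparison rates of 0.04% and 0.02%); the variable names are illustrative.

```python
retracted = 1476      # retracted genetics papers listed in the RWdb, 1996-2017
published = 975_000   # approximate genetics articles published, 1996-2017

rate_pct = retracted / published * 100   # retraction rate as a percentage
ratio_vs_rwdb = rate_pct / 0.04          # versus all disciplines in the RWdb
ratio_vs_pubmed = rate_pct / 0.02        # versus PubMed-indexed articles

print(f"{rate_pct:.2f}%")                                   # 0.15%
print(round(ratio_vs_rwdb, 1), round(ratio_vs_pubmed, 1))   # 3.8 7.6
```

The ratios of about 3.8 and 7.6 correspond to the "almost four times" and "almost eight times" stated in the text.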
Research misconduct was involved in 33% of all retracted genetics articles, which compares favourably with 43% for oncology,7 46% for all retracted articles in 2013–2015 hosted on the RWdb8 and 49% for retracted articles in open access journals,9 but not with 26% among BMC journals.10 In genetics, this study showed that fabrication/falsification was much more frequent than plagiarism (24% vs 9%), a situation also found in cancer (28% vs 14%),7 but the opposite of what happens with retracted articles in open access journals.9 10 It is important to highlight that fabrication/falsification and plagiarism are almost mutually exclusive: in only 0.8% (4/525) of all retracted articles were both these reasons present. Duplication, the single most prevalent reason involved in retracted genetics papers (24%), is less commonly found in oncology (18%)7 or retracted articles in open access journals (12–16%).9 10 It is remarkable that almost four out of 10 of all retracted genetics articles were subject to investigation. As regards the country of origin of the authors, the seven countries assessed in our study also produced the majority of retracted articles in other analyses, although Italy, Taiwan and Iran were in the list of countries with the highest number of retractions in other analyses.7–9
This study showed that among the reasons for retraction involving research misconduct, fabrication/falsification was significantly reduced in 2011–2018 with respect to the reference period (1970–2000); conversely, plagiarism was a significantly more frequent reason for retraction in 2006–2018. Time to retraction when research misconduct reasons were involved was statistically significantly shorter in 2006–2018—which could reflect, at least in part, the higher focus of the scientific community on research integrity in recent years.
Duplication was also a statistically significantly more frequent reason for retraction during this century than in 1970–2000. This could reflect the fact that although image editing software facilitates manipulation of images by authors, it also makes such manipulation easier for editorial teams to detect.11 Digital manipulation of images, however, has very different degrees of severity, the most serious being the deliberate alteration of the interpretation of data.12 Duplication is of such unprecedented magnitude in all scientific disciplines that it has been estimated some 35 000 articles might be candidates for retraction because of image duplication.13
The comparison between retracted MG and NMG articles showed that of the five reasons for retraction assessed, NMG had statistically significantly higher percentages of fabrication/falsification than MG retracted articles. A highly statistically significant difference was found between the percentages of retracted articles that were subject to investigation (by company/institution, journal, third party or by the ORI) in MG (45%) and NMG (31%) articles; this was due to the statistically significant differences observed in the 2011–2018 period. This strongly suggests that alleged misbehaviour in MG articles attracted more interest and was more stringently assessed than in NMG articles, perhaps reflecting the higher societal relevance of MG than of NMG articles. The difference observed in the percentages of retracted articles that were investigated was due to the higher interest of institutions/journals/third parties (which were likely more sensitive to research integrity issues in MG articles) rather than the ORI, since this latter body conducted a similar percentage of investigations (4.2% in MG, 3.7% in NMG). However, to put these figures into perspective, we must highlight the fact that the ORI may accept the institution’s findings, ask for additional evidence or conduct its own investigation.14
Temporal trends of retracted MG and NMG articles throughout the study period showed that time from publication to article retraction when fabrication/falsification was involved was statistically significantly longer for NMG (mean 6.4 years) than MG (mean 4.7 years) articles. This was statistically significant only for the period 2001–2010. There were no differences when plagiarism was involved (mean of 2.3 years for both MG and NMG articles).
This is the first study on retraction notices for genetics articles. The strengths and limitations of this analysis have the same source: the data contained in the RWdb. The RWdb holds the largest number of retracted articles.5 This is a strength, since it has been shown that both PubMed and Web of Science, which are commonly used for studies on retractions, are inconsistent in regard to how retractions and retracted articles are labelled,15 and therefore many could be lost for analysis. In addition, the RWdb uses a common taxonomy, which is of invaluable help when dealing with retraction notices from various editors/authors published with different formats, clarity and level of detail (reasons for retraction such as ‘euphemism for duplication’ or ‘euphemism for plagiarism’ help to appropriately classify certain cases). On the other hand, the RWdb lists only a (very large) subset of all retracted articles, and therefore some retractions may have been missed. Furthermore, a number of articles could have been more precisely classified. In any case, we should acknowledge that misclassification of papers is common even at journal level.16 In addition, and to put our findings into perspective, it should be highlighted that not all articles that should be retracted are reported as such with a retraction notice by the journal. A recent study has shown that among authors of 200 articles who were found guilty of research misconduct after ORI investigations, 41 (20.5%) articles did not have a retraction notice published.17
This study shows that genetics is a discipline with a relatively high retraction rate. Journal editors should have a proactive attitude toward maintaining the integrity of the published record. In addition to the available online and offline tools for detecting plagiarism, new tools are being developed to help editors to discover certain types of research misbehaviour. One of them is automated software to detect duplications, whose authors estimate that 0.6% of papers contain fraudulent images.18 Another is a program (the ‘Seek & Blastn’ tool) that spots incorrect nucleotide sequence reagents reported in articles that could invalidate the results.19 This system also allows the detection of errors unrelated to misconduct that, however, could have an important negative impact on the reliability of genetics results. As it has been recently validated,20 it could be incorporated into the editorial process and help editors in their article screening process which could lead to retractions—as has already happened in 17 cases.21 Although few editors currently do this with image data,22 23 these new programs should also be used in the manuscript editorial process to prevent the publication of articles with duplicated images or mismatched gene sequences. Implementing processes to prevent publication of flawed research is always better than retracting it years later.
We thank Ignacio Mahillo PhD (Health Research Institute-Fundación Jiménez Díaz, University Hospital, Universidad Autónoma de Madrid, Madrid, Spain) for conducting the statistical analyses. We thank the University Chair UAM-IIS-FJD of Genomic Medicine for funding the open access publication charges of this paper.
Correction notice The article has been corrected since it was published Online First. Fazlul Sarkar's name has been corrected in table 3 and in the text of the article.
Contributors RDR conceived the study, retrieved the data and drafted the manuscript. CA checked 25% of the data. Both authors analysed and interpreted the data. CA provided comments and edits throughout the drafting process for important intellectual content. Both authors approved the final version of the manuscript and are accountable for all aspects included in it. RDR is the guarantor of the article.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as supplementary information.