A call for mtDNA data quality control in forensic science

https://doi.org/10.1016/j.forsciint.2003.12.004

Abstract

There is increasing evidence that many of the mitochondrial DNA (mtDNA) databases published in the fields of forensic science and molecular anthropology are flawed. An a posteriori phylogenetic analysis of the sequences could help to eliminate most of the errors and thus greatly improve data quality. However, previously published caveats and recommendations along these lines have not yet been taken up by all researchers. Here we call for stringent quality control of mtDNA data by haplogroup-directed database comparisons. We take some problematic databases of East Asian mtDNAs, published in the Journal of Forensic Sciences and Forensic Science International, as examples to demonstrate the process of pinpointing obvious errors. Our results show that data sets are notoriously plagued not only by base shifts and artificial recombination but also by lab-specific phantom mutations, especially in the second hypervariable region (HVR-II).

Introduction

DNA typing is among the most important advances in forensic science and is widely used in criminal prosecutions. Among the DNA markers (loci) employed in the field, mitochondrial DNA (mtDNA) is chosen for its specific characteristics, such as maternal inheritance, absence of recombination, high divergence rate, and high copy number per cell. MtDNA databases of various populations offer valuable information for estimating a chance matching probability when a forensic stain and a suspect share a sequence as well as for inferring the (sub)continental origin of an mtDNA lineage. Unfortunately, many published mtDNA data are not sufficiently reliable [1], [2], [3], which strongly limits their forensic use.

Five major and common types of errors are observed in published mtDNA control region sequence data [1]: base shifts, reference bias, phantom mutations, base mis-scoring, and artificial recombination. These errors can be detected by phylogenetic analysis and comparison with closely related sequences from other databases. This approach has been used to pinpoint errors in mtDNA control region [2] and coding region [4], [5], [6] data as well as in ancient DNA data [7], which led to a number of caveats for data generation and compilation. However, many researchers do not perform sufficiently stringent a posteriori quality control of their data. The recent report of 105 Chinese Han mtDNA control region sequences [8] is a case in point. In what follows, we take this data set as well as the data from Tsai et al. [9] and Koyama et al. [10] to exemplify the phylogenetic strategy of pinpointing potential errors in mtDNA sequences.
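
As a rough illustration of comparing a sequence against a haplogroup motif, the following Python sketch counts how many motif mutations a sample carries in each hypervariable segment. It is a minimal sketch under stated assumptions, not the procedure used in this article: the function names are hypothetical, the split of positions at 16000 into HVR-I and HVR-II is a simplification, and the only real data is the motif string quoted in the Haplogrouping section. A sample that matches a motif fully in one segment and not at all in the other is only a hint of possible artificial recombination and would still need checking against closely related sequences.

```python
import re

# Motif quoted in the Haplogrouping section; the excerpt does not name the
# haplogroup it belongs to, so it is left unlabeled here.
MOTIF = "16185-16223-16260-16298-73-152-249d-263"

def parse_motif(motif):
    """Turn a hyphen-separated motif string into a set of mutation tokens."""
    return {tok.strip() for tok in motif.split("-") if tok.strip()}

def region(token):
    """Assumption: positions >= 16000 belong to HVR-I, lower ones to HVR-II."""
    position = int(re.match(r"\d+", token).group())
    return "HVR-I" if position >= 16000 else "HVR-II"

def match_by_region(sample, motif):
    """Count, per segment, how many motif mutations the sample carries."""
    sample_set, motif_set = parse_motif(sample), parse_motif(motif)
    counts = {}
    for name in ("HVR-I", "HVR-II"):
        expected = {tok for tok in motif_set if region(tok) == name}
        counts[name] = (len(expected & sample_set), len(expected))
    return counts

def looks_recombined(sample, motif):
    """Flag a full motif match in one segment combined with no match in the other."""
    found = match_by_region(sample, motif)
    full = [n for n, (hit, total) in found.items() if total and hit == total]
    none = [n for n, (hit, total) in found.items() if total and hit == 0]
    return bool(full) and bool(none)

# Hypothetical sample, invented for illustration only (not from any database).
suspect = "16185-16223-16260-16298-150"
print(match_by_region(suspect, MOTIF), looks_recombined(suspect, MOTIF))
```

The all-or-nothing rule is deliberately crude; a real check would weigh each mutation by how recurrent it is in the phylogeny.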

Haplogroups

The East Asian mtDNA phylogeny is now quite well worked out [11], [12], [13], and this reference system has recently entered the forensic field as well [14]. The basal mutations on each branch of the phylogeny define an mtDNA haplogroup, with a more or less restricted geographic distribution. Normally, the haplogroup-specific mutation(s) of one haplogroup do not co-occur with the haplogroup-specific mutation(s) of another haplogroup, except for singular recurrent events. If, in a sequence

Haplogrouping

Tables 1 and 2 of Rao et al. [8] compile 105 Chinese Han mtDNA lineages based on HVR-I and HVR-II typing. More than 95% of these mtDNA lineages can actually be allocated to specific mtDNA haplogroups according to their mutation motifs (Table 1). For instance, since the sequence 16185-16223-16260-16298-73-152-249d-263 (transitions and indels relative to the revised Cambridge reference sequence [28]; length polymorphisms of C-stretches disregarded) is inferred to be the ancestral sequence of

Conclusion

In view of the increasing number of forensic labs starting to work on mtDNA, we feel it is urgent to reiterate the need for data quality control. We entreat new mtDNA researchers to check for the sources of error summarized in Bandelt et al. [1]. Extreme caution should be exercised at all stages of data collection and proofreading. Reliance on one strand alone can easily lead to phantom mutations [2]. The recommendation is thus clear: “Both strands of the amplified product must be

References (43)

  • Y Nishimaki et al. Sequence polymorphism in the mtDNA HV1 region in Japanese and Chinese. Legal Med. (1999)
  • A Torroni et al. A signal, from human mtDNA, of post-glacial recolonization in Europe. Am. J. Hum. Genet. (2001)
  • C Herrnstadt et al. Errors, phantom and otherwise, in human mtDNA sequences. Am. J. Hum. Genet. (2003)
  • H Wittig et al. Mitochondrial DNA in the Central European population: human identification with the help of the forensic mtDNA D-loop-base database. Forensic Sci. Int. (2000)
  • H Wittig et al. D-Loop-BASE is online now: Central European database of mitochondrial DNA. Progress in Forensic Genetics (2003)
  • H.-J Bandelt et al. Detecting errors in mtDNA data by phylogenetic analysis. Int. J. Legal Med. (2001)
  • P Forster. To err is human. Ann. Hum. Genet. (2003)
  • H.-J. Bandelt, C. Herrnstadt, Y.-G. Yao, Q.-P. Kong, T. Kivisild, C. Rengo, R. Scozzari, M. Richards, R. Villems, V....
  • Q.-P Kong et al. Mitochondrial DNA sequence polymorphisms of five ethnic populations from northern China. Hum. Genet. (2003)
  • Y.-G Yao et al. Pitfalls in the analysis of ancient human mtDNA. Chinese Sci. Bull. (2003)
  • L Rao et al. Sequence polymorphisms of the mitochondrial DNA control region in 105 Chinese Han population. J. Forensic Sci. (2003)

1 Present address: The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, USA.