Original article
Impact of DNA source on genetic variant detection from human whole-genome sequencing data
  1. Brett Trost (1)
  2. Susan Walker (1)
  3. Syed A Haider (1)
  4. Wilson W L Sung (1)
  5. Sergio Pereira (1)
  6. Charly L Phillips (1)
  7. Edward J Higginbotham (1, 2)
  8. Lisa J Strug (1, 3)
  9. Charlotte Nguyen (1, 2)
  10. Akshaya Raajkumar (1)
  11. Michael J Szego (4, 5)
  12. Christian R Marshall (6, 7)
  13. Stephen W Scherer (1, 2)
  1. The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada
  2. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  3. Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
  4. Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
  5. Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
  6. Department of Paediatric Laboratory Medicine, Genome Diagnostics, Hospital for Sick Children, Toronto, Ontario, Canada
  7. Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
Correspondence to Dr Stephen W Scherer, The Centre for Applied Genomics, Hospital for Sick Children, Toronto, Ontario, Canada; stephen.scherer{at}


Background Whole blood is currently the most common DNA source for whole-genome sequencing (WGS), but for studies requiring non-invasive collection, self-collection, greater sample stability or additional tissue references, saliva or buccal samples may be preferred. However, the relative quality of sequencing data and accuracy of genetic variant detection from blood-derived, saliva-derived and buccal-derived DNA need to be thoroughly investigated.

Methods Matched blood, saliva and buccal samples from four unrelated individuals were used to compare sequencing metrics and variant-detection accuracy among these DNA sources.

Results We observed significant differences among DNA sources for sequencing quality metrics such as percentage of reads aligned and mean read depth (p<0.05). Differences were negligible in the accuracy of detecting short insertions and deletions; however, the false positive rate for single nucleotide variation detection was slightly higher in some saliva and buccal samples. The sensitivity of copy number variant (CNV) detection was up to 25% higher in blood samples, depending on CNV size and type, and appeared to be worse in saliva and buccal samples with high bacterial concentration. We also show that methylation-based enrichment for eukaryotic DNA in saliva and buccal samples increased alignment rates but also reduced read-depth uniformity, hampering CNV detection.

Conclusion For WGS, we recommend using DNA extracted from blood rather than saliva or buccal swabs; if saliva or buccal samples are used, we recommend against using methylation-based eukaryotic DNA enrichment. All data used in this study are available for further open-science investigation.
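The accuracy comparisons summarised above rest on two quantities: sensitivity (the fraction of known variants that are detected) and a false positive measure. The following is a minimal sketch of how such metrics could be computed for one DNA source against a reference call set. It is not the authors' pipeline. The function names, the variant tuples and the definition of the false positive measure (here, the fraction of calls absent from the reference set) are assumptions made for illustration, and the data are invented.

```python
# Toy illustration only: the variant tuples below are invented, not study data.
# A variant is represented as (chromosome, position, ref allele, alt allele).

def sensitivity(truth, calls):
    """Fraction of reference (truth) variants that also appear in the calls."""
    truth, calls = set(truth), set(calls)
    if not truth:
        raise ValueError("truth set is empty")
    return len(truth & calls) / len(truth)


def false_call_fraction(truth, calls):
    """Fraction of calls that are absent from the reference set.

    This is one common way to quantify false positives; the paper's
    exact definition may differ (assumption).
    """
    truth, calls = set(truth), set(calls)
    if not calls:
        return 0.0
    return len(calls - truth) / len(calls)


truth_set = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "A"),
    ("chr3", 3000, "T", "C"),
}

# Hypothetical call sets for two DNA sources from the same individual.
blood_calls = set(truth_set)
saliva_calls = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "A"),
    ("chr4", 500, "A", "T"),  # a call not present in the reference set
}

for name, calls in [("blood", blood_calls), ("saliva", saliva_calls)]:
    print(name, sensitivity(truth_set, calls), false_call_fraction(truth_set, calls))
```

Representing each variant as a hashable tuple lets set intersection and difference do the matching. For CNVs, exact matching would need to be replaced by an overlap criterion, which this sketch does not attempt.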

  • whole-genome sequencing
  • dna source
  • blood
  • saliva
  • buccal

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


  • Contributors BT, SW and SWS designed the study. SW and MJS managed ethics approval and sample collection. BT, SW, SAH, WWLS, SP, CLP, EJH, LJS, CN, AR and CRM analysed the data. BT and SWS supervised the study and wrote the manuscript. All authors have read and approved the final manuscript.

  • Funding BT is funded by the Canadian Institutes of Health Research (CIHR) Banting Postdoctoral Fellowship. SWS is funded by the GlaxoSmithKline-CIHR Chair in Genome Sciences at the University of Toronto and The Hospital for Sick Children. Technology development funds for this research came from grants from Genome Canada, the University of Toronto McLaughlin Centre, and The Hospital for Sick Children Foundation. The sequencing and informatics infrastructure was supported by the Canada Foundation for Innovation.

  • Competing interests None declared.

  • Ethics approval This study was approved by the Research Ethics Board (REB) at The Hospital for Sick Children (REB no. 1000053640).

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available in a public, open access repository.
