WGSA: an annotation pipeline for human genome sequencing studies
  1. Xiaoming Liu1,2,
  2. Simon White3,
  3. Bo Peng4,
  4. Andrew D Johnson5,6,
  5. Jennifer A Brody7,
  6. Alexander H Li1,
  7. Zhuoyi Huang3,
  8. Andrew Carroll8,
  9. Peng Wei1,9,
  10. Richard Gibbs3,
  11. Robert J Klein10,
  12. Eric Boerwinkle1,2,3
  1. Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
  2. Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
  3. Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
  4. Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
  5. NHLBI Framingham Heart Study, Bethesda, Maryland, USA
  6. Population Sciences Branch, NHLBI Division of Intramural Research, Bethesda, Maryland, USA
  7. Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA
  8. DNAnexus, Mountain View, California, USA
  9. Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
  10. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, Icahn Institute for Genomics and Multiscale Biology, New York, New York, USA
  1. Correspondence to Dr Xiaoming Liu, University of Texas School of Public Health, Human Genetics Center, 1200 Herman Pressler Street, E529, Houston, TX 77030, USA; Xiaoming.Liu{at}


DNA sequencing technologies continue to improve in throughput and quality while decreasing in cost. As we transition from whole exome capture sequencing to whole genome sequencing (WGS), our ability to convert machine-generated variant calls, including single nucleotide variants (SNVs) and insertion-deletion variants (indels), into human-interpretable knowledge has lagged far behind our ability to generate enormous numbers of variants. To help narrow this gap, we present WGSA (WGS annotator), a functional annotation pipeline for human genome sequencing studies, which runs out of the box on the Amazon Compute Cloud and is freely downloadable at (

Functional annotation is a key step in WGS analysis. On the one hand, annotation helps the analyst filter variants down to a subset of elements of particular interest (eg, cell type specific enhancers); on the other hand, it helps investigators increase the power to identify phenotype-associated loci (eg, an association test using a functional prediction score as a weight) and interpret potentially interesting findings. Currently, there are several popular gene model based annotation tools, including ANNOVAR,1 SnpEff2 and the Ensembl Variant Effect Predictor (VEP).3 These can annotate a variety of protein coding and non-coding gene models from a range of species. It is well known among practitioners that different databases (eg, RefSeq4 and Ensembl5) use different models for …

  • Contributors XL, ADJ, JAB, AHL, AC, PW, ZH, RJK and EB designed the study. XL collected the annotation resources and developed the tool. SW tested the pipeline. BP provided tools for retrieving the RegulomeDB data set. EB and RG supervised the study. XL, SW and EB wrote the draft manuscript and all authors provided critical edits.

  • Funding This study was supported by the US National Institutes of Health (5RC2HL102419 and U54HG003273).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.