WGSA: an annotation pipeline for human genome sequencing studies

Xiaoming Liu; Simon White; Bo Peng; Andrew D Johnson; Jennifer A Brody; Alexander H Li; Zhuoyi Huang; Andrew Carroll; Peng Wei; Richard Gibbs; Robert J Klein; Eric Boerwinkle

doi:10.1136/jmedgenet-2015-103423

Methods

Communications

Xiaoming Liu¹,²; Simon White³; Bo Peng⁴; Andrew D Johnson⁵,⁶; Jennifer A Brody⁷; Alexander H Li¹; Zhuoyi Huang³; Andrew Carroll⁸; Peng Wei¹,⁹; Richard Gibbs³; Robert J Klein¹⁰; Eric Boerwinkle¹,²,³

¹Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
²Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
³Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, USA
⁴Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
⁵NHLBI Framingham Heart Study, Bethesda, Maryland, USA
⁶Population Sciences Branch, NHLBI Division of Intramural Research, Bethesda, Maryland, USA
⁷Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, Washington, USA
⁸DNAnexus, Mountain View, California, USA
⁹Department of Biostatistics, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
¹⁰Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, Icahn Institute for Genomics and Multiscale Biology, New York, New York, USA

Correspondence to Dr Xiaoming Liu, University of Texas School of Public Health, Human Genetics Center, 1200 Herman Pressler Street, E529, Houston, TX 77030, USA; Xiaoming.Liu@uth.tmc.edu

DNA sequencing technologies continue to improve in throughput and quality while decreasing in cost. As we transition from whole exome capture sequencing to whole genome sequencing (WGS), our ability to convert machine-generated variant calls, including single nucleotide variants (SNVs) and insertion-deletion variants (indels), into human-interpretable knowledge has lagged far behind our ability to generate enormous numbers of variants. To help narrow this gap, we present WGSA (WGS annotator), a functional annotation pipeline for human genome sequencing studies, which runs out of the box on the Amazon Compute Cloud and is freely downloadable at https://sites.google.com/site/jpopgen/wgsa/.

Functional annotation is a key step in WGS analysis. First, annotation helps the analyst filter to a subset of elements of particular interest (eg, cell type specific enhancers). Second, it helps investigators increase the power to identify phenotype-associated loci (eg, an association test using a functional prediction score as a weight) and interpret potentially interesting findings. Currently, there are several popular gene model based annotation tools, including ANNOVAR,1 SnpEff2 and the Ensembl Variant Effect Predictor (VEP).3 These can annotate a variety of protein coding and non-coding gene models from a range of species. It is well known among practitioners that different databases (eg, RefSeq4 and Ensembl5) use different models for …

Footnotes

Contributors: XL, ADJ, JAB, AHL, AC, PW, ZH, RJK and EB designed the study. XL collected the annotation resources and developed the tool. SW tested the pipeline. BP provided tools for retrieving the RegulomeDB data set. EB and RG supervised the study. XL, SW and EB wrote the draft manuscript and all authors provided critical edits.
Funding: This study was supported by the US National Institutes of Health (5RC2HL102419 and U54HG003273).
Competing interests: None declared.
Provenance and peer review: Not commissioned; externally peer reviewed.
