Abstract

Expression quantitative trait locus (eQTL) analysis, which links variations in gene expression to genotypes, is essential to understanding gene regulation and to interpreting disease-associated loci. Currently identified eQTLs are mainly from samples of blood and other normal tissues. However, no database comprehensively provides eQTLs in a large number of cancer samples. Using the genotype and expression data of 9196 tumor samples in 33 cancer types from The Cancer Genome Atlas (TCGA), we identified 5 606 570 eQTL-gene pairs in the cis-eQTL analysis and 231 210 eQTL-gene pairs in the trans-eQTL analysis. We further performed survival analysis and identified 22 212 eQTLs associated with patient overall survival. Furthermore, we linked the eQTLs to genome-wide association studies (GWAS) data and identified 337 131 eQTLs that overlap with existing GWAS loci. We developed PancanQTL, a user-friendly database (http://bioinfo.life.hust.edu.cn/PancanQTL/), to store cis-eQTLs, trans-eQTLs, survival-associated eQTLs and GWAS-related eQTLs to enable searching, browsing and downloading. PancanQTL could help the research community understand the effects of inherited variants in tumorigenesis and development.

INTRODUCTION

Single nucleotide polymorphisms (SNPs), the most common type of human genetic variation, play important roles in human complex traits and diseases (1–3). Genome-wide association studies (GWAS) have identified more than 10 000 SNPs associated with human traits or susceptibility to diseases (4,5). Most GWAS-detected risk SNPs are located in the genome's non-coding regions (6), indicating that these SNPs mainly exert their functional roles by regulating gene expression. Therefore, understanding SNP regulation of gene expression is essential for interpreting disease-related SNPs.

Expression quantitative trait locus (eQTL) analysis, which links variations in gene expression to genotypes, has proven to be a powerful approach to understanding the effects and molecular mechanisms of functional SNPs (7–10). Previous studies identified eQTLs mainly from lymphoblastoid cell lines and normal human tissues (9,11–13). For example, the Genotype-Tissue Expression (GTEx) consortium identified eQTLs from 7051 tissue samples of 44 tissues from 449 donors (13). Due to the significance of eQTLs, several databases have been developed to collect eQTLs, including the GTEx Portal (13), ExSNP (14), seeQTL (15) and SCAN (16). However, no database comprehensively provides eQTLs in a large number of cancer samples. The majority of eQTLs identified from cancer samples are cancer-specific, as shown by a comparison between tumor and normal samples (17). Therefore, it is necessary to analyze eQTLs from large-scale cancer samples to further understand the functional effects of eQTLs in cancer. Furthermore, the majority of studies and databases neglected trans-eQTLs, whose functional importance has been highlighted in recent studies (7,18). Collectively, systematic and large-scale investigations of both cis- and trans-eQTLs in multiple cancer types would provide the research community with a further understanding of inherited variant effects in tumorigenesis and development.

The Cancer Genome Atlas (TCGA) generated a large amount of omics data, including RNA sequencing, genotype data and clinical survival information from more than 10 000 cancer samples. These data provide a valuable source for eQTL analysis and further integrative analysis across different cancer types.

DATA COLLECTION AND PROCESSING

Genotype data collection, imputation and processing

To comprehensively identify eQTLs across different cancer types, we obtained genotype data from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/); the genotypes were detected using the Affymetrix SNP 6.0 array, which contains 898 620 SNPs. To increase the power for eQTL discovery, we imputed autosomal variants for all samples in each cancer type using IMPUTE2 (19), with 1000 Genomes Phase 3 (20) as the reference panel. To improve computational efficiency, we used the two-step procedure of IMPUTE2: pre-phasing, followed by imputation of the phased data. After imputation, we used the following criteria to select SNPs (13): (i) imputation confidence score (INFO) ≥ 0.4, (ii) minor allele frequency (MAF) ≥ 5%, (iii) SNP missing rate < 5% for best-guessed genotypes at posterior probability ≥ 0.9 and (iv) Hardy–Weinberg equilibrium P-value > 1 × 10⁻⁶, estimated by the HardyWeinberg R package (21) (Figure 1A).
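The four post-imputation criteria can be illustrated with a minimal Python sketch. It assumes the per-SNP statistics have already been computed by upstream tools; the record layout (`SnpStats`), the function name and the SNP identifiers are hypothetical and are not part of the PancanQTL pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics after imputation (hypothetical layout)."""
    snp_id: str
    info: float          # imputation confidence score (INFO)
    maf: float           # minor allele frequency
    missing_rate: float  # fraction of samples lacking a best-guess genotype
    hwe_p: float         # Hardy-Weinberg equilibrium P-value


def passes_qc(s, min_info=0.4, min_maf=0.05, max_missing=0.05, min_hwe_p=1e-6):
    """Apply criteria (i)-(iv) from the text to one SNP."""
    return (
        s.info >= min_info
        and s.maf >= min_maf
        and s.missing_rate < max_missing
        and s.hwe_p > min_hwe_p
    )


# Made-up SNPs, one failing each criterion in turn.
snps = [
    SnpStats("snp_ok", info=0.95, maf=0.20, missing_rate=0.01, hwe_p=0.40),
    SnpStats("snp_low_info", info=0.30, maf=0.20, missing_rate=0.01, hwe_p=0.40),
    SnpStats("snp_rare", info=0.95, maf=0.01, missing_rate=0.01, hwe_p=0.40),
    SnpStats("snp_missing", info=0.95, maf=0.20, missing_rate=0.10, hwe_p=0.40),
    SnpStats("snp_hwe", info=0.95, maf=0.20, missing_rate=0.01, hwe_p=1e-9),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

The thresholds are keyword arguments so that the same filter can be rerun with stricter or looser cut-offs.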

Figure 1.

Identification of eQTLs in PancanQTL database. (A) Genotyping data collection and processing. (B) Covariates analyzed in eQTL mapping. (C) Gene expression data collection and processing. (D) eQTL analyses of cis-eQTLs, trans-eQTLs, survival-associated eQTLs and GWAS-related eQTLs.

Gene expression data collection and processing

The gene expression profiles were obtained from the TCGA data portal (https://gdc-portal.nci.nih.gov/), which contains 20 531 genes for each sample. In each cancer type, genes with an average expression (RSEM, RNA-Seq by Expectation-Maximization (22)) of ≥1 were retained. To minimize the effects of outliers on the regression scores, the expression values for each gene across all samples were transformed into a standard normal distribution based on rank (13) (Figure 1C).
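A rank-based transformation to a standard normal can be sketched as follows. The exact rank offset used in the pipeline is not stated in the text; the sketch assumes the common choice of mapping rank r among n samples to the normal quantile at r / (n + 1), with tied values receiving their average rank. The function name is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map one gene's expression values to standard-normal quantiles by rank.

    Assumption: quantile = rank / (n + 1); ties share their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(r / (n + 1)) for r in ranks]


# An extreme outlier (5000.0) is pulled in to the same scale as the rest.
expression = [5000.0, 1.2, 3.4]
z = rank_inverse_normal(expression)
print(z)
```

Because only ranks are used, the outlier contributes no more leverage to the regression than any other top-ranked sample.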

Covariates

Previous studies showed that factors affecting global gene expression may reduce the eQTL-identifying power (23,24). To remove the global effects on gene expression, covariates are usually included in eQTL analyses (9,13). To remove the effect of population structure on gene expression, we used smartpca in the EIGENSOFT program (25) to perform principal component (PC) analyses for each cancer type, and selected the top five PCs in genotype data as covariates. To remove the hidden batch effects and other confounders in the expression data, we used PEER software (26) to select the first 15 PEER factors from expression data as covariates. To remove the potential effects of clinical status on gene expression, age (9), gender (13) and tumor stage (17) were included as additional covariates (Figure 1B).

Identification of eQTLs

For each cancer type, the genotype data, expression data and covariates were processed into three N (genotype, expression or covariates) × S (samples) matrix files with matched sample order. The gene location (hg19) was downloaded from Genomic Data Commons (https://gdc.cancer.gov/). The SNP location (hg19) was downloaded from dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/) (v137). eQTL analysis was performed with Matrix eQTL (27) using a linear regression model. SNPs with false discovery rates (FDR) < 0.05 were defined as eQTLs. An eQTL was defined as a cis-eQTL if the SNP was within 1 Mb of the gene transcriptional start site (TSS) (13), and as a trans-eQTL if the SNP was beyond that distance (Figure 1D).
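The cis/trans definition and the FDR threshold can be illustrated with a short Python sketch. Two points are assumptions rather than statements from the text: SNPs on a different chromosome from the gene are treated as trans, and the FDR is computed with the Benjamini–Hochberg procedure. The function names and the P-values are hypothetical; the sketch does not reproduce Matrix eQTL itself.

```python
CIS_WINDOW = 1_000_000  # 1 Mb around the transcriptional start site


def classify_pair(snp_chrom, snp_pos, gene_chrom, tss_pos, window=CIS_WINDOW):
    """Label a SNP-gene pair as 'cis' or 'trans' by distance to the TSS.

    Assumption: pairs on different chromosomes are trans.
    """
    if snp_chrom == gene_chrom and abs(snp_pos - tss_pos) <= window:
        return "cis"
    return "trans"


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up P-values for four SNP-gene pairs.
pvalues = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(pvalues)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(classify_pair("chr1", 1_500_000, "chr1", 1_000_000), fdr, significant)
```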

Survival-associated eQTLs

Many genes are associated with cancer prognoses (28), and eQTLs may influence the prognosis by altering gene expression. To identify survival-associated eQTLs, we examined the associations between eQTLs and patient overall survival. For each eQTL, samples were classified into three groups: homozygous genotype AA, heterozygous genotype Aa and homozygous genotype aa (A and a represent two alleles of one SNP). The log-rank test was used to examine the differences in survival time, and Kaplan–Meier (KM) curves were plotted to represent the survival time for each group. eQTLs with FDR < 0.05 were defined as survival-associated eQTLs (Figure 1D).
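The three-group comparison can be sketched in plain Python as a standard k-sample log-rank test with k = 3. This is an illustration under stated assumptions, not the authors' code: the function name and the toy survival data are hypothetical, and the P-value uses the closed form exp(−x/2) for a chi-square statistic with two degrees of freedom.

```python
import math


def logrank_three_groups(times, events, genotypes):
    """Log-rank test across exactly three genotype groups.

    times: survival or follow-up times; events: 1 = death, 0 = censored.
    Returns (chi-square statistic, P-value) with 2 degrees of freedom.
    """
    labels = sorted(set(genotypes))
    if len(labels) != 3:
        raise ValueError("exactly three genotype groups are required")
    index = {g: i for i, g in enumerate(labels)}
    u = [0.0, 0.0]                    # observed minus expected, first two groups
    v = [[0.0, 0.0], [0.0, 0.0]]      # covariance matrix of u
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [0, 0, 0]
        deaths = [0, 0, 0]
        for ti, ei, gi in zip(times, events, genotypes):
            if ti >= t:
                at_risk[index[gi]] += 1
                if ti == t and ei:
                    deaths[index[gi]] += 1
        n, d = sum(at_risk), sum(deaths)
        if n < 2:
            continue  # a single subject at risk carries no information
        factor = d * (n - d) / (n - 1)
        for i in range(2):
            share_i = at_risk[i] / n
            u[i] += deaths[i] - d * share_i
            for j in range(2):
                share_j = at_risk[j] / n
                v[i][j] += factor * share_i * ((1.0 if i == j else 0.0) - share_j)
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    stat = (v[1][1] * u[0] ** 2 - 2 * v[0][1] * u[0] * u[1] + v[0][0] * u[1] ** 2) / det
    return stat, math.exp(-stat / 2)


# Toy data: carriers of genotype aa die early, AA and Aa behave alike.
times = [10, 11, 12, 13, 10, 11, 12, 13, 1, 2, 3, 4]
events = [1] * 12
genotypes = ["AA"] * 4 + ["Aa"] * 4 + ["aa"] * 4
stat, p_value = logrank_three_groups(times, events, genotypes)
print(stat, p_value)
```

In a genome-wide setting the resulting P-values would still need multiple-testing correction, as the FDR threshold in the text indicates.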

GWAS-related eQTLs

Risk SNPs identified in GWAS were downloaded from the GWAS catalog (http://www.ebi.ac.uk/gwas/) (5). GWAS linkage disequilibrium (LD) regions were extracted from SNAP (https://personal.broadinstitute.org/plin/snap/ldsearch.php) (29) with the following parameters: SNP dataset, 1000 Genomes; r² (the square of the Pearson correlation coefficient of linkage disequilibrium) threshold, 0.5; population panel, CEU (Utah Residents with Northern and Western European Ancestry); distance limit, 500 kb. eQTLs that overlap with GWAS tagSNPs or LD SNPs (r² ≥ 0.5) were identified as GWAS-related eQTLs.
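The overlap step amounts to a set intersection between eQTL SNPs and the union of GWAS tagSNPs and their LD proxies. A minimal sketch, assuming the LD proxies have already been retrieved as (proxy, r²) pairs per tagSNP; the function name, the data layout and the SNP identifiers are hypothetical.

```python
def gwas_related_eqtls(eqtl_snps, gwas_tag_snps, ld_proxies, r2_threshold=0.5):
    """Return {eQTL SNP: sorted list of GWAS tagSNPs it coincides with or tags}.

    ld_proxies maps each tagSNP to a list of (proxy SNP, r2) pairs.
    """
    linked_tags = {}
    for tag in gwas_tag_snps:
        linked_tags.setdefault(tag, set()).add(tag)  # the tagSNP itself
        for proxy, r2 in ld_proxies.get(tag, []):
            if r2 >= r2_threshold:
                linked_tags.setdefault(proxy, set()).add(tag)
    return {
        snp: sorted(linked_tags[snp])
        for snp in eqtl_snps
        if snp in linked_tags
    }


# Made-up identifiers for illustration only.
eqtl_snps = ["snp1", "snp2", "snp3", "snp4"]
gwas_tag_snps = ["snp1", "tagA"]
ld_proxies = {"tagA": [("snp2", 0.8), ("snp3", 0.3)]}
related = gwas_related_eqtls(eqtl_snps, gwas_tag_snps, ld_proxies)
print(related)
```

Keeping the linked tagSNPs per eQTL makes it straightforward to attach the corresponding GWAS traits afterwards.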

DATABASE CONTENT AND USAGE

Samples in PancanQTL

PancanQTL included 9196 tumor samples from 33 cancer types. The sample size of each cancer type ranged from 36 in cholangiocarcinoma (CHOL) to 1092 in breast invasive carcinoma (BRCA) (Table 1). For the genotype data, we obtained on average 4 480 214 SNPs for each cancer type after imputation and quality control, ranging from 2 765 921 for BRCA to 5 245 402 for acute myeloid leukemia (LAML). After removing lowly expressed genes (RSEM < 1), there were on average 17 814 genes for each cancer type, ranging from 16 758 for uveal melanoma (UVM) to 18 790 for testicular germ cell tumors (TGCT).

Table 1.
Summary of eQTLs for each cancer type in PancanQTL

Cancer type^a | No. of samples | No. of genes | No. of genotypes | Cis pairs | Cis egenes | Cis eQTLs | Trans pairs | Trans egenes | Trans eQTLs
ACC | 77 | 17 562 | 3 678 145 | 4610 | 222 | 4558 | 984 | 60 | 957
BLCA | 408 | 18 171 | 4 242 910 | 142 562 | 5573 | 120 374 | 9199 | 1575 | 3114
BRCA | 1092 | 17 991 | 2 765 921 | 438 476 | 11 859 | 317 935 | 73 124 | 6013 | 20 466
CESC | 300 | 17 975 | 4 367 017 | 95 702 | 4165 | 84 484 | 2209 | 674 | 971
CHOL | 36 | 17 767 | 4 106 282 | 11 | 2 | 11 | 5011 | 127 | 4436
COAD | 286 | 17 500 | 4 576 984 | 164 356 | 5048 | 145 461 | 3085 | 373 | 2359
DLBC | 48 | 17 245 | 4 945 365 | 391 | 15 | 391 | 5 | 3 | 5
ESCA | 184 | 18 372 | 4 563 674 | 39 358 | 1603 | 36 589 | 425 | 56 | 410
GBM | 150 | 17 650 | 4 660 522 | 59 788 | 1901 | 55 855 | 481 | 55 | 465
HNSC | 518 | 17 985 | 4 302 347 | 267 797 | 6502 | 228 069 | 9285 | 1064 | 7389
KICH | 66 | 17 212 | 3 902 792 | 7264 | 320 | 7038 | 5826 | 157 | 4669
KIRC | 527 | 17 812 | 4 632 879 | 521 072 | 8739 | 410 720 | 13 978 | 943 | 12 200
KIRP | 290 | 17 715 | 4 981 141 | 186 310 | 4920 | 164 159 | 2712 | 302 | 2516
LAML | 123 | 17 099 | 5 245 402 | 70 375 | 1758 | 64 696 | 580 | 38 | 397
LGG | 515 | 17 563 | 4 688 205 | 578 617 | 9177 | 437 580 | 21 236 | 1804 | 13 084
LIHC | 369 | 17 816 | 4 218 042 | 151 613 | 5723 | 128 956 | 16 675 | 2230 | 3963
LUAD | 514 | 18 190 | 4 435 432 | 259 475 | 6834 | 220 709 | 6157 | 745 | 4513
LUSC | 500 | 18 277 | 3 787 605 | 204 145 | 6367 | 173 856 | 11 934 | 1050 | 10 487
MESO | 87 | 17 742 | 4 904 165 | 16 527 | 475 | 16 140 | 474 | 43 | 471
OV | 301 | 18 137 | 3 018 011 | 92 743 | 7100 | 74 419 | 6196 | 2028 | 2245
PAAD | 178 | 18 021 | 5 099 858 | 113 810 | 2468 | 104 058 | 1221 | 110 | 978
PCPG | 178 | 17 552 | 4 836 419 | 93 679 | 3203 | 83 517 | 1146 | 241 | 985
PRAD | 494 | 17 646 | 4 887 130 | 691 299 | 10 152 | 514 457 | 15 730 | 1105 | 11 589
READ | 94 | 17 427 | 4 653 098 | 22 788 | 781 | 22 114 | 72 | 14 | 72
SARC | 258 | 18 183 | 4 156 361 | 70 201 | 4194 | 61 193 | 5704 | 1055 | 4115
SKCM | 103 | 17 645 | 4 968 336 | 15 046 | 720 | 14 487 | 348 | 45 | 299
STAD | 415 | 18 478 | 4 362 659 | 161 271 | 4913 | 142 709 | 2470 | 391 | 1994
TGCT | 150 | 18 790 | 4 927 197 | 71 832 | 1959 | 67 882 | 653 | 39 | 599
THCA | 503 | 17 277 | 4 936 390 | 927 678 | 10 766 | 659 323 | 13 592 | 745 | 8908
THYM | 120 | 17 785 | 5 036 992 | 85 627 | 2090 | 78 507 | 436 | 43 | 379
UCEC | 176 | 18 195 | 5 111 002 | 25 426 | 1188 | 24 721 | 251 | 35 | 248
UCS | 56 | 18 314 | 4 036 518 | 488 | 25 | 488 | 6 | 2 | 6
UVM | 80 | 16 758 | 4 812 283 | 26 233 | 890 | 25 260 | 5 | 4 | 5

^a The full names of cancer types are shown in Supplementary Table S1.

eQTLs in PancanQTL

For each cancer type, an average of ∼81 billion SNP-gene pairs was tested for cis- and trans-eQTL mapping. In cis-eQTL analysis, we identified 5 606 570 eQTL-gene pairs in 33 cancer types at a per-tissue FDR < 0.05, which corresponded to a median P-value < 9.22 × 10⁻⁵ (Supplementary Table S1). There were 11 cis-eQTLs identified in CHOL, while 659 323 cis-eQTLs were identified in thyroid carcinoma (THCA). The number of cis-eQTLs was significantly correlated with the number of samples (Spearman correlation Rs = 0.93, P-value = 2.97 × 10⁻¹⁵). The number of cis-eQTL regulated genes (egenes) ranged from two in CHOL to 11 859 in BRCA (Table 1). For trans-eQTL analysis, we identified 231 210 eQTL-gene pairs in 33 cancer types at a per-tissue FDR < 0.05, which corresponded to a median P-value < 1.54 × 10⁻⁹ (Supplementary Table S1). The number of trans-eQTLs ranged from five in lymphoid neoplasm diffuse large B-cell lymphoma (DLBC) and uterine carcinosarcoma (UCS) to 20 466 in BRCA, while the number of egenes ranged from two in UCS to 6013 in BRCA (Table 1). The number of trans-eQTLs was also significantly correlated with the number of samples (Rs = 0.74, P-value = 6.84 × 10⁻⁷).
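The rank correlation between sample size and eQTL count can be sketched as follows. The example uses only six rows taken from Table 1 (CHOL, UVM, GBM, LUSC, THCA and BRCA), so its result will differ from the value reported above for all 33 cancer types; the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Subset of Table 1: CHOL, UVM, GBM, LUSC, THCA, BRCA.
n_samples = [36, 80, 150, 500, 503, 1092]
n_cis_eqtls = [11, 25260, 55855, 173856, 659323, 317935]
rho = spearman(n_samples, n_cis_eqtls)
print(rho)
```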

Among the cis- and trans-eQTLs, we identified 22 212 eQTLs associated with patient overall survival in the different cancer types at FDR < 0.05. The number of survival-associated eQTLs ranged from one in UCS to 4330 in THCA. To identify GWAS-related eQTLs, we extracted 28 345 trait/disease-related SNPs from the GWAS catalog and obtained 1 167 961 SNPs located in GWAS LD regions. Among these, 337 131 SNPs are eQTLs in at least one cancer type.

Web design and interface

Results were organized into a set of relational MySQL tables (30), with the website constructed using HTML and PHP. We designed four modules to display cis-eQTLs, trans-eQTLs, survival-associated eQTLs and GWAS-related eQTLs (Figure 2A). Users can browse each eQTL module simply by clicking the corresponding module. On the home page, we designed an advanced search box for a comprehensive query across the four modules (Figure 2B). For example, the user can select a cancer type (e.g. STAD) and input an SNP ID (e.g. rs2351010), gene symbol (e.g. ERAP2) or genomic region (e.g. chr1:1–1000000) to search for eQTLs in the four modules. A quick search option is available on each page (top right) to search by SNP ID, gene symbol or genomic region. Users can download cis-eQTLs and trans-eQTLs for each cancer type from the ‘Download’ page. The ‘Help’ page provides information on data collection and processing. PancanQTL welcomes any feedback by email on the ‘Contact’ page.
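The kind of query the advanced search box supports can be illustrated on a downloaded eQTL table. The sketch below is not the site's PHP/MySQL implementation: the record fields, function names and demo records are hypothetical, and the region syntax is assumed to follow the `chr:start-end` pattern of the example above.

```python
import re

# Accept either a hyphen or an en dash between start and end.
REGION_RE = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)[-–](\d+)$")


def parse_region(text):
    """Parse 'chr1:1-1000000' into (chromosome, start, end)."""
    match = REGION_RE.match(text.replace(",", "").replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognized region: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("region start must not exceed region end")
    return chrom, start, end


def search(records, cancer_type=None, snp_id=None, gene=None, region=None):
    """Filter eQTL records; criteria left as None are ignored."""
    parsed = parse_region(region) if region else None
    hits = []
    for rec in records:
        if cancer_type and rec["cancer_type"] != cancer_type:
            continue
        if snp_id and rec["snp_id"] != snp_id:
            continue
        if gene and rec["gene"] != gene:
            continue
        if parsed and not (
            rec["chrom"] == parsed[0] and parsed[1] <= rec["pos"] <= parsed[2]
        ):
            continue
        hits.append(rec)
    return hits


# Made-up records; identifiers and positions are placeholders.
records = [
    {"cancer_type": "STAD", "snp_id": "snpA", "gene": "GENE1", "chrom": "chr1", "pos": 500_000},
    {"cancer_type": "STAD", "snp_id": "snpB", "gene": "GENE2", "chrom": "chr1", "pos": 2_000_000},
    {"cancer_type": "BRCA", "snp_id": "snpA", "gene": "GENE1", "chrom": "chr1", "pos": 500_000},
]
hits = search(records, cancer_type="STAD", region="chr1:1-1000000")
print(hits)
```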

Figure 2.

Overview of PancanQTL database. (A) Four modules in PancanQTL, including cis-eQTLs, trans-eQTLs, survival-associated eQTLs and GWAS-related eQTLs. (B) Advanced search box in PancanQTL. (C) Example of an eQTL boxplot in cis-eQTL page. (D) Example of a KM plot in survival-eQTL page.

Data browsing and querying of four modules

Using the homepage browser bar or clicking directly on the ‘cis/trans-eQTLs’ module, users can enter the cis/trans-eQTLs page. A table with SNP ID, SNP genomic position, SNP alleles, gene symbol, gene position, beta value (effect size of SNP on gene expression) and eQTL P-value is displayed on the cis/trans-eQTLs page. When the user selects a specific cancer type or enters a gene or SNP ID, the table is rebuilt to display the query results. For each SNP-gene pair, a boxplot in vector format is provided to display the association between SNP genotypes and gene expression. For example, our analysis showed that ERAP2 expression in individuals carrying the homozygous rs2351010 aa genotype is significantly higher than that in individuals carrying the homozygous rs2351010 AA or heterozygous rs2351010 Aa genotypes (P-value = 2.37 × 10⁻³⁰²) (Figure 2C).

On the survival-eQTLs page, the SNP information and median overall survival time of each genotype are provided. Search boxes are designed for retrieving specific cancer types and SNPs. For each SNP, a KM plot in vector format is provided to display the association between SNP genotypes and overall survival. For example, our analysis showed that patients with the rs1824937 aa genotype have worse prognoses than other breast cancer patients (P-value = 6.3 × 10⁻⁷) (Figure 2D).

On the GWAS-eQTLs page, the SNP information, regulated gene information and related GWAS traits are displayed. Search boxes are designed for retrieving specific cancer types and SNPs. In addition, users can select a different LD threshold from the dropdown box to prioritize SNPs.

SUMMARY AND FUTURE DIRECTIONS

We systematically identified cis-eQTLs, trans-eQTLs, survival-associated eQTLs and GWAS-related eQTLs in 33 cancer types. We constructed a user-friendly database, PancanQTL, for users to query, browse and download eQTLs. Millions of vector diagrams of eQTL box plots and KM plots are provided. PancanQTL could serve as an important resource for human cancer genetics and provide opportunities to bridge the knowledge gap from variants in sequence to phenotypes. PancanQTL could also contribute to understanding the effects of inherited variants in tumorigenesis and development. Cancer genomics is a rapidly developing field (31), and we expect that the number of cancer samples with genotype and gene expression profiles will increase dramatically. We will update PancanQTL to include more cancer samples and will maintain it as a useful resource for the research community. Previous studies demonstrated the complicated mechanisms for regulating gene expression by eQTLs, including altering RNA sequence, RNA structure, transcription factor binding, miRNA binding, methylation and histone modification (32,33). It will be very interesting to further investigate the regulating mechanisms of eQTLs through integrative analysis if multi-dimensional data are available.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge Xianchun Tu for helping design and debug the website, and Carol K. Kohn for proofreading the manuscript. We acknowledge support from the Cancer Prevention & Research Institute of Texas (CPRIT RR150085).

FUNDING

National Natural Science Foundation of China [81402744 to J.G.]; Cancer Prevention & Research Institute of Texas [RR150085 to L.H.]; UTHealth Innovation for Cancer Prevention Research Training Program Post-doctoral Fellowship (Cancer Prevention and Research Institute of Texas) [RP160015]; China Scholarship Council [201606160058 to C.L., 201606275095 to J. F.]. Funding for open access charge: National Natural Science Foundation of China [81402744].

Conflict of interest statement. None declared.

REFERENCES

1. Wu C., Miao X., Huang L., Che X., Jiang G., Yu D., Yang X., Cao G., Hu Z., Zhou Y. et al. Genome-wide association study identifies five loci associated with susceptibility to pancreatic cancer in Chinese populations. Nat. Genet. 2011; 44:62–66.

2. Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 2017; 101:5–22.

3. Schork N.J., Fallin D., Lanchbury J.S. Single nucleotide polymorphisms and the future of genetic epidemiology. Clin. Genet. 2000; 58:250–264.

4. Welter D., MacArthur J., Morales J., Burdett T., Hall P., Junkins H., Klemm A., Flicek P., Manolio T., Hindorff L. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014; 42:D1001–D1006.

5. MacArthur J., Bowler E., Cerezo M., Gil L., Hall P., Hastings E., Junkins H., McMahon A., Milano A., Morales J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 2017; 45:D896–D901.

6. Hindorff L.A., Sethupathy P., Junkins H.A., Ramos E.M., Mehta J.P., Collins F.S., Manolio T.A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. U.S.A. 2009; 106:9362–9367.

7. Westra H.J., Peters M.J., Esko T., Yaghootkar H., Schurmann C., Kettunen J., Christiansen M.W., Fairfax B.P., Schramm K., Powell J.E. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 2013; 45:1238–1243.

8. Zhu Z., Zhang F., Hu H., Bakshi A., Robinson M.R., Powell J.E., Montgomery G.W., Goddard M.E., Wray N.R., Visscher P.M. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 2016; 48:481–487.

9. Grundberg E., Small K.S., Hedman A.K., Nica A.C., Buil A., Keildson S., Bell J.T., Yang T.P., Meduri E., Barrett A. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 2012; 44:1084–1089.

10. Nica A.C., Montgomery S.B., Dimas A.S., Stranger B.E., Beazley C., Barroso I., Dermitzakis E.T. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genet. 2010; 6:e1000895.

11. Lappalainen T., Sammeth M., Friedlander M.R., 't Hoen P.A., Monlong J., Rivas M.A., Gonzalez-Porta M., Kurbatova N., Griebel T., Ferreira P.G. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013; 501:506–511.

12. Liang L., Morar N., Dixon A.L., Lathrop G.M., Abecasis G.R., Moffatt M.F., Cookson W.O. A cross-platform analysis of 14,177 expression quantitative trait loci derived from lymphoblastoid cell lines. Genome Res. 2013; 23:716–726.

13. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015; 348:648–660.

14. Yu C.H., Pal L.R., Moult J. Consensus genome-wide expression quantitative trait loci and their relationship with human complex trait disease. OMICS. 2016; 20:400–414.

15. Xia K., Shabalin A.A., Huang S., Madar V., Zhou Y.H., Wang W., Zou F., Sun W., Sullivan P.F., Wright F.A. seeQTL: a searchable database for human eQTLs. Bioinformatics. 2012; 28:451–452.

16. Zhang W., Gamazon E.R., Zhang X., Konkashbaev A., Liu C., Szilagyi K.L., Dolan M.E., Cox N.J. SCAN database: facilitating integrative analyses of cytosine modification and expression QTL. Database (Oxford). 2015; 2015:bav025.

17. Ongen H., Andersen C.L., Bramsen J.B., Oster B., Rasmussen M.H., Ferreira P.G., Sandoval J., Vidal E., Whiffin N., Planchon A. et al. Putative cis-regulatory drivers in colorectal cancer. Nature. 2014; 512:87–90.

18. Brynedal B., Choi J., Raj T., Bjornson R., Stranger B.E., Neale B.M., Voight B.F., Cotsapas C. Large-scale trans-eQTLs affect hundreds of transcripts and mediate patterns of transcriptional co-regulation. Am. J. Hum. Genet. 2017; 100:581–591.

19. Howie B.N., Donnelly P., Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009; 5:e1000529.

20. 1000 Genomes Project Consortium, Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A. et al. A global reference for human genetic variation. Nature. 2015; 526:68–74.

21. Graffelman J. Exploring diallelic genetic markers: the HardyWeinberg package. J. Stat. Softw. 2015; 64:1–23.

22. Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011; 12:323.

23. Kang H.M., Ye C., Eskin E. Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics. 2008; 180:1909–1925.

24. Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007; 3:1724–1735.

25. Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006; 38:904–909.

26. Stegle O., Parts L., Piipari M., Winn J., Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 2012; 7:500–507.

27. Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012; 28:1353–1358.

28. Gentles A.J., Newman A.M., Liu C.L., Bratman S.V., Feng W., Kim D., Nair V.S., Xu Y., Khuong A., Hoang C.D. et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat. Med. 2015; 21:938–945.

29. Johnson A.D., Handsaker R.E., Pulit S.L., Nizzari M.M., O’Donnell C.J., de Bakker P.I. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics. 2008; 24:2938–2939.

30. Gong J., Liu C., Liu W., Xiang Y., Diao L., Guo A.Y., Han L. LNCediting: a database for functional effects of RNA editing in lncRNAs. Nucleic Acids Res. 2017; 45:D79–D84.

31. Garraway L.A., Lander E.S. Lessons from the cancer genome. Cell. 2013; 153:17–37.

32. Albert F.W., Kruglyak L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 2015; 16:197–212.

33. Shastry B.S. SNPs: impact on gene function and phenotype. Methods Mol. Biol. 2009; 578:3–22.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
