HIV Acquisition Meta-Analysis #12

Open
jaamarks opened this issue Jan 30, 2020 · 11 comments

@jaamarks
Owner

jaamarks commented Jan 30, 2020

Performing meta-analyses of HIV acquisition.

The parent issue is GitHub Issue 97.

Analysis Description

hiv-acquisition-gwas-meta-analysis-description.xlsx

@jaamarks
Owner Author

We recently added WIHS3 to the latest iteration of metas. These can be found on S3 at:
s3://rti-hiv/meta_new/{023–027}

@jaamarks
Owner Author

WIHS3_AA GWAS (N=729)

[Figures: two plots; images not preserved]

@jaamarks
Owner Author

023: cross-ancestry (N=12,617)

meta015 + WIHS3_AA:

  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
  • WIHS1_HA (N=356)
[Figures: two rows of plots comparing previous and current results; images not preserved]

@jaamarks
Owner Author

jaamarks commented Jan 30, 2020

024: AFR (N=7,597)

meta017 + WIHS3_AFR:

  • UHS1–4_AFR (N=4,015)
  • WIHS1_AFR (N=2,009)
  • WIHS2_AFR (N=844)
  • WIHS3_AFR (N=729)
[Figures: two rows of plots comparing previous and current results; images not preserved]

@jaamarks
Owner Author

025: cross-ancestry (N=12,261)

meta018 + WIHS3_AA:

  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
[Figures: two rows of plots comparing previous and current results; images not preserved]

@jaamarks
Owner Author

026: cross-ancestry (N=26,198)

meta021 + WIHS3_AA:

  • McLaren_EA (N=13,581) (GC applied)
  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
  • WIHS1_HA (N=356)
[Figures: two rows of plots comparing previous and current results; images not preserved]

@jaamarks
Owner Author

jaamarks commented Jan 30, 2020

027: cross-ancestry (N=25,842)

meta026 - WIHS1_HA:

  • McLaren_EA (N=13,581) (GC applied)
  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)

[Figures: two plots; images not preserved]
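
As a sanity check on the totals reported for metas 023–027, the per-cohort sample sizes listed above can be summed directly. This is only an illustrative sketch: the helper name `sum_n` and the variable names are made up here, and the numbers are copied from the cohort lists with the thousands separators removed.

```shell
#!/bin/sh
# Sum per-cohort sample sizes so each reported meta-analysis N can be checked.
sum_n() {
    total=0
    for n in "$@"; do
        total=$(( total + n ))
    done
    echo "$total"
}

# Cohort sizes copied from the lists above (thousands separators removed).
EA="3013 931 720"          # UHS1-4_EA, VIDUS_EA, WIHS1_EA
AA="4015 2009 844 729"     # UHS1-4_AA, WIHS1_AA, WIHS2_AA, WIHS3_AA
HA="356"                   # WIHS1_HA
MCLAREN="13581"            # McLaren_EA

# The variables are left unquoted on purpose so each size becomes one argument.
echo "023: $(sum_n $EA $AA $HA)"             # 12617
echo "024: $(sum_n $AA)"                     # 7597
echo "025: $(sum_n $EA $AA)"                 # 12261
echo "026: $(sum_n $MCLAREN $EA $AA $HA)"    # 26198
echo "027: $(sum_n $MCLAREN $EA $AA)"        # 25842
```

Each printed total matches the N given in the corresponding heading.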

@jaamarks
Owner Author

jaamarks commented Feb 13, 2020

Rerun using cohort-level data

We reran metas 023–027, this time using the individual cohort-level GWAS summary stats as input instead of the results of the previous meta-analyses.

023 AFR+AMR+EUR (N=12,617)

~ meta015 + WIHS3_AA
(see s3://rti-hiv/meta_new/015/results/figures/ for previous results)

  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
  • WIHS1_HA (N=356)
[Figures: two plots; images not preserved]



024 AFR-specific (N=7,597)

~meta017 + WIHS3_AA
(see s3://rti-hiv/meta_new/017/results/figures/ for previous results)

  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
[Figures: two plots; images not preserved]



025 AFR+EUR (N=12,261)

~meta018 + WIHS3_AA
(see s3://rti-hiv/meta_new/018/results/figures for previous results)

  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
[Figures: two plots; images not preserved]



026 AFR+AMR+EUR (N=26,198)

~meta021 + WIHS3_AA
(see s3://rti-hiv/meta_new/021/results/v03/figures for previous results)

  • McLaren_EA (N=13,581) (GC applied)
  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
  • WIHS1_HA (N=356)
[Figures: two plots; images not preserved]



027 AFR+EUR (N=25,842)

~meta022 + WIHS3_AA
(see s3://rti-hiv/meta_new/022/results/v05/figures/ for previous results)

  • McLaren_EA (N=13,581) (GC applied)
  • UHS1–4_EA (N=3,013)
  • VIDUS_EA (N=931)
  • WIHS1_EA (N=720)
  • UHS1–4_AA (N=4,015)
  • WIHS1_AA (N=2,009)
  • WIHS2_AA (N=844)
  • WIHS3_AA (N=729)
[Figures: two plots; images not preserved]


These results have been stored on S3 at:
s3://rti-hiv/meta_new/{023–027}/results/v02/

@jaamarks
Owner Author

jaamarks commented Sep 23, 2020

HIV Acquisition Meta-Analysis with TOPMed Imputed Data

0028 AFR+AMR+EUR (N=12,617)

1000g_p3 results (meta 0023)

[Figures: Manhattan and QQ plots for hiv_acq_cross_1df_meta_023 (snps+indels); images not preserved]

TOPMed results (meta 0028)
no GC

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

GC applied to each

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

@jaamarks
Owner Author

jaamarks commented Sep 24, 2020

Troubleshooting

  • Replicate previous 1000g_p3 results
  • I reran the previous 1000g_p3 meta-023 and got consistent results. One caveat: the previous set included WIHS2_EUR, which it should not have, so the results are nearly but not exactly the same. The goal was to confirm that our pipeline is not defective by replicating the previous results. It is not defective, so we can continue troubleshooting. For full consistency, we should run each 1000g_p3 cohort through the new GWAS pipeline.
  • Apply genomic control to all cohorts.

This did not tame the inflation.

[Figures: QQ and Manhattan plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

  • Confirm correct GWAS results were being used.

I redownloaded each set of TOPMed GWAS results from S3 to double check that we were using the correct results. This checks out.

Also, confirm GWAS results in the HIV bucket were correctly copied from the associated Cromwell output. When each GWAS is complete, the results are copied from the Cromwell bucket to the appropriate project bucket in the S3 organizational structure, so we verified that the GWAS results in the HIV bucket were copied over correctly. This checks out.


  • Verify meta-analysis pipeline is correct by replicating old results

We needed to verify that the meta-analysis pipeline (bash script) we have been using is correct, so we attempted to replicate the previous 1000g_p3 HIV acquisition GWAS meta-analysis results. The results were consistent; therefore the pipeline is correct.


  • Try stricter thresholds (rsq>0.8 and maf>0.05)

During our lab meeting on Wednesday (9/23/2020) we discussed applying a stricter rsq threshold. It was suggested that since we haven't had much experience with TOPMed imputed data, the same filters we apply to 1000g_p3 data might not be directly applicable. We will therefore experiment with applying a stricter rsq threshold. In particular, we will use the GWAS results with rsq > 0.8 applied.

  • rsq 0.3 & MAF 0.01, without GC: lambda=1.243

  • rsq 0.3 & MAF 0.01, with GC: lambda=1.203

  • rsq 0.8 & MAF 0.01, without GC: lambda=1.242

  • rsq 0.8 & MAF 0.01, with GC: lambda=1.204

  • rsq 0.8 & MAF 0.05, without GC: lambda=1.238

  • rsq 0.8 & MAF 0.05, with GC: lambda=1.20
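
The threshold combinations above amount to a row filter on each set of GWAS results before the meta-analysis. A minimal sketch with `awk` follows, assuming whitespace-delimited results with a header row. The function name `filter_variants`, the column names, and the column positions are hypothetical, since the actual file layout is not shown in this thread.

```shell
#!/bin/sh
# filter_variants FILE MAF_COL RSQ_COL MIN_MAF MIN_RSQ
# Keep the header plus rows whose MAF and imputation rsq both exceed the cutoffs.
# Column positions are arguments because the real file layout is an assumption.
filter_variants() {
    awk -v maf_col="$2" -v rsq_col="$3" -v min_maf="$4" -v min_rsq="$5" '
        NR == 1 { print; next }
        ($maf_col + 0) > (min_maf + 0) && ($rsq_col + 0) > (min_rsq + 0)
    ' "$1"
}

# Toy input with hypothetical column names.
demo=$(mktemp)
cat > "$demo" <<'EOF'
VARIANT MAF RSQ BETA SE
chr1:1000:A:G 0.020 0.95 0.10 0.05
chr1:2000:C:T 0.300 0.50 -0.02 0.03
chr1:3000:G:A 0.120 0.91 0.04 0.02
EOF

filter_variants "$demo" 2 3 0.01 0.3   # looser thresholds: keeps all three variants
filter_variants "$demo" 2 3 0.05 0.8   # stricter thresholds: keeps only chr1:3000:G:A
rm -f "$demo"
```

If the results store an alternate-allele frequency rather than a minor-allele frequency, the MAF test would need to be applied to min(freq, 1 - freq) instead.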


  • Apply genomic control twice

"We recommend applying genomic control correction to all input files that include genomewide data and, in addition, to the meta-analysis results." (METAL documentation)

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]
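
For reference, here is a minimal sketch of what one genomic-control pass does to effect-size results, assuming the standard definition: dividing each chi-square statistic by lambda, which is equivalent to multiplying each standard error by sqrt(lambda). The function name `apply_gc`, the column position, and the whitespace-delimited layout are illustrative assumptions and are not taken from the pipeline itself. Leaving results untouched when lambda is at most 1 is a choice made for this sketch.

```shell
#!/bin/sh
# apply_gc FILE SE_COL LAMBDA
# Multiply each standard error by sqrt(lambda) when lambda > 1.
# Rows are re-joined with single spaces when a field is rewritten.
apply_gc() {
    awk -v se_col="$2" -v lambda="$3" '
        NR == 1 { print; next }
        {
            if (lambda + 0 > 1) $se_col = sprintf("%.6g", $se_col * sqrt(lambda))
            print
        }
    ' "$1"
}

# Toy input with hypothetical column names.
demo=$(mktemp)
printf 'VARIANT BETA SE\nv1 0.20 0.10\nv2 -0.05 0.04\n' > "$demo"
apply_gc "$demo" 3 1.21     # SE becomes 0.11 and 0.044
apply_gc "$demo" 3 0.98     # lambda <= 1: rows pass through unchanged
rm -f "$demo"
```

Applying GC twice would then mean running such a pass on each cohort with its own lambda and once more on the meta-analysis output with the lambda of the combined results.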


  • Inspect gc parameter for each chromosome across each cohort

| Cohort | chr1 | chr2 | chr3 | chr4 | chr5 | chr6 | chr7 | chr8 | chr9 | chr10 | chr11 | chr12 | chr13 | chr14 | chr15 | chr16 | chr17 | chr18 | chr19 | chr20 | chr21 | chr22 | chr23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UHS1–4 AFR | 1.071 | 1.023 | 1.048 | 1.023 | 0.995 | 1.002 | 1.041 | 1.047 | 1.043 | 1.071 | 1.045 | 0.974 | 0.97 | 1.002 | 1.033 | 1.025 | 1.06 | 1.054 | 1.173 | 1.031 | 1.012 | 1.019 | 1.085 |
| UHS1–4 EUR | 1.038 | 1.108 | 1.039 | 1.021 | 1.094 | 1.165 | 1.089 | 1.04 | 1.084 | 1.054 | 1.09 | 1.033 | 1.072 | 1.056 | 1.093 | 1.017 | 1.011 | 1.099 | 1.151 | 1.186 | 1.034 | 1.108 | 1.02 |
| VIDUS EUR | 1.068 | 0.97 | 1.028 | 1.026 | 0.962 | 1.027 | 1.028 | 1.009 | 0.963 | 0.983 | 1.083 | 1.133 | 1.054 | 1.048 | 1.126 | 1.026 | 1.107 | 1.001 | 0.862 | 1.023 | 1.14 | 0.938 | 0.952 |
| WIHS1 AFR | 1.002 | 0.943 | 1.032 | 0.987 | 0.958 | 0.988 | 0.964 | 1.007 | 0.981 | 1.009 | 0.992 | 0.975 | 1.078 | 0.975 | 1.078 | 0.948 | 0.979 | 0.933 | 0.985 | 0.98 | 1 | 0.973 | 1.006 |
| WIHS1 AMR | 1.039 | 1.065 | 1.042 | 1.021 | 1.026 | 1.038 | 1.014 | 0.994 | 1.005 | 0.986 | 1.049 | 1.029 | 1.046 | 1.015 | 0.989 | 1.078 | 1.073 | 1.024 | 0.981 | 0.998 | 0.976 | 1.112 | 1.201 |
| WIHS1 EUR | 0.975 | 1.047 | 1.042 | 1.081 | 0.963 | 1.03 | 1.071 | 0.95 | 0.986 | 0.974 | 0.989 | 1.014 | 1.127 | 0.962 | 1.032 | 0.973 | 1.039 | 0.999 | 0.943 | 1.029 | 0.881 | 1.036 | 1 |
| WIHS2 AFR | 1.017 | 1.033 | 1.031 | 1.001 | 1.045 | 0.965 | 0.99 | 1.031 | 0.977 | 1.038 | 1.028 | 1.072 | 1.116 | 1.041 | 1.006 | 0.983 | 0.978 | 1.026 | 0.985 | 0.978 | 0.924 | 1.022 | 1.115 |
| WIHS3 AFR | 1.012 | 1.05 | 0.981 | 0.996 | 1.02 | 0.97 | 0.969 | 1.024 | 0.969 | 1.032 | 1.066 | 1.04 | 1.116 | 1.001 | 0.979 | 0.98 | 0.978 | 1.007 | 0.999 | 1.005 | 0.983 | 0.999 | 1.117 |



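After GC was applied to each cohort/chromosome.

| | chr1 | chr2 | chr3 | chr4 | chr5 | chr6 | chr7 | chr8 | chr9 | chr10 | chr11 | chr12 | chr13 | chr14 | chr15 | chr16 | chr17 | chr18 | chr19 | chr20 | chr21 | chr22 | chr23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GC | 1.193 | 1.215 | 1.22 | 1.224 | 1.187 | 1.171 | 1.198 | 1.194 | 1.176 | 1.167 | 1.131 | 1.204 | 1.183 | 1.193 | 1.232 | 1.189 | 1.234 | 1.19 | 1.329 | 1.21 | 1.16 | 1.214 | 1.272 |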
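
The thread does not show how these values were computed. Assuming they are genomic inflation factors in the usual sense (the median of the 1-df chi-square statistics divided by 0.4549364, the median of a chi-square distribution with 1 df), a per-file calculation could be sketched as follows. The function name `lambda_gc` and the beta/SE column positions are hypothetical.

```shell
#!/bin/sh
# lambda_gc FILE BETA_COL SE_COL
# Genomic inflation factor from effect sizes and standard errors:
# median of (beta/se)^2 divided by 0.4549364 (median of chi-square, 1 df).
# Assumes a whitespace-delimited file with one header row.
lambda_gc() {
    awk -v b="$2" -v s="$3" '
        NR > 1 && ($s + 0) > 0 { z = $b / $s; printf "%.10f\n", z * z }
    ' "$1" |
    LC_ALL=C sort -n |
    awk '
        { x[NR] = $1 }
        END {
            if (NR == 0) { print "NA"; exit }
            if (NR % 2) median = x[(NR + 1) / 2]
            else        median = (x[NR / 2] + x[NR / 2 + 1]) / 2
            printf "%.3f\n", median / 0.4549364
        }
    '
}

# Toy input: the middle variant has a chi-square equal to the null median.
demo=$(mktemp)
printf 'VARIANT BETA SE\nv1 0.10 1\nv2 0.6744898 1\nv3 2.00 1\n' > "$demo"
lambda_gc "$demo" 2 3     # prints 1.000
rm -f "$demo"
```

Running this once per chromosome file and per cohort would give a grid like the ones above.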

  • QQ plot for each chromosome
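No GC

| | chr1 | chr2 | chr3 | chr4 | chr5 | chr6 | chr7 | chr8 | chr9 | chr10 | chr11 | chr12 | chr13 | chr14 | chr15 | chr16 | chr17 | chr18 | chr19 | chr20 | chr21 | chr22 | chr23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lambda | 1.229 | 1.259 | 1.255 | 1.238 | 1.217 | 1.198 | 1.227 | 1.229 | 1.198 | 1.211 | 1.18 | 1.249 | 1.277 | 1.216 | 1.278 | 1.201 | 1.269 | 1.225 | 1.392 | 1.248 | 1.182 | 1.243 | 1.373 |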

qq plots

[Figures: per-chromosome QQ plots (snps+indels), chr1 through chr23; images not preserved]


GC applied to each cohort

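| | chr1 | chr2 | chr3 | chr4 | chr5 | chr6 | chr7 | chr8 | chr9 | chr10 | chr11 | chr12 | chr13 | chr14 | chr15 | chr16 | chr17 | chr18 | chr19 | chr20 | chr21 | chr22 | chr23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lambda | 1.194 | 1.214 | 1.22 | 1.223 | 1.187 | 1.171 | 1.198 | 1.195 | 1.176 | 1.167 | 1.131 | 1.204 | 1.183 | 1.194 | 1.232 | 1.189 | 1.235 | 1.189 | 1.329 | 1.21 | 1.16 | 1.214 | 1.272 |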

qq plots

[Figures: per-chromosome QQ plots (snps+indels), chr1 through chr23; images not preserved]


GC applied twice

GC applied to each cohort and then again to the meta-analysis results.

qq plots

[Figures: per-chromosome QQ plots (snps+indels), chr1 through chr23; images not preserved]



  • Perform meta on subset of cohorts

Investigate whether any one cohort is disproportionately contributing to the inflation.

  • Leave-one-out approach that @danahancock suggested during our lab meeting.

| | uhs_afr | uhs_eur | vidus_eur | wihs1_afr | wihs1_amr | wihs1_eur | wihs2_afr | wihs3_afr |
|---|---|---|---|---|---|---|---|---|
| lambda | 1.32 | 1.244 | 1.25 | 1.235 | 1.297 | 1.239 | 1.053 | 1.056 |

UHS_AFR gwas plots

Leave UHS1–4 AFR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

UHS_EUR qq-plot

Leave UHS1–4 EUR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

VIDUS_EUR qq-plot

Leave VIDUS EUR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

WIHS1_AFR qq-plot

Leave WIHS1 AFR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

WIHS1_AMR qq-plot

Leave WIHS1 AMR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

WIHS1_EUR qq-plot

Leave WIHS1 EUR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

WIHS2_AFR qq-plot

Leave WIHS2 AFR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]

WIHS3_AFR qq-plot

Leave WIHS3 AFR out.

[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels); images not preserved]
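
The leave-one-out runs above can be organized as a loop that writes one meta-analysis configuration per omitted cohort. The sketch below assumes METAL is the meta-analysis tool (the thread quotes its documentation but does not show the pipeline script). The cohort labels come from the lambda table. The input file names (`<cohort>.tsv`), the column names, the output prefix, and the function name `write_loo_scripts` are hypothetical.

```shell
#!/bin/sh
# Cohort labels from the leave-one-out table above.
COHORTS="uhs_afr uhs_eur vidus_eur wihs1_afr wihs1_amr wihs1_eur wihs2_afr wihs3_afr"

# write_loo_scripts OUTDIR
# Writes one METAL script per left-out cohort into OUTDIR.
# Input file names (<cohort>.tsv) and column names are hypothetical.
write_loo_scripts() {
    outdir=$1
    mkdir -p "$outdir"
    for left_out in $COHORTS; do
        script="$outdir/loo_${left_out}.metal"
        {
            echo "SCHEME STDERR"
            echo "GENOMICCONTROL OFF"
            echo "MARKER VARIANT"
            echo "ALLELE A1 A2"
            echo "EFFECT BETA"
            echo "STDERR SE"
            for cohort in $COHORTS; do
                # Skip the cohort being left out of this run.
                [ "$cohort" = "$left_out" ] && continue
                echo "PROCESS ${cohort}.tsv"
            done
            echo "OUTFILE meta_without_${left_out}_ .tbl"
            echo "ANALYZE"
            echo "QUIT"
        } > "$script"
    done
}

# Write the eight scripts to a scratch directory and list them.
demo_dir=$(mktemp -d)
write_loo_scripts "$demo_dir"
ls "$demo_dir"
```

Each generated script would then be passed to METAL, and the lambda of each output compared against the full meta-analysis.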


  • WIHS2&3 results check

Recreate QQ and Manhattan plots to see if we can replicate what came out of the workflow. This checks out.

[Figures: QQ plots for WIHS2 and WIHS3 in two columns labeled "workflow" and "meta" (hiv_acquisition_gwas_meta_afr, snps+indels); images not preserved]

@jaamarks
Owner Author

jaamarks commented Oct 14, 2020

TL;DR

(1) For the 1000g_p3 GWAS, all the WIHS3 subjects were also included within WIHS2.

(2) Including WIHS3 in the 1000g_p3 meta did inflate the results somewhat, but the inflation was not as pronounced as in the current TOPMed meta.

(3) ProbABEL was used to run the 1000g_p3 GWAS for both WIHS2 and WIHS3 (as opposed to RVTEST which we used for the TOPMed GWAS). The individual GWAS results were deflated for both when using ProbABEL. This could explain why adding WIHS3 to the 1000g_p3 meta did not inflate the results as much as it did in the TOPMed meta. The inflation is most evident in the AFR-specific results (go figure).



All of WIHS3 was in WIHS2 for 1000g_p3

When comparing the phenotype files for WIHS2 and WIHS3 from the original 1000g_p3 results, we see that every subject in WIHS3 was within WIHS2.

  • WIHS3 1000g_p3 phenotype: s3://rti-hiv/gwas/wihs3/data/acquisition/0001/archive/phenotype/0001/final/wihs3_aa_hiv_age_sex_PC3+PC8+PC2.txt
  • WIHS2 1000g_p3 phenotype: s3://rti-hiv/gwas/wihs2/data/acquisition/archive/pheno/WIHS2 HIV status GWAS baseline.csv
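command line
# Download both phenotype files. The WIHS2 key contains spaces, so the URI must be quoted.
aws s3 cp s3://rti-hiv/gwas/wihs3/data/acquisition/0001/archive/phenotype/0001/final/wihs3_aa_hiv_age_sex_PC3+PC8+PC2.txt .
aws s3 cp "s3://rti-hiv/gwas/wihs2/data/acquisition/archive/pheno/WIHS2 HIV status GWAS baseline.csv" .

# Take the WIHS3 ID column, drop the header, keep the part after "_",
# and look each ID up in the WIHS2 phenotype file.
cut -d" " -f1 wihs3_aa_hiv_age_sex_PC3+PC8+PC2.txt   | tail -n +2 |\
  awk -F"_" '{print $2}' | xargs -I{} grep {} WIHS2\ HIV\ status\ GWAS\ baseline.csv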
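
The lookup above prints every matching WIHS2 row. A variant that reports a count makes the "every subject" claim easier to confirm at a glance. This is a sketch under assumptions: the function name `count_overlap` is made up, the WIHS3 IDs are extracted exactly as in the command above, and each ID is assumed to appear in the WIHS2 file as a whole word (for example as its own comma-delimited field), which is why `grep -w -F` is used.

```shell
#!/bin/sh
# count_overlap WIHS3_PHENO WIHS2_PHENO
# Counts how many WIHS3 subject IDs also occur in the WIHS2 phenotype file.
count_overlap() {
    cut -d" " -f1 "$1" | tail -n +2 | awk -F"_" '{print $2}' | {
        total=0
        found=0
        while read -r id; do
            [ -n "$id" ] || continue          # skip rows without an ID
            total=$(( total + 1 ))
            # -w: whole-word match, -F: fixed string, -q: quiet
            if grep -qwF -- "$id" "$2"; then
                found=$(( found + 1 ))
            fi
        done
        echo "$found of $total WIHS3 IDs found in WIHS2"
    }
}

# Toy files with hypothetical IDs (not real subject data).
w3=$(mktemp)
w2=$(mktemp)
printf 'fid_iid hiv\nX_101 1\nX_102 0\n' > "$w3"
printf 'id,status\n101,1\n102,0\n555,1\n' > "$w2"
count_overlap "$w3" "$w2"     # prints: 2 of 2 WIHS3 IDs found in WIHS2
rm -f "$w3" "$w2"
```

On the real files, a result where both numbers are equal would correspond to the finding that all of WIHS3 is contained in WIHS2.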


Inflation observed when adding WIHS3 to 1000g_p3 meta

We do observe some inflation when adding WIHS3 to the 1000g_p3 results. The lambda values do not increase as markedly as with the TOPMed imputed results, but visual inspection shows a clear shift above the line y=x. The most appreciable inflation is observed within the AFR-specific meta-analysis. The 1000g_p3 vs TOPMed plots for each meta are listed below.

It should also be noted that the TOPMed imputed data has more coverage. In particular, when comparing the final SNP count of the cross-ancestry meta-analysis (AFR+AMR+EUR), the TOPMed results contain 1,191,745 more variants.

  • 1000g_p3: N SNPs = 16,841,575
  • TOPMed: N SNPs = 18,033,320

AFR+AMR+EUR
[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels), without WIHS3 vs. with WIHS3; images not preserved]

AFR+EUR
[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels), without WIHS3 vs. with WIHS3; images not preserved]

AFR-specific
[Figures: Manhattan and QQ plots for hiv_acquisition_gwas_meta_cross (snps+indels), without WIHS3 vs. with WIHS3; images not preserved]



ProbABEL GWAS results were deflated

WIHS2 and WIHS3 1000g_p3 were both run using the ProbABEL GWAS software. This is apparent from the summary stats headers and also the file names (e.g. palogist in the name, which is a ProbABEL term for logistic regression). The lambda values for WIHS2 and WIHS3 were 1.017 and 1.01, respectively. Though these values do not raise any red flags, the Manhattan plots and QQ plots below show some striking similarities between the two cohorts. For example, the points in the QQ plots for both WIHS2 and WIHS3 are systematically shifted below the line y=x. This deflation could explain why the 1000g_p3 metas were not as markedly inflated as the TOPMed meta-analysis results.

1000g_p3 ProbABEL GWAS plots

plot locations:

  • WIHS3 (N=729)
    s3://rti-hiv/gwas/wihs3/results/acquisition/0002/afr/archive/1df/aa/final/wihs3.aa.1df.1000G.hiv_acq.maf_gt_0.01.rsq_gt_0.3.assoc.plot.all_chr.snps+indels.*.png
  • WIHS2 (N=844) s3://rti-hiv/gwas/wihs2/results/archive/wihs2.aa.palogist.status~snp+ageatbl+EV.plots.snps+indels.*.png.gz

[Figures: QQ and Manhattan plots for WIHS2 (wihs2.aa.palogist.status~snp+ageatbl+EV, snps+indels) and WIHS3 (wihs3.aa.1df.1000G.hiv_acq.maf_gt_0.01.rsq_gt_0.3, all_chr, snps+indels); images not preserved]
