Jesse: CFAR + COGA QC & GWAS #8

Open
jaamarks opened this issue Oct 1, 2019 · 13 comments
@jaamarks
Owner

jaamarks commented Oct 1, 2019

See parent GitHub Issue 133.

CFAR dbGaP
COGA dbGaP

QC these dbGaP studies and combine for GWAS and eventual inclusion in the HIV acquisition meta-analysis.

Age Distributions

Age distributions for CFAR and COGA

CFAR COGA
image image
CFAR dbGaP Ternary Plot

image

COGA dbGaP Ternary Plot

image

@jaamarks
Owner Author

jaamarks commented Oct 1, 2019

Ternary Plots

CFAR unfiltered (N=4,761)

afr_eas_eur_CFAR

CFAR: HapMap populations as reference panel

I also performed the STRUCTURE analysis on the CFAR subjects using a more homogeneous reference panel: a subset of the three superpopulations we generally use (AFR, EUR, and EAS). The reference populations were YRI for AFR, CHB for EAS, and CEU for EUR. Here is the triangle plot for that analysis.

image

CFAR: include 1000G subjects in plot

I have included the 1,668 1000G reference subjects in the triangle plot below. The tight grouping at the vertices is no surprise, since these subjects were specified as the reference populations in STRUCTURE.
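Each point in a triangle plot is a subject's three STRUCTURE proportions (AFR, EAS, EUR) drawn as a weighted average of the triangle's corners, so a subject assigned entirely to one population lands exactly on that vertex. A minimal Python sketch of the mapping; the corner layout (AFR bottom-left, EUR bottom-right, EAS at the apex) is an arbitrary assumption, not necessarily the layout used for these plots:

```python
from __future__ import annotations

import math

# Assumed corner layout: AFR bottom-left, EUR bottom-right, EAS at the apex.
VERTICES = {
    "AFR": (0.0, 0.0),
    "EUR": (1.0, 0.0),
    "EAS": (0.5, math.sqrt(3) / 2),
}


def ternary_xy(afr: float, eas: float, eur: float) -> tuple[float, float]:
    """Map three ancestry proportions to 2-D coordinates inside the triangle."""
    total = afr + eas + eur
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Renormalize so rounding in STRUCTURE output cannot push a point outside.
    weights = {"AFR": afr / total, "EAS": eas / total, "EUR": eur / total}
    # Barycentric combination: the point is the weighted average of the corners.
    x = sum(w * VERTICES[pop][0] for pop, w in weights.items())
    y = sum(w * VERTICES[pop][1] for pop, w in weights.items())
    return x, y


if __name__ == "__main__":
    print(ternary_xy(1.0, 0.0, 0.0))        # pure AFR reference subject: AFR corner
    print(ternary_xy(1 / 3, 1 / 3, 1 / 3))  # equal mixture: triangle centroid
```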

image


COGA unfiltered (N=5,415)

afr_eas_eur_COGA

CFAR+COGA EA

Here are the merged CFAR+COGA samples on a triangle plot generated from the STRUCTURE analysis. As usual, we took a 10K random sample from the post-QC genotype data. Cases (CFAR) are blue and controls (COGA) are red. The two plots are identical except for plotting order: the controls were plotted first in the top plot and second in the bottom plot.

afr_eas_eur_coga_highlighted_COGA_RED
afr_eas_eur_cfar_highlighted_CFAR_BLUE

@jaamarks jaamarks changed the title CFAR - QC Jesse: CFAR + COGA QC & GWAS Oct 1, 2019
@jaamarks jaamarks added the HIV label Oct 1, 2019
@jaamarks
Owner Author

jaamarks commented Oct 7, 2019

We were concerned that the CFAR subjects were shifted to the left in the ternary plot. After double-checking everything and rerunning the STRUCTURE analysis, the shift appears to be real. We used the same code to process the COGA and WIHS3 subjects, and those data look fine, so the shift is likely inherent to the CFAR sample.

@jaamarks
Owner Author

jaamarks commented Dec 11, 2019

GWAS Results

Genotype PC Plots

image

image

GWAS Plots: with top 3 PCs
cfar_cogo ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 3 assoc plot all_chr snps+indels qq
cfar_cogo ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 3 assoc plot all_chr snps+indels manhattan
GWAS Plots: with top 10 PCs
cfar_cogo ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 3 assoc plot all_chr snps+indels qq
cfar_cogo ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 3 assoc plot all_chr snps+indels manhattan
Genotyped SNPs only: RSQ (0.30) and MAF (0.01) filters

Only the genotyped SNPs that also passed the RSQ (0.30) and MAF (0.01) filters.

cfar_coga ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 3 assoc plot all_chr snps+indels manhattan
cfar_coga ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 3 assoc plot all_chr snps+indels qq

Genotyped SNPs only: RSQ (0.90) and MAF (0.01) filters
Only the genotyped SNPs that also passed the RSQ (0.90) and MAF (0.01) filters.

cfar_coga ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 9 assoc plot all_chr snps+indels manhattan
cfar_coga ea 1df 1000G hiv_acq maf_gt_0 01 rsq_gt_0 9 assoc plot all_chr snps+indels qq



Top SNPs (p ≤ 0.001) from the original GWAS model
HIV_ACQ ~ SNP + Age + Sex + Alc_Dep + PC1 + PC3 + PC7
with the expected and observed heterozygote frequencies appended.

cfar_cogo.ea.1df.1000G_p3.HIV_ACQ~SNP+AGE+SEX+ALCOHOL+PC1+PC3+PC7.maf_gt_0.01_subject+eur.rsq_gt_0.30.p_lte_0.001.hwe_merged.xlsx
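Observed and expected heterozygote frequencies can be derived from genotype counts alone. A minimal sketch, assuming the expected frequency is the Hardy-Weinberg value 2p(1 − p) with p estimated from allele counts (the spreadsheet's exact method is not described here). The example counts are the overall N_REF/N_HET/N_ALT values for rs9400531 from the April 2020 results later in this thread, used purely as input:

```python
from __future__ import annotations


def het_frequencies(n_ref_hom: int, n_het: int, n_alt_hom: int) -> tuple[float, float]:
    """Return (observed, HWE-expected) heterozygote frequencies from genotype counts."""
    n = n_ref_hom + n_het + n_alt_hom
    # ALT allele frequency from allele counts (each subject carries two alleles).
    p = (n_het + 2 * n_alt_hom) / (2 * n)
    observed = n_het / n
    expected = 2 * p * (1 - p)  # Hardy-Weinberg expectation
    return observed, expected


if __name__ == "__main__":
    # Overall N_REF, N_HET, N_ALT for rs9400531, used only as example input.
    obs, exp = het_frequencies(3200, 1398, 165)
    print(f"observed={obs:.4f} expected={exp:.4f}")
```

For those counts the observed frequency is about 0.2935 and the expected about 0.2970.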

@jaamarks
Owner Author

Results from three different STRUCTURE analyses:

  • Post-dbGaP: performed on the CFAR data with no processing other than converting Affymetrix IDs to rsIDs.
  • Post-QC: performed on the CFAR genotype data that had passed QC.
  • Imputed SNPs: performed on the set of SNPs that was actually imputed. This set had 36,503 SNPs removed by the following processing steps:
    • remove 1000G discordant alleles
    • remove monomorphic variants
    • SNP intersection with COGA
Post-dbGaP afr_eas_eur_filtered_cfar_EA
Post-QC afr_eas_eur_CFAR
Imputed SNPs afr_eas_eur_CFAR
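The three pre-imputation steps listed above can be sketched as a single pass over the variants. This is an illustration only: the `Variant` record, the `ref_panel_alleles` lookup, and `partner_ids` are hypothetical names, alleles are compared as unordered pairs without strand-flip handling, and variants absent from the reference panel are treated as discordant.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    ref: str
    alt: str
    alt_count: int  # ALT alleles observed across all called subjects
    n_called: int   # subjects with a non-missing genotype


def pre_imputation_filter(variants, ref_panel_alleles, partner_ids):
    """Apply the three listed filters in order.

    ref_panel_alleles: hypothetical dict rsid -> (ref, alt) from the 1000G panel.
    partner_ids: hypothetical set of rsids genotyped in the other study (COGA).
    Returns (kept variants, removal count per step).
    """
    removed = {"discordant": 0, "monomorphic": 0, "not_shared": 0}
    kept = []
    for v in variants:
        panel = ref_panel_alleles.get(v.rsid)
        # 1. Discordant alleles: allele pair differs from the panel (no strand flips tried).
        if panel is None or {v.ref, v.alt} != set(panel):
            removed["discordant"] += 1
            continue
        # 2. Monomorphic: every called allele is REF, or every called allele is ALT.
        if v.alt_count == 0 or v.alt_count == 2 * v.n_called:
            removed["monomorphic"] += 1
            continue
        # 3. Intersection: keep only variants present in both studies.
        if v.rsid not in partner_ids:
            removed["not_shared"] += 1
            continue
        kept.append(v)
    return kept, removed
```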

@jaamarks
Owner Author

image

@jaamarks jaamarks mentioned this issue Mar 19, 2020
@jaamarks
Owner Author

jaamarks commented Mar 19, 2020

Revisit: QC CFAR+COGA together

@jaamarks
Owner Author

jaamarks commented Mar 19, 2020

QC COGA+CFAR combined

We are going to combine the CFAR and COGA genotype data and QC them together. Here are the results from the STRUCTURE analysis thus far.

Thresholding criteria for ancestry group retainment (STRUCTURE proportions):
EA: (AFR < 25%) ∧ (EAS < 25%)
AA: (AFR > 25%) ∧ (EAS < 25%)
HA: (AFR < 25%) ∧ (EAS > 25%)
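A minimal sketch of these rules as a function of a subject's STRUCTURE AFR and EAS proportions (`assign_ancestry` is a hypothetical name). Because the criteria use strict inequalities, a subject at exactly 25%, or above 25% for both AFR and EAS, matches no group and is returned as `None` here:

```python
from typing import Optional


def assign_ancestry(afr: float, eas: float, cutoff: float = 0.25) -> Optional[str]:
    """Return 'EA', 'AA', 'HA', or None from STRUCTURE AFR and EAS proportions."""
    if afr < cutoff and eas < cutoff:
        return "EA"
    if afr > cutoff and eas < cutoff:
        return "AA"
    if afr < cutoff and eas > cutoff:
        return "HA"
    # Both proportions above the cutoff, or one exactly on it: no rule applies.
    return None


if __name__ == "__main__":
    for afr, eas in [(0.05, 0.02), (0.80, 0.01), (0.10, 0.40), (0.30, 0.30)]:
        print(afr, eas, assign_ancestry(afr, eas))
```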

Ternary Plots

CFAR subjects are blue.

image

EUR Ancestry (N=7,461)

image

AFR Ancestry (N=2,065)

image

AMR Ancestry (N=642)

image




CFAR_COGA Genotype QC Summary Stats

Below are the genotype summary stats for the merged CFAR & COGA data.

Initial Summary Stats
QC procedure Variants removed Variants retained Subjects removed Subjects retained
Initial CFAR dbGaP dataset - 581,817 - 4,761
Initial COGA dbGaP dataset - 581,036 - 5,415
Merge data - 611,069 - 10,176
Convert marker names to rsIDs 14,511 596,558 - -
Duplicate rsID filtering 24,062 572,496 - -
Genome build 37 and dbSNP 138 update 277 572,219 - -
EUR Summary Stats Tables

Autosomes

This table includes autosome filtering statistics prior to merging with chrX.

QC procedure Variants removed Variants retained Subjects removed Subjects retained
STRUCTURE analysis (all chr) - 572,219 2,715 7,461
Partitioning to only autosomes 14,112 558,107 - -
Remove subjects missing whole autosome data - - 0 -
Remove variants with missing call rate > 3% 82,612 475,495 - -
Remove variants with HWE p < 0.0001 3,832 471,663 - -
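The two variant filters in this table (missing call rate > 3%, HWE p < 0.0001) can be sketched as below, with genotypes coded as ALT allele counts (0/1/2) and `None` for a missing call. For brevity the sketch swaps in a 1-df chi-square HWE test; tools such as PLINK use an exact test by default, which behaves better for rare variants, so p-values from this sketch will not match the pipeline exactly:

```python
import math
from typing import Optional, Sequence


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of subjects with a non-missing genotype (None marks a missing call)."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def hwe_chisq_p(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """1-df chi-square Hardy-Weinberg test (a simplification of the exact test)."""
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)  # REF allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    if min(expected) == 0:
        return 1.0  # monomorphic site: test undefined, so never filter on it
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_variant_qc(genotypes, max_missing=0.03, hwe_p_min=1e-4) -> bool:
    """Keep a variant if missingness <= 3% and HWE p >= 1e-4, mirroring the table rows."""
    if 1 - call_rate(genotypes) > max_missing:
        return False
    called = [g for g in genotypes if g is not None]
    counts = (called.count(0), called.count(1), called.count(2))
    return hwe_chisq_p(*counts) >= hwe_p_min
```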

chrX

This table includes chrX filtering statistics prior to merging with autosomes.

QC procedure Variants removed Variants retained Subjects removed Subjects retained
Partitioning to only chrX 558,107 14,112 - -
Remove subjects missing whole chrX data - - 0 -
Remove variants with missing call rate > 3% 2,406 11,706 - -
Remove variants with HWE p < 0.0001 15 11,691 - -

Merged

QC procedure Variants removed Variants retained Subjects removed Subjects retained
Excessive homozygosity filter - 483,354 0 -
Remove Subjects with IBS > 0.9 - - 1,145 6,316
Remove Subjects with IBD > 0.4 - 471,663 1,408 4,908
Genotype Call Rate Subject Filter (3%) - - 78 4,830
Sex discordance filter - 483,354 12 4,818
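The sex discordance filter compares reported sex with sex inferred from chrX heterozygosity: males are hemizygous, so nearly all of their chrX calls are homozygous. A minimal sketch, assuming an inbreeding estimate F = (O_hom − E_hom) / (N − E_hom) and the PLINK `--check-sex` default cutoffs (F < 0.2 female, F > 0.8 male); the function names are hypothetical, and ambiguous F values are counted as discordant here:

```python
def chrx_f(genotypes, alt_freqs):
    """Inbreeding coefficient on chrX: (O_hom - E_hom) / (N - E_hom)."""
    obs_hom = exp_hom = 0.0
    n = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip missing calls
        n += 1
        obs_hom += 1.0 if g != 1 else 0.0  # 0 or 2 ALT alleles = homozygous
        exp_hom += 1 - 2 * p * (1 - p)     # HWE probability of a homozygous call
    return (obs_hom - exp_hom) / (n - exp_hom)


def imputed_sex(f, female_max=0.2, male_min=0.8):
    """'M', 'F', or None (ambiguous) using PLINK --check-sex default cutoffs."""
    if f > male_min:
        return "M"
    if f < female_max:
        return "F"
    return None


def sex_discordant(reported, genotypes, alt_freqs):
    """True when the genotype-inferred sex is ambiguous or disagrees with the report."""
    return imputed_sex(chrx_f(genotypes, alt_freqs)) != reported
```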

Pre-imputation filtering

QC procedure Variants removed Variants retained Subjects removed Subjects retained
remove 1000G discordant alleles 5,444 477,910 0 -
remove monomorphic variants 10,750 467,160 0 -
remove individuals missing whole chr - 467,160 0 4,818
AFR Summary Stats Tables

Autosomes

This table includes autosome filtering statistics prior to merging with chrX.

QC procedure Variants removed Variants retained Subjects removed Subjects retained
STRUCTURE analysis (all chr) - 572,219 8,113 2,063
Partitioning to only autosomes 14,112 558,107 - -
Remove subjects missing whole autosome data - - 0 -
Remove variants with missing call rate > 3% 94,597 463,510 - -
Remove variants with HWE p < 0.0001 5,052 458,458 - -

chrX

This table includes chrX filtering statistics prior to merging with autosomes.

QC procedure Variants removed Variants retained Subjects removed Subjects retained
Partitioning to only chrX 558,107 14,112 - -
Remove subjects missing whole chrX data - - 0 -
Remove variants with missing call rate > 3% 2,627 11,485 - -
Remove variants with HWE p < 0.0001 44 11,441 - -

Merged

QC procedure Variants removed Variants retained Subjects removed Subjects retained
Merged - 469,899 - 2,063
Remove Subjects with IBS > 0.9 - - 3 2,060
Remove Subjects with IBD > 0.4 - - 32 2,028
Genotype Call Rate Subject Filter (3%) - - 54 1,974
Sex discordance filter - - 4 1,970
Excessive homozygosity filter - - 0 -
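The IBD > 0.4 row removes subjects but does not say which member of each related pair was dropped. One common approach, assumed here rather than taken from the pipeline, is greedy pruning: repeatedly drop the subject involved in the most over-threshold pairs until no such pair remains. A sketch with hypothetical input of `(subject_a, subject_b, pi_hat)` tuples:

```python
from collections import Counter


def prune_related(pairs, threshold=0.4):
    """Greedy removal so that no retained pair exceeds the relatedness threshold.

    pairs: iterable of (subject_a, subject_b, pi_hat) tuples.
    Returns the set of subject IDs to drop.
    """
    edges = [(a, b) for a, b, pi_hat in pairs if pi_hat > threshold]
    dropped = set()
    while True:
        # Ignore pairs already resolved by an earlier removal.
        live = [(a, b) for a, b in edges if a not in dropped and b not in dropped]
        if not live:
            return dropped
        degree = Counter()
        for a, b in live:
            degree[a] += 1
            degree[b] += 1
        # Drop the subject in the most close-relative pairs (ties: lowest ID).
        worst = min(degree, key=lambda s: (-degree[s], s))
        dropped.add(worst)
```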

Pre-imputation filtering

QC procedure Variants removed Variants retained Subjects removed Subjects retained
remove 1000G discordant alleles 5,384 464,515 - -
remove monomorphic variants 11,464 453,051 - -
remove individuals missing whole chr - 453,051 - 1,970
AMR Summary Stats Tables

Autosomes

This table includes autosome filtering statistics prior to merging with chrX.

QC procedure Variants removed Variants retained Subjects removed Subjects retained
STRUCTURE analysis (all chr) - 572,219 9,534 642
Partitioning to only autosomes 14,112 558,107 - -
Remove subjects missing whole autosome data - - 0 -
Remove variants with missing call rate > 3% 84,711 473,396 - -
Remove variants with HWE p < 0.0001 1,299 472,097 - -

chrX

This table includes chrX filtering statistics prior to merging with autosomes.

QC procedure Variants removed Variants retained Subjects removed Subjects retained
Partitioning to only chrX 558,107 14,112 - -
Remove subjects missing whole chrX data - - 0 -
Remove variants with missing call rate > 3% 2,371 11,741 - -
Remove variants with HWE p < 0.0001 10 11,731 - -

Merged

QC procedure Variants removed Variants retained Subjects removed Subjects retained
Merged - 483,828 - 642
Remove Subjects with IBS > 0.9 - - 47 595
Remove Subjects with IBD > 0.4 - - 59 536
Genotype Call Rate Subject Filter (3%) - - 8 528
Sex discordance filter - - 0 -
Excessive homozygosity filter - - 0 -

Pre-imputation filtering

QC procedure Variants removed Variants retained Subjects removed Subjects retained
remove 1000G discordant alleles 5,446 478,382 - -
remove monomorphic variants 20,635 457,747 - -
remove individuals missing whole chr - 457,747 - 528



Phenotype & Covariate Distributions

EUR
N
HIV case 2,365
HIV control 2,398
Alc_Dep case 1,223
Alc_Dep control 3,540
Male 3,307
Female 1,456



cfar_coga_age_distributions



cfar_coga_hiv_sex_alcohol_distributions

AFR

N=1969

N
HIV case 1,832
HIV control 137
Alc_Dep case 495
Alc_Dep control 1,474
Male 1,338
Female 631



cfar_coga_afr_n1969_age_distribution
cfar_coga_afr_n1969_hiv_sex_alcohol_distributions

AMR

N=526

N
HIV case 414
HIV control 112
Alc_Dep case 169
Alc_Dep control 357
Male 100
Female 426



cfar_coga_amr_n526_age_distribution
cfar_coga_amr_n526_hiv_sex_alcohol_distributions


EUR Genotype PCs

PCs 1-10

image

image

Genotype PCs Explaining Phenotypic Variation
================ EUR group ================
Top PCs:  PC8 PC4 PC3 PC5 
PVE:      80.2

image
image
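The thread reports the selected PCs and their PVE but not the selection rule. A minimal sketch of one plausible rule, stated as an assumption: compute R² of the phenotype on each PC separately, express each as a percentage of the total across PC1–10, and add PCs in decreasing order until the cumulative share reaches a cutoff (the 75% cutoff and the example R² values are hypothetical):

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def select_pcs(r2_by_pc, threshold_pct=75.0):
    """Add PCs in decreasing R^2 order until their share of the total reaches the cutoff."""
    total = sum(r2_by_pc.values())
    chosen, cumulative = [], 0.0
    for pc, r2 in sorted(r2_by_pc.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(pc)
        cumulative += 100.0 * r2 / total
        if cumulative >= threshold_pct:
            break
    return chosen, cumulative


if __name__ == "__main__":
    # Hypothetical per-PC R^2 values, for illustration only.
    example = {"PC1": 0.010, "PC2": 0.002, "PC3": 0.030, "PC4": 0.020, "PC5": 0.003}
    print(select_pcs(example))
```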

@jaamarks
Owner Author

jaamarks commented Mar 30, 2020

CFAR_COGA EUR HIV Acquisition GWAS (N=4,763)

PC3, PC4, PC5, PC8

Included as covariates: age, sex, alc_dep, PC8, PC4, PC3, PC5 (PVE ~80%)
Performed with RVTESTS.



MAF 1%
RSQ 0.30: cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01 assoc plot snps+indels qq
RSQ 0.80: cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 01 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 01 assoc plot snps+indels qq
RSQ 0.90: cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01 assoc plot snps+indels qq
MAF 3%
RSQ 0.30: cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03 assoc plot snps+indels qq
RSQ 0.80: cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels qq
RSQ 0.90: cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03 assoc plot snps+indels qq
MAF 5%
RSQ 0.30: cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 05 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 05 assoc plot snps+indels qq
RSQ 0.80: cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 05 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 05 assoc plot snps+indels qq
RSQ 0.90: cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 05 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 05 assoc plot snps+indels qq

Tables: p-value ≤ 0.001

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.90.p_lte_0.001.txt

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.90.p_lte_0.001.txt

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.90.p_lte_0.001.txt


MAF filter applied separately for cases & controls

MAF 1% (applied separately) and RSQ 0.30

cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01_both assoc plot snps+indels qq

MAF 1% (applied separately) and RSQ 0.80

cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 01_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 01_both assoc plot snps+indels qq

MAF 1% (applied separately) and RSQ 0.90

cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01_both assoc plot snps+indels qq

MAF 3% (applied separately) and RSQ 0.30

cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03_both assoc plot snps+indels qq

MAF 3% (applied separately) and RSQ 0.80

cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03_both assoc plot snps+indels qq

MAF 3% (applied separately) and RSQ 0.90

cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03_both assoc plot snps+indels qq

Tables: p-value ≤ 0.001

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.90.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.90.p_lte_0.001.txt




Top 10 PCs

Included as covariates: age, sex, alc_dep, PC1–10


MAF 1%
RSQ 0.30: cfar_coga eur 1000g hiv_acq assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq assoc plot snps+indels qq
RSQ 0.80: cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 01 assoc plot snps+indels qq; cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels manhattan
RSQ 0.90: cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01 assoc plot snps+indels qq
MAF 3%
RSQ 0.30: cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03 assoc plot snps+indels qq
RSQ 0.80: cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels qq
RSQ 0.90: cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03 assoc plot snps+indels qq
MAF 5%
RSQ 0.30: cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 05 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 05 assoc plot snps+indels qq
RSQ 0.80: cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 05 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 05 assoc plot snps+indels qq
RSQ 0.90: cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 05 assoc plot snps+indels manhattan; cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 05 assoc plot snps+indels qq

Tables: p-value ≤ 0.001

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.90.p_lte_0.001.txt

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.90.p_lte_0.001.txt

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.90.p_lte_0.001.txt


MAF filter applied separately for cases & controls

MAF 1% (applied separately) and RSQ 0.30

cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01_both assoc plot snps+indels qq

MAF 1% (applied separately) and RSQ 0.80

cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 01_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 01_both assoc plot snps+indels qq

MAF 1% (applied separately) and RSQ 0.90

cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 01_both assoc plot snps+indels qq

MAF 3% (applied separately) and RSQ 0.30

cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 03_both assoc plot snps+indels qq

MAF 3% (applied separately) and RSQ 0.80

cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03_both assoc plot snps+indels qq

MAF 3% (applied separately) and RSQ 0.90

cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03_both assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 90 maf_0 03_both assoc plot snps+indels qq

Tables: p-value ≤ 0.001

cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.90.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.80.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.90.p_lte_0.001.txt

@jaamarks
Owner Author

jaamarks commented Apr 8, 2020

CFAR_COGA HIV Acquisition GWAS (N=4,373)

Removed COGA age outliers, keeping subjects with 24 < age < 86.

EUR Phenotype & Covariate Distributions


cfar_coga_n4374_age_distribution

cfar_coga_n4374_hiv_sex_alcohol_distributions


================ EUR group ================
Top PCs:  PC3 PC4 PC1 
PVE:      75.44

cfar_coga_n4374_phenotype_variance_explained_by_genotype_pcs_sorted


top 3 PCs

Included as covariates: age, sex, alc_dep, PC1, PC3, PC4

RSQ 0.30 MAF 1%

cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01 assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01 assoc plot snps+indels qq

RSQ 0.80 MAF 3%

cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels qq


top 10 PCs

Included as covariates: age, sex, alc_dep, PC1–10

RSQ 0.30 MAF 1%

cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01 assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 30 maf_0 01 assoc plot snps+indels qq

RSQ 0.80 MAF 3%

cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels manhattan
cfar_coga eur 1000g hiv_acq rsq_0 80 maf_0 03 assoc plot snps+indels qq

@jaamarks
Owner Author

jaamarks commented Apr 10, 2020

Details about chr23 & chr6 top hits

jmarks@RTI-103356 ~/Projects/hiv/cfar_coga/gwas/0001
awk '($2==23 && $NF < 10E-20) {print $0}' cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.80.p_lte_0.001.txt

awk '($2==6 && $NF< 5.61e-8) {print $0}' cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.80.p_lte_0.001.txt


ID CHROM POS REF ALT N_INFORMATIVE AF INFORMATIVE_ALT_AC CALL_RATE HWE_PVALUE N_REF N_HET N_ALT U_STAT SQRT_V_STAT ALT_EFFSIZE PVALUE
rs76422484:62817386:C:T 23 62817386 C T 4756 0.106378:0.158198:0.00995458 703.581:680.566:23.015 1:1:1 4.67019e-20:9.20735e-31:1 2578:1446:1132 729:705:24 0:0:0 193.219 12.2948 1.27823 1.18365e-55
rs78475991:62880223:A:C 23 62880223 A C 4756 0.10861:0.161415:0.0103525 718.344:694.409:23.935 1:1:1 4.67019e-20:9.20735e-31:1 2578:1446:1132 729:705:24 0:0:0 196.814 12.6192 1.23593 7.70042e-55
rs9400531:112656072:G:C 6 112656072 G C 4763 0.180376:0.196583:0.164392 1718.26:929.839:788.422 1:1:1 0.407277:0.400121:0.940937 3200:1528:1672 1398:738:660 165:99:66 83.2262 15.3245 0.354397 5.60578e-08
rs9384958:116112728:A:G 6 116112728 A G 4763 0.20753:0.2334:0.182016 1976.93:1103.98:872.949 1:1:1 0.223407:0.42705:0.588751 2977:1381:1596 1559:843:716 227:141:86 86.9972 16.0186 0.339043 5.60383e-08
rs6926556:116118455:T:C 6 116118455 T C 4763 0.214763:0.244088:0.185842 2045.83:1154.54:891.296 1:1:1 0.300898:0.575892:0.636439 2956:1362:1594 1577:859:718 230:144:86 97.7634 16.4618 0.360763 2.87105e-09
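The ALT_EFFSIZE and PVALUE columns appear to follow from U_STAT and SQRT_V_STAT: for rs9400531, 83.2262 / 15.3245² ≈ 0.3544 and Z = 83.2262 / 15.3245 ≈ 5.43, whose two-sided normal p-value is about 5.6e-8. A minimal sketch that recomputes both, assuming effect = U/V and p from Z = U/√V (inferred from these rows, not from the RVTESTS documentation):

```python
import math


def score_test(u_stat: float, sqrt_v_stat: float):
    """Effect size and two-sided p-value from a single-variant score test.

    Assumes ALT_EFFSIZE = U / V and a normal approximation Z = U / sqrt(V).
    """
    v = sqrt_v_stat ** 2
    effect = u_stat / v
    z = u_stat / sqrt_v_stat
    # Two-sided normal p-value; erfc keeps precision for very small p.
    p = math.erfc(abs(z) / math.sqrt(2))
    return effect, p


if __name__ == "__main__":
    # U_STAT and SQRT_V_STAT for two rows of the table above.
    for name, u, sv in [("rs9400531", 83.2262, 15.3245), ("rs76422484", 193.219, 12.2948)]:
        beta, p = score_test(u, sv)
        print(f"{name}\tALT_EFFSIZE~{beta:.4f}\tPVALUE~{p:.3e}")
```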

Note that the control (COGA) allele frequency of the chr23 SNPs is near zero (about 0.01, versus about 0.16 in cases). Should we apply the MAF filter separately to cases and controls, or overall as we currently do?
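The AF field above appears to hold overall:cases:controls frequencies. A minimal sketch contrasting the current overall MAF filter with a per-group rule; the per-group rule (require the threshold in cases and in controls) is an assumption about what "applied separately" means, and the function names are hypothetical:

```python
from __future__ import annotations


def split_af(af_field: str) -> tuple[float, float, float]:
    """Parse an 'overall:cases:controls' AF field (assumed layout) into floats."""
    overall, cases, controls = (float(x) for x in af_field.split(":"))
    return overall, cases, controls


def maf(af: float) -> float:
    """Minor allele frequency from an ALT allele frequency."""
    return min(af, 1.0 - af)


def passes_maf(af_field: str, threshold: float, per_group: bool) -> bool:
    overall, cases, controls = split_af(af_field)
    if per_group:
        # Assumed per-group rule: the MAF threshold must hold in cases AND controls.
        return maf(cases) >= threshold and maf(controls) >= threshold
    # Current rule: a single MAF computed over all subjects.
    return maf(overall) >= threshold


if __name__ == "__main__":
    af = "0.106378:0.158198:0.00995458"  # rs76422484 (chr23) from the table above
    print("overall:", passes_maf(af, 0.01, per_group=False))
    print("per group:", passes_maf(af, 0.01, per_group=True))
```

At a 1% threshold, the overall filter keeps rs76422484 (MAF ≈ 0.106), while the per-group rule drops it (control MAF ≈ 0.00995).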

@jaamarks
Owner Author

chrom name McLaren_beta McLaren_P CFAR_COGA_beta CFAR_COGA_P
6 rs12210050:475489:C:T 0.2140599325811672 4.847e-09 -0.157585 0.0159497
6 rs41561016:31322611:C:T -0.41144717978571177 9.459e-09 -0.0396366 0.750087
6 rs41557415:31323455:A:G 0.4123386770513366 9.424e-09 -0.0400005 0.74785
6 rs1140487:31322987:C:T -0.412109650826833 9.457e-09 -0.0400005 0.74785
6 rs41543314:31322690:A:G 0.4028684822608984 2.332e-08 -0.0839558 0.507325

@jaamarks
Owner Author

jaamarks commented Apr 16, 2020

Verifying the coding for both McLaren and CFAR_COGA.

McLaren (original)
CHR SNP BP A1 A2 OR P
6 rs12210050 475489 T C 0.8073 4.847e-09
6 rs41561016 31322611 T C 1.509 9.459e-09
6 rs41557415 31323455 A G 0.6621 9.424e-09
6 rs1140487 31322987 T C 1.51 9.457e-09
6 rs41543314 31322690 A G 0.6684 2.332e-08
McLaren (converted)
chrom name position REF ALT ALT_EFFSIZE p
6 rs12210050:475489:C:T 475489 T C 0.2140599325811672 4.847e-09
6 rs41561016:31322611:C:T 31322611 T C -0.41144717978571177 9.459e-09
6 rs41557415:31323455:A:G 31323455 A G 0.4123386770513366 9.424e-09
6 rs1140487:31322987:C:T 31322987 T C -0.412109650826833 9.457e-09
6 rs41543314:31322690:A:G 31322690 A G 0.4028684822608984 2.332e-08

CFAR_COGA

ID CHROM POS REF ALT ALT_EFFSIZE PVALUE
rs12210050:475489:C:T 6 475489 C T -0.157585 0.0159497
rs41561016:31322611:C:T 6 31322611 C T -0.0396366 0.750087
rs41557415:31323455:A:G 6 31323455 A G -0.0400005 0.74785
rs1140487:31322987:C:T 6 31322987 C T -0.0400005 0.74785
rs41543314:31322690:A:G 6 31322690 A G -0.0839558 0.507325
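To compare the two studies, both effects should refer to the same allele. A minimal sketch that converts an odds ratio into a log-odds effect for the CFAR_COGA ALT allele, assuming (as in PLINK association output) that McLaren's OR refers to allele A1; `alt_beta_from_or` is a hypothetical name:

```python
import math


def alt_beta_from_or(odds_ratio: float, a1: str, a2: str, ref: str, alt: str) -> float:
    """Log-odds effect for the target ALT allele, given an OR reported for allele A1."""
    beta_a1 = math.log(odds_ratio)
    if (a1, a2) == (alt, ref):
        return beta_a1   # A1 is already the target ALT allele
    if (a1, a2) == (ref, alt):
        return -beta_a1  # A1 is the target REF allele: flip the sign
    raise ValueError(f"alleles {a1}/{a2} do not match {ref}/{alt}; check strand")


if __name__ == "__main__":
    # McLaren (original) rows vs. CFAR_COGA REF/ALT from the tables above.
    print(alt_beta_from_or(0.8073, "T", "C", ref="C", alt="T"))  # rs12210050
    print(alt_beta_from_or(0.6621, "A", "G", ref="A", alt="G"))  # rs41557415
```

For rs12210050 (A1 = T, CFAR_COGA ALT = T) this gives ln(0.8073) ≈ −0.214, and for rs41557415 (A1 = A, CFAR_COGA REF = A) it gives −ln(0.6621) ≈ +0.412.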

@jaamarks
Owner Author

jaamarks commented Apr 21, 2020

GWAS results of top SNPs on chr1 and chr19

jmarks@RTI-103356 ~/Projects/hiv/cfar_coga/gwas/0001/maf_both
$ awk '$17 < 1e-15' cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.90.p_lte_0.001.txt
ID CHROM POS REF ALT N_INFORMATIVE AF INFORMATIVE_ALT_AC CALL_RATE HWE_PVALUE N_REF N_HET N_ALT U_STAT SQRT_V_STAT ALT_EFFSIZE PVALUE
rs10911132:182753673:G:A 1 182753673 G A 4763 0.0975693:0.141029:0.0547075 929.445:667.068:262.377 1:1:1 0.75388:0.00376253:0.117543 3829:1688:2141 886:641:245 48:36:12 125.07 11.5818 0.932393 3.48702e-27
rs10911133:182753838:G:T 1 182753838 G T 4763 0.102479:0.149187:0.0564141 976.216:705.654:270.562 1:1:1 0.587918:0.00149029:0.117543 3816:1675:2141 899:654:245 48:36:12 134.85 12.0782 0.924377 6.0628e-29
rs1064257:49993535:C:G 19 49993535 C G 4763 0.0874761:0.118845:0.0565384 833.297:562.139:271.158 1:1:1 0.000413457:0.00120084:0.00595059 3963:1833:2130 783:516:267 17:16:1 109.352 10.96 0.910348 1.91459e-23



Dose info file

SNP REF(0) ALT(1) ALT_Frq MAF AvgCall Rsq Genotyped LooRsq EmpR EmpRsq Dose0 Dose1
1:182753838:G:T G T 0.10209 0.10209 0.99400 0.94751 Genotyped 0.540 0.637 0.40571 0.45810 0.03929
1:182753673:G:A G A 0.09718 0.09718 0.99015 0.91021 Imputed - - - - -
19:49993535:C:G C G 0.08712 0.08712 0.99541 0.95675 Genotyped 0.760 0.654 0.42775 0.50306 0.02054

EmpR, EmpRsq

While the LooRsq statistic completely ignores experimental genotypes, EmpR is the correlation between the true genotyped values and the imputed dosages calculated by hiding all known genotypes for the given SNP (see LooDosage). A negative correlation between imputed and experimental genotypes can indicate allele flips. Like LooRsq, this statistic can only be provided for genotyped sites. EmpRsq is the square of this correlation.
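EmpR is an ordinary Pearson correlation, so it can be sketched directly. The inputs here are hypothetical vectors of experimental genotypes (0/1/2) and leave-one-out dosages for one genotyped SNP:

```python
import math


def emp_r(true_genotypes, loo_dosages):
    """Pearson correlation between observed genotypes (0/1/2) and leave-one-out dosages."""
    n = len(true_genotypes)
    mg = sum(true_genotypes) / n
    md = sum(loo_dosages) / n
    cov = sum((g - mg) * (d - md) for g, d in zip(true_genotypes, loo_dosages))
    sg = math.sqrt(sum((g - mg) ** 2 for g in true_genotypes))
    sd = math.sqrt(sum((d - md) ** 2 for d in loo_dosages))
    return cov / (sg * sd)


if __name__ == "__main__":
    g = [0, 1, 2, 1, 0]
    print(emp_r(g, g))                   # perfect agreement
    print(emp_r(g, [2 - x for x in g]))  # reversed coding, as with an allele flip
    # EmpRsq is simply the square of EmpR.
```

As a consistency check on the info file above, 0.637² ≈ 0.406 and 0.654² ≈ 0.428, matching the reported EmpRsq values up to rounding. Reversing the dosages (2 − g) gives −1, the allele-flip signature described above.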

@jaamarks jaamarks reopened this Apr 22, 2020