Jesse: CFAR + COGA QC & GWAS #8
Ternary Plots
CFAR: HAPMAP as reference panel
I also performed the STRUCTURE analysis on the CFAR subjects using a more homogeneous reference panel. In particular, I used a subset of the three superpopulations (AFR, EUR, and EAS) that we generally use. The reference populations were YRI for AFR, CHB for EAS, and CEU for EUR. Here is the triangle plot for that analysis.
CFAR: include 1000G subjects in plot
I have included the 1,668 1KG subjects in the triangle plot below. The tight grouping at the vertices comes as no surprise, however, since these were specified as the reference populations in STRUCTURE.
CFAR+COGA EA
Here are the merged CFAR+COGA samples on a triangle plot generated from the STRUCTURE analysis. We took a 10K random sample (as per usual) from the post-QC genotype data. The cases (CFAR) are blue and the controls (COGA) are red. Both plots are the same except that the controls were plotted first on the top plot and second on the bottom plot.
We were a bit concerned that the CFAR subjects were shifted to the left. After double-checking everything and rerunning the STRUCTURE analysis, it appears that this is correct, though. We used the same code to process the COGA subjects and also the WIHS3 subjects, and those data appear fine, so it must be something inherent to the CFAR sample.
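For anyone double-checking the triangle plots, here is a minimal sketch of how three ancestry proportions can be mapped onto a ternary plot. The function name and the vertex layout (AFR bottom-left, EUR bottom-right, EAS top) are assumptions for illustration and may not match the plotting code actually used for the figures above.

```python
import math


def ternary_to_xy(afr, eur, eas):
    """Map three ancestry proportions to 2-D coordinates in a unit triangle.

    Assumed (hypothetical) vertex layout:
      AFR -> (0, 0), EUR -> (1, 0), EAS -> (0.5, sqrt(3)/2)
    """
    total = afr + eur + eas
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Renormalize so small rounding errors in the input do not move the point.
    afr, eur, eas = afr / total, eur / total, eas / total
    # Barycentric combination of the three vertices (AFR contributes (0, 0)).
    x = eur + 0.5 * eas
    y = (math.sqrt(3) / 2) * eas
    return x, y
```

Under this layout, a subject with a larger AFR proportion lands further to the left, so a leftward shift of one cohort relative to another corresponds to a higher estimated AFR fraction. Which vertex sits on the left in the plots above depends on the actual plotting code.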
GWAS Results
Genotyped SNPs only: RSQ (0.30) and MAF (0.01) filters
Only the genotyped SNPs that also passed the RSQ (0.30) and MAF (0.01) filters.
Genotyped SNPs only: RSQ (0.90) and MAF (0.01) filters
Only the genotyped SNPs that also passed the RSQ (0.90) and MAF (0.01) filters.
Top SNPs (p<0.001) from the original GWAS model:
Revisit: QC CFAR+COGA together
QC COGA+CFAR combined
We are going to combine the CFAR and COGA genotype data and QC them together. Here are the results from the STRUCTURE analysis thus far.
Ternary Plots
CFAR subjects are blue.
CFAR_COGA Genotype QC Summary Stats
Below are the genotype summary stats for the merged CFAR & COGA data.
Initial Summary Stats
EUR Summary Stats Tables
Autosomes
This table includes autosome filtering statistics prior to merging with chrX.
chrX
This table includes chrX filtering statistics prior to merging with autosomes.
Merged
Pre-imputation filtering
AFR Summary Stats Tables
Autosomes
This table includes autosome filtering statistics prior to merging with chrX.
chrX
This table includes chrX filtering statistics prior to merging with autosomes.
Merged
Pre-imputation filtering
AMR Summary Stats Tables
Autosomes
This table includes autosome filtering statistics prior to merging with chrX.
chrX
This table includes chrX filtering statistics prior to merging with autosomes.
Merged
Pre-imputation filtering
Phenotype & Covariate Distributions
EUR
AFR
N=1969
EUR Genotype PCs
CFAR_COGA EUR HIV Acquisition GWAS (N=4,763)
PC3, PC4, PC5, PC8
Included as covariates: age, sex, alc_dep, PC8, PC4, PC3, PC5 (~80%)
Tables P-value < 10e-4
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.30.p_lte_0.001.txt
MAF filter applied separately for cases & controls
Tables P-value < 10e-4
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.30.p_lte_0.001.txt
Top 10 PCs
Included as covariates: age, sex, alc_dep, PC1–10
Manhattan and QQ plots
Tables P-value < 10e-4
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03.rsq_0.30.p_lte_0.001.txt
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.05.rsq_0.30.p_lte_0.001.txt
MAF filter applied separately for cases & controls
Tables P-value < 10e-4
cfar_coga.eur.1000g_p3.hiv_acq.maf_0.01_both.rsq_0.30.p_lte_0.001.txt
CFAR_COGA HIV Acquisition GWAS (N=4,373)
Removed age outliers from COGA (24 < age < 86).
EUR Phenotype & Covariate Distributions
Top 3 PCs
Included as covariates: age, sex, alc_dep, PC1, PC3, PC4
Top 10 PCs
Included as covariates: age, sex, alc_dep, PC1–10
Details about chr23 & chr6 top hits
Note that the allele frequency in controls (COGA) for the chr23 SNPs is near zero. Should we apply the MAF filter individually for cases and controls, or overall as we currently do?
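To make the two options concrete, here is a rough sketch comparing an overall MAF filter with one applied to cases and controls separately. The function names, the `>=` threshold comparison, and the in-memory list layout are assumptions for illustration and are not taken from our pipeline. It also treats every subject as diploid, which would need sex-aware allele counting for chr23.

```python
def minor_allele_freq(dosages):
    """MAF from per-subject alt-allele dosages (0 to 2); None marks missing.

    Assumes diploid genotypes, so chrX in males would need separate handling.
    """
    observed = [d for d in dosages if d is not None]
    if not observed:
        return None
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq)


def passes_maf(dosages, is_case, threshold=0.01, per_group=False):
    """Return True if the SNP passes the MAF filter.

    per_group=False: one MAF over all subjects (overall filter).
    per_group=True:  MAF must meet the threshold in cases AND in controls.
    """
    if not per_group:
        maf = minor_allele_freq(dosages)
        return maf is not None and maf >= threshold
    cases = [d for d, c in zip(dosages, is_case) if c]
    controls = [d for d, c in zip(dosages, is_case) if not c]
    group_mafs = [minor_allele_freq(cases), minor_allele_freq(controls)]
    return all(m is not None and m >= threshold for m in group_mafs)
```

A SNP that is polymorphic in cases but essentially monomorphic in controls can pass the overall filter while failing the per-group one, which is the pattern seen in the chr23 hits.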
Verifying the coding for both McLaren and CFAR_COGA.
McLaren (original)
McLaren (converted)
CFAR_COGA
GWAS results of top SNPs on chr1 and 19
Dose info file
While the LooRsq statistic completely ignores experimental genotypes, EmpR is the correlation between the true genotyped values and the imputed dosages that were calculated by hiding all known genotypes for the given SNP (see LooDosage). A negative correlation between imputed and experimental genotypes can indicate allele flips. This statistic can also only be provided for genotyped sites. EmpRsq is the square of this correlation.
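As an illustration of the definition above, EmpR can be sketched as a plain Pearson correlation between the experimental genotypes and the leave-one-out dosages. This is not the imputation software's own implementation; the function name and the list inputs are hypothetical.

```python
import math


def emp_r(genotypes, loo_dosages):
    """Pearson correlation between true genotypes and leave-one-out dosages.

    Returns None when either vector has zero variance (correlation undefined).
    """
    n = len(genotypes)
    if n != len(loo_dosages) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 subjects")
    mean_g = sum(genotypes) / n
    mean_d = sum(loo_dosages) / n
    cov = sum((g - mean_g) * (d - mean_d) for g, d in zip(genotypes, loo_dosages))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_d = sum((d - mean_d) ** 2 for d in loo_dosages)
    if var_g == 0 or var_d == 0:
        return None
    return cov / math.sqrt(var_g * var_d)


# Toy data (made up): dosages that track the genotypes, then an allele-flipped copy.
genotypes = [0, 1, 2, 1]
dosages = [0.1, 1.0, 1.9, 1.0]
flipped = [2 - d for d in dosages]

r = emp_r(genotypes, dosages)              # positive: alleles agree
r_flipped = emp_r(genotypes, flipped)      # negative: suggests an allele flip
emp_rsq = r ** 2                           # EmpRsq is the square of EmpR
```

Because EmpRsq squares the correlation, a flipped SNP can still show a high EmpRsq, so the sign of EmpR is the thing to check when verifying allele coding.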
See parent GitHub Issue 133.
CFAR dbGaP
COGA dbGaP
QC these dbGaP studies and combine for GWAS and eventual inclusion in the HIV acquisition meta-analysis.
Age Distributions
Age distributions for CFAR and COGA
CFAR dbGaP Ternary Plot
COGA dbGaP Ternary Plot