convert mouse gene identifiers to human ones that match data in GWAS summary data #3

Jylab-Genetics · 2024-11-04T11:32:52Z

Why do we need to execute 'convert mouse gene identifiers to human ones that match data in GWAS summary data'? I don't quite understand. Are GWAS sources different from single-cell data sources?

VincentQLai · 2024-11-04T19:01:34Z

Yes, they are sometimes different.

In a genome-wide association study (GWAS), genetic data from up to hundreds of thousands of individuals are collected and a statistical test is performed for each SNP, with SNPs and genes defined relative to the human genome. Single-cell RNA-seq, by contrast, can be performed on any organism. The gene IDs therefore need to be mapped between species so that the two data sets can be matched.

As for seismic specifically, it requires a MAGMA gene-level Z-score file as input, in which the SNP-level statistics are aggregated to the gene level and genes are encoded as human Entrez IDs. As a result, unless the genes in the scRNA-seq data are already encoded as human Entrez IDs, a gene ID conversion is needed. Currently, seismic's built-in data structure can only handle conversions between several listed gene ID types. We plan to implement more flexible conversion options in future updates.
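As a rough illustration of why the conversion is needed, the sketch below compares the gene identifiers of a specificity-score matrix with those of a MAGMA gene-level file. The objects `sscore` and `magma_genes` and their values are made up for this example and are not part of seismic.

```r
# Toy specificity-score matrix: rows are genes (mouse symbols), columns are cell types.
sscore <- matrix(
  c(0.9, 0.1, 0.2, 0.1, 0.8, 0.3),
  nrow = 3,
  dimnames = list(c("Ins1", "Gcg", "Sst"), c("beta", "alpha"))
)

# Toy gene IDs from a MAGMA gene-level file: human Entrez IDs stored as strings.
magma_genes <- c("3630", "2641", "6750")

# Entrez IDs are purely numeric, so a pattern check flags a mismatched ID type.
looks_like_entrez <- grepl("^[0-9]+$", rownames(sscore))

# Number of genes shared between the two data sets before any conversion.
n_shared <- length(intersect(rownames(sscore), magma_genes))

message("Rows that look like Entrez IDs: ", sum(looks_like_entrez))
message("Genes shared with the MAGMA file: ", n_shared)
```

With mouse symbols on one side and human Entrez IDs on the other, no genes are shared until the IDs are translated.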

Hope this information helps address your question.

Jylab-Genetics · 2024-11-05T04:14:36Z

Thank you for your prompt response. This software demonstrates exceptional flexibility in integrating SNP and single-cell RNA-seq data, especially in gene ID conversion and data matching, offering unprecedented convenience for association analyses at the gene level and opening new avenues for research.

Regarding cross-species compatibility, I would like to confirm: does the software currently mainly support association data between humans and mice? Is it possible to extend this support to integrate human SNP information with single-cell data from non-human primates or other mammals? Such an expansion would be highly valuable for cross-species genetic association studies, helping to uncover the molecular mechanisms of specific traits across different species. I look forward to your further clarification.

VincentQLai · 2024-11-27T20:12:16Z

Yes, our framework supports finding the cell types associated with any trait in either mouse or human scRNA-seq datasets. Unfortunately, as you can see from the previous screenshot, the current version of seismicGWAS uses a static internal gene mapping table, limited to mouse and human gene symbols. In future versions, we plan to allow customized gene mapping inputs. While most current scRNA-seq datasets are human or mouse-based, we recognize the growing use of other organisms and plan to accommodate them in upcoming updates.
If this functionality is critical for your work, please let us know, and we can prioritize its implementation.

VincentQLai · 2024-11-28T09:24:11Z

Great news! Our latest version now supports translating the gene IDs of the gene-level seismic specificity scores using a customized gene mapping table. Please refer to the function description page for the exact arguments.

Here is an example of how you may create a customized gene mapping data frame and use seismicGWAS to identify associated cell types. It assumes you have reached the step where you have a specificity score matrix and would like to translate it from mouse gene symbols to human Entrez IDs. For example, you may take the homology mapping from the homologene package.

# Load the homologene package
library("homologene")

# Create a gene mapping table between mouse (Taxonomy ID: 10090) and human (Taxonomy ID: 9606)
# To look up a species' taxonomy ID, please refer to NCBI
mapping <- homologene(genes = rownames(tmfacs_sce_small), inTax = 10090, outTax = 9606)

# Translate gene IDs from mouse gene symbols to human Entrez IDs
tmfacs_sscore_hsa <- translate_gene_ids(tmfacs_sscore, from = "10090", to = "9606_ID", gene_mapping_table = mapping)

# Prioritize associated cell types
t2d <- get_ct_trait_associations(tmfacs_sscore_hsa, t2d_magma)
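
Before translating, it may help to check how many genes actually receive a homolog and whether several source genes collapse onto the same human gene. The sketch below uses a toy data frame laid out like the `mapping` table above. The column names (`10090`, `9606`, `10090_ID`, `9606_ID`) are assumed from the `from` and `to` arguments used above, the values are illustrative, and a real table is best inspected with `colnames(mapping)` first.

```r
# Toy mapping table shaped like `mapping` above (column names are an assumption).
mapping <- data.frame(
  `10090`    = c("Ins1", "Ins2", "Gcg", "Sst"),
  `9606`     = c("INS", "INS", "GCG", "SST"),
  `10090_ID` = c(16333L, 16334L, 14526L, 20604L),
  `9606_ID`  = c(3630L, 3630L, 2641L, 6750L),
  check.names = FALSE
)

# Genes present in the expression data (toy; "Xist" has no entry in the table).
genes <- c("Ins1", "Ins2", "Gcg", "Sst", "Xist")

# Fraction of input genes that received a human homolog.
coverage <- mean(genes %in% mapping[["10090"]])

# Human IDs targeted by more than one source gene (many-to-one mappings).
dup_ids <- unique(mapping[["9606_ID"]][duplicated(mapping[["9606_ID"]])])

# Source genes that map to more than one human gene (one-to-many mappings).
dup_src <- unique(mapping[["10090"]][duplicated(mapping[["10090"]])])

# Keep only one-to-one pairs; dropping versus aggregating is a judgment call.
one_to_one <- mapping[
  !(mapping[["9606_ID"]] %in% dup_ids) & !(mapping[["10090"]] %in% dup_src),
]

message("Mapping coverage: ", coverage)
message("One-to-one pairs kept: ", nrow(one_to_one))
```

Low coverage means many genes are lost during translation, which may weaken the downstream association signal, particularly for species that are more distant from human or less well annotated.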

Jylab-Genetics · 2024-11-28T15:39:47Z

Thank you for the exciting update! The new feature for translating gene IDs using a customized gene mapping table is incredibly useful for my analysis. However, when using homologous conversion to relate human GWAS with single-cell data from other species, such as dog, I have not observed any significant associations with cell types. This could be a true result, but it may also be influenced by other factors. I have not yet validated this approach with species other than dog. I will soon explore this method further in other species to see if the results hold.

VincentQLai · 2024-11-28T20:01:04Z

Thank you for sharing your results! Our seismic framework integrates GWAS data to prioritize cell types in normal tissues that may exhibit genetic vulnerability. A variety of factors can influence the results. Below is a troubleshooting checklist to help identify potential issues:

Quality Control: Ensure that low-quality cells have been removed, including cells with insufficient RNA counts, cells identified as doublets, and cells with excessive mitochondrial gene expression.
Normalization: Verify that the data have been properly normalized. We recommend using scran for size factor calculation.
GWAS Summary Statistics: Ensure the GWAS summary statistics file is processed correctly. In our study, we used a window size of 35kb upstream and 10kb downstream. While seismic shows consistency across different window sizes, this parameter may still influence outcomes. The population structure should ideally be simple (e.g., cohorts of European ancestry) to allow MAGMA to effectively account for linkage disequilibrium (LD). Additionally, verify that the SNPs are annotated to the correct genome assembly version. If they are based on a noncanonical version, update the auxiliary files for MAGMA or use genome liftover tools to adjust SNP locations.
Granularity of Analysis: Select the appropriate granularity for your analysis. Genetic diseases may only affect specific subpopulations within a cell type, making signals harder to detect at broader levels. If this is the case, consider dividing the broader cell type into more granular subpopulations to enhance detection.
GWAS Quality: Evaluate the quality of the GWAS data. GWAS can vary significantly depending on the donor recruitment, quality control, covariate adjustments, and methodology used in the GWAS. Recent studies often exhibit better signals, but some older or specialized GWAS datasets may also produce strong, clear results.
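
The quality control item above can be sketched in base R as follows. This is only a minimal illustration on a toy count matrix: the thresholds `min_counts` and `max_mito_frac` are hypothetical values chosen for the example, the `mt-` prefix assumes mouse-style mitochondrial gene names, and doublet detection and scran normalization are left out because they need dedicated packages.

```r
# Toy count matrix: 5 genes x 4 cells (each group of 5 values is one cell).
counts <- matrix(
  c( 2,  3, 50, 40, 30,   # cell1: 125 counts, 4% mitochondrial
     1,  1,  2,  1,  0,   # cell2: 5 counts (too few)
    40, 30, 10, 10, 10,   # cell3: 100 counts, 70% mitochondrial
     5,  5, 60, 70, 60),  # cell4: 200 counts, 5% mitochondrial
  nrow = 5,
  dimnames = list(
    c("mt-Nd1", "mt-Co1", "Ins1", "Gcg", "Sst"),
    paste0("cell", 1:4)
  )
)

# Hypothetical thresholds; suitable values depend on the data set.
min_counts <- 50
max_mito_frac <- 0.2

# Total counts per cell.
lib_size <- colSums(counts)

# Fraction of counts per cell that come from mitochondrial genes.
is_mito <- grepl("^mt-", rownames(counts), ignore.case = TRUE)
mito_frac <- colSums(counts[is_mito, , drop = FALSE]) / lib_size

# Keep cells that pass both filters.
keep <- lib_size >= min_counts & mito_frac <= max_mito_frac
filtered <- counts[, keep, drop = FALSE]

message("Cells kept: ", paste(colnames(filtered), collapse = ", "))
```

For non-model species, the mitochondrial gene naming should be checked first, since a prefix that does not match leaves the mitochondrial filter without any effect.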

Jylab-Genetics · 2024-11-29T16:55:59Z

Thank you very much for your guidance.
