batch correction on expression counts and embeddings #18

Open
zhangnan0107 opened this issue Aug 22, 2024 · 3 comments

Comments

@zhangnan0107

Thanks for sharing this dataset. For the data from cellxgene, is batch correction applied to both the low-dimensional embedding (e.g. UMAP) and the expression counts, or only to the embedding? Thanks : )

@grst
Member

grst commented Aug 22, 2024

Batch effect correction only applies to the low-dimensional embedding (adata.obsm["X_scANVI"]) and whatever is derived from it (e.g. neighborhood graph, UMAP).

For all downstream analyses, we accounted for batch effects independently by including covariates in the linear models used for comparison.
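To illustrate what "including covariates in the linear models" can look like, here is a minimal sketch using ordinary least squares in NumPy. The thread does not say which models or packages were actually used, so everything here is an assumption for illustration: the simulated data, the variable names, and the choice of OLS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 cells, two conditions, three batches (datasets).
n_cells = 200
condition = rng.integers(0, 2, size=n_cells)  # 0 = reference, 1 = comparison group
batch = rng.integers(0, 3, size=n_cells)      # batch label per cell

# Simulated log-expression of one gene: a true condition effect of 1.0,
# plus batch-specific shifts that would otherwise confound the comparison.
batch_shift = np.array([0.0, 2.0, -1.5])
expr = 1.0 * condition + batch_shift[batch] + rng.normal(0, 0.1, size=n_cells)

# Design matrix: intercept, condition, and one-hot batch indicators
# (the first batch is dropped and serves as the reference level).
batch_onehot = np.eye(3)[batch][:, 1:]
design = np.column_stack([np.ones(n_cells), condition, batch_onehot])

# Ordinary least squares fit; coef[1] is the batch-adjusted condition effect.
coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
condition_effect = coef[1]
print(f"batch-adjusted condition effect: {condition_effect:.2f}")
```

The counts are left untouched. The batch term in the design matrix absorbs the batch shifts, so the condition coefficient is estimated within batches.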

@zhangnan0107
Author

Thanks for your reply! I have two follow-up questions about the expression counts in the cellxgene data:

  1. There are three layers: X, which looks like normalized data, plus the layers count and counts_length_scaled. If I understand correctly, count holds the raw counts from the original studies, and counts_length_scaled holds length-scaled counts for the Smart-seq2 data only (so raw counts were kept for the other platforms?). Which normalization method was used for X?

  2. Regarding batch effects, I think batch can be added as a covariate in analyses such as differential expression. For the dotplot of marker genes used for cell-type annotation, as in Figure S1, did you also account for batch effects in some way, or is it based on uncorrected counts?

Thanks

@grst
Member

grst commented Aug 31, 2024

  1. Everything you say is correct. X is simply scanpy.pp.normalize_total followed by scanpy.pp.log1p on the length-scaled counts.

  2. The dotplots showing the cell-type markers were not adjusted for batch effects (we are also not trying to make any quantitative claims here).
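For readers who want to see what the normalization in point 1 computes, here is a small NumPy sketch of total-count normalization followed by log1p. It is not the authors' code. The function name `normalize_total_log1p` is hypothetical, and the `target_sum` used for the published data is not stated in this thread. The sketch assumes the scanpy default, which scales each cell to the median of the per-cell totals.

```python
import numpy as np

def normalize_total_log1p(counts, target_sum=None):
    """Scale each cell (row) to a common total, then apply log(1 + x).

    Hypothetical helper mimicking scanpy.pp.normalize_total followed by
    scanpy.pp.log1p on a dense cells x genes matrix.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    if target_sum is None:
        # Assumed default: median of per-cell totals (empty cells excluded).
        target_sum = np.median(totals[totals > 0])
    # Guard against division by zero for cells without any counts.
    safe_totals = np.where(totals > 0, totals, 1.0)
    normalized = counts / safe_totals[:, None] * target_sum
    return np.log1p(normalized)

# Toy matrix standing in for the counts_length_scaled layer (3 cells x 3 genes).
toy_counts = np.array([
    [1, 1, 2],
    [10, 10, 20],
    [2, 2, 4],
])
x_norm = normalize_total_log1p(toy_counts)

# With an AnnData object, the corresponding scanpy calls would be
# sc.pp.normalize_total(adata) followed by sc.pp.log1p(adata).
```

The three toy cells have identical gene proportions but different sequencing depth, so they end up with identical normalized profiles.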


2 participants