Purpose/implementation Section
Please link to the GitHub issue that this pull request addresses.
This PR is linked to the issue 856.
What is the goal of this pull request?
I aim here to improve the annotation workflow performed in `07_combined_annotation_across_samples_exploration.Rmd`. The suggested changes address some misclassifications that were spotted in the first version of the annotation while working on the integration of the data.
`cnv-threshold`. See the expression of PTPRC (= CD45) below.

We previously required immune and endothelial cells to have no inferred CNV in order to be labeled as normal. I think that the few CNV inferred for immune cells reflect cell-type-specific expression of genes that colocalize on one chromosome. I therefore decided to allow some flexibility regarding CNV for immune and endothelial cells and to base their annotation only on the `predicted.cell-type` and the associated `prediction.score`.

As with the immune cells, there is a high probability that we also infer false positive CNV in normal epithelial/stroma cells. I am not sure that it will be possible to distinguish normal cells with false positive CNV from cancer cells. For that reason, I decided to label cells with a `cnv-score <= cnv-threshold` as `normal` and cells with a `cnv-score > cnv-threshold + 2` as `cancer`. The choice of `+2` is somewhat arbitrary, but it is based on the approximate number of CNV that can be detected in `endothelial` and `immune` cells.

There is also the possibility of cancer cells without any CNV. It would be extremely difficult in that case to distinguish them from normal cells. To reduce the risk of misclassification, I decided to use the prediction score of the label transfer: we expect normal cells to resemble the cells from the fetal kidney reference and thus to have a high `prediction.score`.
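The labeling rule described above can be sketched as follows. This is a minimal Python illustration, not the code of the notebook (which is written in R). The function name `label_cell`, the `score_cutoff` value of 0.75, and the fallback to `unknown` for cells that match neither rule are assumptions made for the example; only the `cnv-threshold` and `+2` comparisons come from the description.

```python
def label_cell(cell_type, prediction_score, cnv_score, cnv_threshold,
               score_cutoff=0.75, cnv_margin=2):
    """Return 'normal', 'cancer' or 'unknown' for a single cell.

    score_cutoff is a hypothetical cutoff on prediction.score;
    cnv_margin is the "+2" margin discussed in the description.
    """
    confident = prediction_score >= score_cutoff

    # Immune and endothelial cells: rely only on the predicted cell type
    # and its prediction score, ignoring the inferred CNV.
    if cell_type in ("immune", "endothelial"):
        return "normal" if confident else "unknown"

    # Epithelial/stroma cells: clearly above the threshold means cancer.
    if cnv_score > cnv_threshold + cnv_margin:
        return "cancer"

    # At or below the threshold, and resembling the reference: normal.
    if cnv_score <= cnv_threshold and confident:
        return "normal"

    # Assumption: everything in between stays unlabeled.
    return "unknown"
```

With `cnv_threshold = 0`, a stroma cell with a `cnv_score` of 3 is labeled `cancer`, while a score of 2 falls in the gap between the two rules and stays `unknown` in this sketch.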
While working on the integration (see the UMAP reduction here, colored by `cnv_score`), I realized that samples -177, -180, -181, -190, -197 have strikingly low `cnv_score`. The `cnv_score` is simply the number of chromosomes presenting a CNV for each patient. Of note, these 5 samples are the ones with no immune and/or endothelial cells, for which we couldn't run `infercnv` properly. We ran `infercnv` without any reference. In that case, the mean over all cellular expression profiles is taken as the reference. If the sample is mostly composed of cancer cells, the cancer-associated CNV are taken as the normal reference.

I am afraid I do not have a solution for this, but I am quite sure that the cells of these 5 patients are misclassified. This will affect the downstream differential expression analysis of cancer versus normal cells and the potential discovery of Wilms tumor histology-specific markers. This is why I would strongly recommend forcing the annotations for these patients to `unknown`.
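A toy calculation may help show why a mean-based reference hides CNV in a cancer-rich sample. This is only an arithmetic sketch in Python with made-up expression levels, not the `infercnv` algorithm; the names `deviation_from_baseline`, `NORMAL_LEVEL` and `GAIN_LEVEL` are hypothetical.

```python
from statistics import mean

# Toy relative expression for one chromosome (made-up values):
# normal cells sit at 1.0, cancer cells carrying a gain sit at 1.5.
NORMAL_LEVEL = 1.0
GAIN_LEVEL = 1.5

def deviation_from_baseline(cells, baseline):
    """Signal left in each cell once the baseline is subtracted."""
    return [round(x - baseline, 3) for x in cells]

# A sample made of 9 cancer cells and 1 normal cell.
cancer_rich_sample = [GAIN_LEVEL] * 9 + [NORMAL_LEVEL]

# With a true normal reference, the gain is clearly visible.
with_normal_reference = deviation_from_baseline(cancer_rich_sample, NORMAL_LEVEL)

# Without a reference, the sample mean is used as the baseline,
# so the gain is mostly absorbed into the baseline itself.
sample_mean = mean(cancer_rich_sample)
without_reference = deviation_from_baseline(cancer_rich_sample, sample_mean)
```

In this toy sample the cancer cells deviate by 0.5 from a normal reference but only by 0.05 from the sample mean, while the single normal cell now looks like it carries a loss.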
`do_Feature_mean`. Initially, the `feature` plotted with `do_Feature_mean` was `prop_cnv_chr[i]`, the proportion of chromosome i affected by a CNV. The function `do_Feature_mean` then took the mean over all cells in a group. This can be misleading, as it is then impossible to interpret the mean value. For example, a mean of 0.5 could be read in more than one way.

For that reason, I changed the `feature` to `has_cnv_chr[i]`, a binary indicator of the presence/absence of CNV in chromosome i for a given cell. Taking the mean over all cells in a group then gives the proportion of cells within the group that have any kind of CNV in this chromosome.

If known, do you anticipate filing additional pull requests to complete this analysis module?
- [ ] one for the integration of the samples
- [ ] one for differential expression analysis, looking for Wilms tumor histology-specific markers
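To illustrate the `do_Feature_mean` change described above, here is a small Python sketch with made-up values. It assumes that `has_cnv_chr[i]` is 1 whenever `prop_cnv_chr[i]` is greater than 0; the helper names `has_cnv` and `summarize` are hypothetical and do not exist in the module.

```python
from statistics import mean

# Proportion of chromosome i affected by a CNV, one value per cell (toy data).
group_a = [0.5, 0.5, 0.5, 0.5]  # every cell has a CNV covering half the chromosome
group_b = [1.0, 1.0, 0.0, 0.0]  # half of the cells have a CNV covering all of it

def has_cnv(prop):
    """Binary presence/absence of a CNV (assumed: any proportion above 0)."""
    return 1 if prop > 0 else 0

def summarize(props):
    """Group means of the proportion feature and of the binary feature."""
    return {
        "mean_prop_cnv": mean(props),
        "mean_has_cnv": mean([has_cnv(p) for p in props]),
    }
```

Both groups have a mean proportion of 0.5, so the two situations cannot be told apart from that value. The mean of the binary feature separates them: all cells in `group_a` carry a CNV, against half of the cells in `group_b`.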
Results
The annotation file is generated automatically while running the script and will be saved in the `results` folder.
The notebook is saved in the `notebook` folder.
Provide directions for reviewers
What are the software and computational requirements needed to be able to run the code in this PR?
Are there particularly areas you'd like reviewers to have a close look at?
Is there anything that you want to discuss further?
Author checklists
Analysis module and review
`README.md` has been updated to reflect code changes in this pull request.
Reproducibility checklist
`Dockerfile`.
`environment.yml` file.
`renv.lock` file.