Split out notebook for individual cell predictions visualization #88
Comments
I'm posting some preliminary results from the 10X dataset so I can write down some thoughts about next steps. The code is available on this branch:

Here's a plot looking at the percentage of individual cells labeled for each cell type:

The Smart-seq2 results (left-hand side) show that, for the most part, cells are labeled "correctly" for that cell type. On the right-hand side, the 10X dataset shows that many cells, regardless of the true subgroup, are labeled as G3. This is particularly true for the RF model.

It is worth looking into why this might be. As with many things, there might be a technical or a biological explanation, and the things we might look into fall under those two headings.

The first thing I'd want to explore is the cell type identity of cells labeled G3 vs. others in the 10X dataset. We can get cell type labels from UCSC Cell Browser (and that's added in the branch mentioned above ☝🏻). From a cursory look, most cells are labeled malignant without a "finer" cell label. However, the cell metadata from UCSC also includes cluster labels. We can check whether this information can be aligned with what is included in the publication to learn more about cell state.

Tagging @envest for visibility.
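For anyone following along, here is a minimal sketch of how the percentages in a plot like this could be tabulated. It is not the code on the branch: the function name `label_percentages`, the `(true_subgroup, predicted_subgroup)` input layout, and the toy data are all hypothetical and only illustrate the calculation.

```python
from collections import Counter, defaultdict


def label_percentages(cells):
    """Tabulate predicted labels within each true subgroup.

    `cells` is an iterable of (true_subgroup, predicted_subgroup) pairs,
    one pair per individual cell (a hypothetical layout for illustration).
    Returns {true_subgroup: {predicted_subgroup: percent_of_cells}}.
    """
    counts = defaultdict(Counter)
    for true_label, predicted_label in cells:
        counts[true_label][predicted_label] += 1

    percentages = {}
    for true_label, predicted in counts.items():
        total = sum(predicted.values())
        percentages[true_label] = {
            label: 100.0 * n / total for label, n in predicted.items()
        }
    return percentages


# Made-up toy data, not results from either dataset.
toy_cells = [
    ("WNT", "WNT"),
    ("WNT", "G3"),
    ("WNT", "G3"),
    ("WNT", "G3"),
    ("G4", "G4"),
    ("G4", "G3"),
]
toy_percentages = label_percentages(toy_cells)
```

Percentages are computed within each true subgroup, so a subgroup with few cells is not swamped by one with many.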
I looked at the difference between the highest subgroup score and the second-highest subgroup score within an individual cell. (This is an idea I am toying with as an alternative to "confidence" [max/total score].) I was interested in whether the G3 labels in the 10X data were "closer calls" (i.e., the differences were smaller) in samples from other subgroups. That doesn't necessarily seem to be true in the one WNT 10X sample or the G4 10X samples for the RF model.
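To make the two per-cell measures concrete, here is a rough sketch under the assumption that each cell has one non-negative score per subgroup. The function names and the toy scores are hypothetical, not taken from the branch.

```python
def top_two_margin(scores):
    """Difference between the highest and second-highest subgroup score."""
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1]


def confidence(scores):
    """The "confidence" measure mentioned above: max score / total score."""
    return max(scores.values()) / sum(scores.values())


# Made-up scores for a single cell, for illustration only.
toy_scores = {"G3": 0.5, "G4": 0.25, "WNT": 0.25}

toy_margin = top_two_margin(toy_scores)
toy_confidence = confidence(toy_scores)
```

A small margin flags a "close call" between the top two subgroups even when the max/total value looks reasonable, which is the distinction being explored here.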
We are now predicting labels for individual cells on two datasets: Smart-seq2 and 10X. We should split the individual cell portions of `analysis_notebooks/pseudobulk_and_single_cells.Rmd` into a new notebook that visualizes the results.