2024 December updates #830

Open
sjspielman opened this issue Dec 9, 2024 · 11 comments

Comments

@sjspielman
Member

This issue tracks items we notice need fixing from the December 2024 advanced scRNA-seq workshop.

@sjspielman
Member Author

In the integration notebook, we should probably label the y-axis "proportion" rather than leaving the default "count":

# Use ggplot2 to make a barplot the cell types across samples
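A rough sketch of what the relabeling could look like, assuming the barplot stacks bars with `position = "fill"` (which is why the bars show proportions while the default label still reads "count"). The data frame and its values below are hypothetical toy stand-ins, not the notebook's data:

```r
library(ggplot2)

# Toy stand-in for the per-cell metadata (hypothetical values)
celltype_df <- data.frame(
  sample = c("A", "A", "A", "B", "B", "B", "B"),
  celltype_broad = c("Tumor", "Tumor", "Immune", "Tumor", "Immune", "Immune", "Immune")
)

celltype_barplot <- ggplot(celltype_df, aes(x = sample, fill = celltype_broad)) +
  # position = "fill" scales every bar to a total height of 1
  geom_bar(position = "fill") +
  # Replace the default "count" label with one that matches what is shown
  labs(y = "Proportion")
```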

@sjspielman
Member Author

sjspielman commented Dec 10, 2024

I didn't catch the full sentence Josh said during intro slides, but I caught the bit that we need to add an arrow somewhere in the "single sample roadmap" diagram (I believe related to marker gene or gene-set analysis).

@jaclyn-taroni
Member

This comment should say that it is the path to the Cell Ranger matrix directory:

# Path to the Cell Ranger matrix file

@jaclyn-taroni
Member

The "Will it integrate?" slide in the integration slides should say "healthy and tumor" instead of "healthy and normal"

@jaclyn-taroni
Member

The first plotReducedDim() call in the integration notebook (line 463) has pretty complicated syntax -- I am not sure it's completely necessary that it is live.

@sjspielman
Member Author

sjspielman commented Dec 11, 2024

In integration, I wonder if we want to change up a little of the opening chunks where we set file names and read them in. Some of these thoughts are based on (unrelated to training) code reviews that @jashapiro had left elsewhere, if he wants to weigh in on this potential change too:

  • Rather than dir(), we might use list.files() to list what's in the data directory
    ```{r input dir, live = TRUE}
    dir(input_dir)
    ```
  • Currently we use file.path() to form all the input file paths, but I wonder if it might simplify code to instead just go ahead and use list.files(full.names = TRUE) in the first place (or show list.files() first without and then with this argument)
    ```{r define sce_paths, live = TRUE}
    # Now, convert these to file paths: <input_dir>/<sample_name>.rds
    sce_paths <- file.path(input_dir,
                           glue::glue("{sample_names}.rds")
    )
    ```
  • We add list names to sce_list only after we read in the files. We might want to add those names beforehand
    ```{r add list names, live = TRUE}
    # Assign the sample names as the names for sce_list
    names(sce_list) <- sample_names
    ```
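To make the three suggestions above concrete, here is a minimal sketch using base R only. The directory, sample names, and toy `.rds` contents are all hypothetical, and `lapply()` stands in for whatever reading step the notebook actually uses (`purrr::map()` keeps names in the same way):

```r
# Set up a temporary directory with toy .rds files (hypothetical sample names)
input_dir <- file.path(tempdir(), "toy_input")
dir.create(input_dir, showWarnings = FALSE)
for (sample_name in c("sampleA", "sampleB")) {
  saveRDS(list(id = sample_name),
          file.path(input_dir, paste0(sample_name, ".rds")))
}

# First show just the file names...
list.files(input_dir, pattern = "\\.rds$")

# ...then get the full paths directly, with no file.path() step
sce_paths <- list.files(input_dir, pattern = "\\.rds$", full.names = TRUE)

# Name the paths by sample _before_ reading anything in
names(sce_paths) <- tools::file_path_sans_ext(basename(sce_paths))

# The names carry through to the resulting list
sce_list <- lapply(sce_paths, readRDS)
names(sce_list)
```

One trade-off to weigh: with this approach the sample names are derived from the files that happen to be in the directory, rather than from a `sample_names` vector we define ourselves.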

@sjspielman
Member Author

sjspielman commented Dec 11, 2024

Suggested integration changes:

  • We should beef up some of the explanation for how the cell types in Patel et al were obtained in the first place, since this helps contextualize how we use them to assess integration results
    If you look closely at the printed SCE objects, you may notice that they all contain `colData` table columns `celltype_fine` and `celltype_broad`.
    These columns (which we added to SCE objects during [pre-processing](https://github.com/AlexsLemonade/training-modules/tree/master/scRNA-seq-advanced/setup/rms)) contain putative _cell type annotations_ as assigned in [Patel _et al._ (2022)](https://doi.org/10.1016/j.devcel.2022.04.003).
    We will end up leveraging these cell type annotations to explore how successful our integration is; after integration, we expect cell types from different samples to group together, rather than being separated by batches.
  • This code would make more sense with length(), not head()
    ```{r shared genes}
    # Define vector of shared genes
    shared_genes <- sce_list |>
      # get rownames (genes) for each SCE in sce_list
      purrr::map(rownames) |>
      # reduce to the _intersection_ among lists
      purrr::reduce(intersect)
    ```
    ```{r print shared genes, live = TRUE}
    # Use head to look at the vector of shared genes:
    head(shared_genes)
    ```
  • We should probably use sce or similar, not x, in these spots to emphasize that it's nice that you can have informative "loop variables" with this new(-ish) syntax
    ```{r compare rowdata, live = TRUE}
    # Use `purrr::map()` to quickly extract rowData column names for all SCEs
    purrr::map(sce_list,
               \(x) colnames(rowData(x)))
    ```

    ```{r compare coldata}
    purrr::map(sce_list,
               \(x) colnames(colData(x)))
    ```
  • This code would make more sense with table(), not unique()
    # What are the unique values in the `sample` column?
    unique( colData(merged_sce)$sample )
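A small sketch of why `length()`, `table()`, and a descriptive lambda argument might read better in these spots. Everything here is toy data with hypothetical gene and sample names, and base R's `Reduce()` and `lapply()` stand in for `purrr::reduce()` and `purrr::map()` so the example has no dependencies:

```r
# Toy stand-ins for the rownames of each SCE (hypothetical gene IDs)
gene_list <- list(
  sampleA = c("gene1", "gene2", "gene3", "gene4"),
  sampleB = c("gene2", "gene3", "gene4", "gene5"),
  sampleC = c("gene3", "gene4", "gene6")
)

# Base R equivalent of purrr::reduce(intersect)
shared_genes <- Reduce(intersect, gene_list)

# head() only shows the first few IDs; length() says how many genes are shared
length(shared_genes)

# A descriptive argument name (here `genes`, not `x`) documents what is looped over
gene_counts <- lapply(gene_list, \(genes) length(genes))

# Toy stand-in for colData(merged_sce)$sample
sample_column <- c("sampleA", "sampleA", "sampleB",
                   "sampleC", "sampleC", "sampleC")

# unique() lists the samples; table() also shows how many cells each one has
unique(sample_column)
sample_counts <- table(sample_column)
sample_counts
```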

@jaclyn-taroni
Member

jaclyn-taroni commented Dec 11, 2024

The DE dimension reduction plots by cell type:

```{r celltype UMAP}
# UMAP of all samples labeled by cell type
scater::plotReducedDim(integrated_sce,
                       dimred = "fastmnn_UMAP",
                       # color each point by cell type
                       color_by = "celltype_broad",
                       point_size = 0.5,
                       point_alpha = 0.4)
```

Could probably use a tweak to the legend along the lines of what we have in the integration notebook:

guides(color = guide_legend(override.aes = list(size = 3, alpha = 1))) + # Modify the legend key with larger, easier to see points

To make the cell type colors easier to see when projected, etc.
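For illustration, here is how that legend tweak behaves on a plain ggplot2 scatter plot. This is a sketch with simulated coordinates and hypothetical cell type labels, not the notebook's `integrated_sce`; the seed value is arbitrary:

```r
library(ggplot2)

# Simulated stand-in for UMAP coordinates (hypothetical data)
set.seed(2024)
umap_df <- data.frame(
  UMAP1 = rnorm(200),
  UMAP2 = rnorm(200),
  celltype_broad = sample(c("Tumor", "Immune", "Stroma"), 200, replace = TRUE)
)

umap_plot <- ggplot(umap_df, aes(x = UMAP1, y = UMAP2, color = celltype_broad)) +
  # Small, transparent points, as in the plot above
  geom_point(size = 0.5, alpha = 0.4) +
  # Draw the legend keys larger and fully opaque so the colors are readable
  guides(color = guide_legend(override.aes = list(size = 3, alpha = 1)))
```

The points in the panel keep their small size and transparency; only the legend keys change.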

@sjspielman
Member Author

This is not in fact the "last thing" we do in this notebook, but rather the "next thing":

The last thing that we will do is take a look at how many genes are significant.
Here we will want to use the adjusted p-value, found in the `padj` column of the results, as this accounts for multiple test correction.
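For reference, the counting step that text describes could look roughly like the sketch below. The results table is a toy stand-in with hypothetical values, and the 0.05 cutoff is an assumption rather than something stated in the notebook text quoted here:

```r
# Toy stand-in for a differential expression results table (hypothetical values)
de_results <- data.frame(
  gene_id = c("gene1", "gene2", "gene3", "gene4", "gene5"),
  padj = c(0.001, 0.2, NA, 0.04, 0.5)
)

# Assumed significance threshold
alpha <- 0.05

# Count genes with adjusted p-value below the threshold;
# na.rm = TRUE skips genes whose padj is NA
n_significant <- sum(de_results$padj < alpha, na.rm = TRUE)
n_significant
```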

@jashapiro
Member

Add set.seed() to setup in pathway analysis notebook
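As a quick illustration of what setting the seed buys us for any step that involves random sampling (the seed value here is arbitrary, not a proposal for the notebook):

```r
# Setting the same seed before a random draw makes the draw reproducible
set.seed(2024)
first_draw <- sample(1:100, 5)

set.seed(2024)
second_draw <- sample(1:100, 5)

# TRUE: both draws are identical because the seed was reset
identical(first_draw, second_draw)
```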

@jashapiro
Member

We no longer use marker genes as the GSEA input here; we use differential expression analysis results instead, so this comment needs updating:

# We'll use the marker genes as GSEA input
