Skip to content

Commit

Permalink
Merge branch 'master' of github.com:NBISweden/workshop-scRNAseq
Browse files Browse the repository at this point in the history
  • Loading branch information
royfrancis committed Jan 29, 2024
2 parents 084e172 + b9d87e0 commit 0d58017
Show file tree
Hide file tree
Showing 59 changed files with 1,024 additions and 2,570 deletions.
597 changes: 0 additions & 597 deletions compiled/labs/scanpy/scanpy_07_trajectory.ipynb

This file was deleted.

1,257 changes: 0 additions & 1,257 deletions docs/labs/scanpy/scanpy_07_trajectory.html

This file was deleted.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
982 changes: 428 additions & 554 deletions docs/labs/scanpy/scanpy_08_spatial.html

Large diffs are not rendered by default.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Diff not rendered.
Diff not rendered.
31 changes: 23 additions & 8 deletions docs/labs/seurat/seurat_07_trajectory.html

Large diffs are not rendered by default.

60 changes: 31 additions & 29 deletions labs/scanpy/scanpy_07_trajectory.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ engine: jupyter
Code chunks run Python commands unless it starts with `%%bash`, in which case, those chunks run shell commands.
:::

Partly following [this tutorial](https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html).
Partly following this PAGA [tutorial](https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html) with some modifications.

## Loading libraries

Expand All @@ -33,30 +33,33 @@ sc.settings.verbosity = 3
sc.settings.set_figure_params(dpi=100, frameon=False, figsize=(5, 5), facecolor='white', color_map = 'viridis_r')
```

## Loading data
## Preparing data

In order to speed up the computations during the exercises, we will be using a subset of a bone marrow dataset (originally containing about
100K cells). The bone marrow is the source of adult immune cells, and contains virtually all differentiation stages of cell from the immune
system which later circulate in the blood to all other organs.
In order to speed up the computations during the exercises, we will be using a subset of a bone marrow dataset (originally containing about 100K cells). The bone marrow is the source of adult immune cells, and contains virtually all differentiation stages of cell from the immune system which later circulate in the blood to all other organs.

![](../figs/hematopoiesis.png)

All the data has been preprocessed with Seurat. The file trajectory_scanpy_filtered.h5ad was converted from the Seurat object
using the SeuratDisk package. For more information on how it was done, have a look at the script: convert_to_h5ad.R in the github repo.
If you have been using the **Seurat**, **Bioconductor** or **Scanpy** toolkits with your own data, you need to reach to the point where can find get:

```{python}
import os
- A dimensionality reduction where to perform the trajectory (for example: PCA, ICA, MNN, harmony, Diffusion Maps, UMAP)
- The cell clustering information (for example: from Louvain, k-means)
- A KNN/SNN graph (this is useful to inspect and sanity-check your trajectories)

path_data = "https://export.uppmax.uu.se/naiss2023-23-3/workshops/workshop-scrnaseq"

path_trajectory = "./data/trajectory"
if not os.path.exists(path_trajectory):
os.makedirs(path_trajectory, exist_ok=True)
```

In this case, all the data has been preprocessed with Seurat with standard pipelines. In addition there was some manual filtering done to remove clusters that are disconnected and cells that are hard to cluster, which can be seen in this [script](https://github.com/NBISweden/workshop-scRNAseq/blob/master/scripts/data_processing/slingshot_preprocessing.Rmd)


The file trajectory_scanpy_filtered.h5ad was converted from the Seurat object using the SeuratDisk package. For more information on how it was done, have a look at the script: [convert_to_h5ad.R](https://github.com/NBISweden/workshop-scRNAseq/blob/master/scripts/data_processing/convert_to_h5ad.R) in the github repo.

You can download the data with the commands:

```{python}
import os
import urllib.request
path_data = "https://export.uppmax.uu.se/naiss2023-23-3/workshops/workshop-scrnaseq"
path_results = "data/trajectory"
if not os.path.exists(path_results):
os.makedirs(path_results, exist_ok=True)
Expand All @@ -68,7 +71,9 @@ if not os.path.exists(path_file):
urllib.request.urlretrieve(file_url, path_file)
```

Check that the variable names are correct.
## Reading data

We already have pre-computed and subsetted the dataset (with 6688 cells and 3585 genes) following the analysis steps in this course. We then saved the objects, so you can use common tools to open and start to work with them (either in R or Python).

```{python}
adata = sc.read_h5ad("data/trajectory/trajectory_seurat_filtered.h5ad")
Expand All @@ -88,9 +93,7 @@ There is a umap and clusters provided with the object, first plot some informati
sc.pl.umap(adata, color = ['clusters','dataset','batches','Phase'],legend_loc = 'on data', legend_fontsize = 'xx-small', ncols = 2)
```

It is crucial that you performing analysis of a dataset understands what is going on, what are the clusters you see in your data and most
importantly How are the clusters related to each other?. Well, let’s explore the data a bit. With the help of this table, write down which
cluster numbers in your dataset express these key markers.
It is crucial that you performing analysis of a dataset understands what is going on, what are the clusters you see in your data and most importantly How are the clusters related to each other?. Well, let’s explore the data a bit. With the help of this table, write down which cluster numbers in your dataset express these key markers.

|Marker |Cell Type|
|--------|----------------------------|
Expand Down Expand Up @@ -185,7 +188,7 @@ sc.pl.paga(adata, color='annot', edge_width_scale = 0.3)

As you can see, we have edges between many clusters that we know are are unrelated, so we may need to clean up the data a bit more.

## Data pre-processing prior trajectory inference
## Filtering graph edges

First, lets explore the graph a bit. So we plot the umap with the graph connections on top.

Expand All @@ -200,7 +203,7 @@ sc.pp.neighbors(adata, n_neighbors=5, use_rep = 'X_harmony_Phase', n_pcs = 30)
sc.pl.umap(adata, edges=True, color = 'annot', legend_loc= 'on data', legend_fontsize= 'xx-small')
```

## Rerun PAGA again on the data
### Rerun PAGA again on the data

```{python}
sc.tl.draw_graph(adata, init_pos='X_umap')
Expand All @@ -212,9 +215,9 @@ sc.tl.paga(adata, groups='annot')
sc.pl.paga(adata, color='annot', edge_width_scale = 0.3)
```

## Recomputing the embedding using PAGA-initialization
## Embedding using PAGA-initialization

The following is just as well possible for a UMAP.
We can now redraw the graph using another starting position from the paga layout. The following is just as well possible for a UMAP.

```{python}
sc.tl.draw_graph(adata, init_pos='paga')
Expand All @@ -234,7 +237,9 @@ sc.pl.paga_compare(
legend_fontsize=12, fontsize=12, frameon=False, edges=True)
```

## Reconstructing gene changes along PAGA paths for a given set of genes
## Gene changes

We can reconstruct gene changes along PAGA paths for a given set of genes

Choose a root cell for diffusion pseudotime. We have 3 progenitor clusters, but cluster 5 seems the most clear.

Expand All @@ -250,8 +255,7 @@ Use the full raw data for visualization.
sc.pl.draw_graph(adata, color=['annot', 'dpt_pseudotime'], legend_loc='on data', legend_fontsize= 'x-small')
```

By looking at the different know lineages and the layout of the graph we define manually some paths to the graph that corresponds to spcific
lineages.
By looking at the different know lineages and the layout of the graph we define manually some paths to the graph that corresponds to specific lineages.

```{python}
# Define paths
Expand Down Expand Up @@ -303,11 +307,9 @@ pl.show()
```

:::{.callout-note title="Discuss"}
As you can see, we can manipulate the trajectory quite a bit by selecting different number of neighbors, components etc. to fit with our
assumptions on the development of these celltypes.
As you can see, we can manipulate the trajectory quite a bit by selecting different number of neighbors, components etc. to fit with our assumptions on the development of these celltypes.

Please explore further how you can tweak the trajectory. For instance, can you create a PAGA trajectory using the orignial umap from Seurat
instead? Hint, you first need to compute the neighbors on the umap.
Please explore further how you can tweak the trajectory. For instance, can you create a PAGA trajectory using the orignial umap from Seurat instead? Hint, you first need to compute the neighbors on the umap.
:::

## Session info
Expand Down
67 changes: 29 additions & 38 deletions labs/scanpy/scanpy_08_spatial.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,6 @@ import warnings
warnings.simplefilter(action="ignore", category=Warning)
#sc.logging.print_versions() # gives errror!!
sc.set_figure_params(facecolor="white", figsize=(8, 8))
sc.settings.verbosity = 3
```
Expand Down Expand Up @@ -309,7 +308,7 @@ for i, library in enumerate(
for k, v in clusters_colors.items()
if k in ad.obs.clusters.unique().tolist()
],
legend_loc="on data",
legend_loc=None,
show=False,
ax=axs[i],
)
Expand Down Expand Up @@ -369,6 +368,10 @@ with open('data/spatial/visium/scanpy_spatialde.pkl', 'wb') as file:
```

```{python}
# | results: hide
# | eval: false
# skip for now
import urllib.request
import os
Expand All @@ -382,12 +385,19 @@ if not os.path.exists(path_file):
```

```{python}
# | results: hide
# | eval: false
# skip for now
import pickle
with open('data/spatial/visium/scanpy_spatialde.pkl', 'rb') as file:
results = pickle.load(file)
```

```{python}
# | results: hide
# | eval: false
# skip for now.
# We concatenate the results with the DataFrame of annotations of variables: `adata.var`.
results.index = results["g"]
Expand Down Expand Up @@ -423,10 +433,21 @@ adata_cortex = sc.read_h5ad("data/spatial/visium/allen_cortex.h5ad")
adata_cortex
```

Here is the metadata for the cell annotation:

```{python}
adata_cortex.obs
```

There is an issue with the raw matrix in this object that the gene names are not in the index, so we will put them back in.

```{python}
adata_cortex.raw.var.index = adata_cortex.raw.var._index
adata_cortex.raw.var
```

Then we run the regular pipline with normalization and dimensionality reduction.

```{python}
sc.pp.normalize_total(adata_cortex, target_sum=1e5)
sc.pp.log1p(adata_cortex)
Expand Down Expand Up @@ -485,7 +506,7 @@ adata_anterior_subset = counts_adata[
].copy()
# select also the cortex clusters
adata_anterior_subset = adata_anterior_subset[adata_anterior_subset.obs.clusters.isin(['3','4','6','7']),:]
adata_anterior_subset = adata_anterior_subset[adata_anterior_subset.obs.clusters.isin(['3','5','6']),:]
# plot to check that we have the correct regions
Expand All @@ -509,7 +530,7 @@ Here, we will use deconvolution with Stereoscope implemented in the SCVI-tools p
{{< meta st_deconv_genes_1 >}}

```{python}
sc.tl.rank_genes_groups(adata_cortex, 'subclass', method = "t-test", n_genes=100)
sc.tl.rank_genes_groups(adata_cortex, 'subclass', method = "t-test", n_genes=100, use_raw=False)
sc.pl.rank_genes_groups_dotplot(adata_cortex, n_genes=3)
```

Expand All @@ -529,37 +550,23 @@ deg = np.intersect1d(deg,adata_anterior_subset.var.index).tolist()
print(len(deg))
```

Train the model
### Train the model

First, train the model using scRNAseq data.

Stereoscope requires the data to be in counts, earlier in this tutorial we saved the spatial counts in a separate object counts_adata.

However, the single cell dataset that we dowloaded only has the lognormalized data in the adata.X slot, hence we will have to recalculate the count matrix.
In the single cell data we have the raw counts in the `raw.X` matrix so that one will be used. So here we create a new object with all the correct slots for scVI.

```{python}
# first do exponent and subtract pseudocount
E = np.exp(adata_cortex.X)-1
n = np.sum(E,1)
print(np.min(n), np.max(n))
# all sums to 1.7M
factor = np.mean(n)
nC = np.array(adata_cortex.obs.nCount_RNA) # true number of counts
scaleF = nC/factor
C = E * scaleF[:,None]
C = C.astype("int")
```

```{python}
sc_adata = adata_cortex.copy()
sc_adata.X = C
sc_adata.X = adata_cortex.raw.X.copy()
```

Setup the anndata, the implementation requires the counts matrix to be in the "counts" layer as a copy.

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
import scvi
# from scvi.data import register_tensor_from_anndata
Expand All @@ -576,8 +583,6 @@ RNAStereoscope.setup_anndata(sc_adata, layer="counts", labels_key="subclass")
```

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
# the model is saved to a file, so if is slow to run, you can simply reload it from disk by setting train = False
Expand All @@ -592,13 +597,11 @@ else:
print("Loaded RNA model from file!")
```

Predict propritions on the spatial data
### Predict proportions on the spatial data

First create a new st object with the correct genes and counts as a layer.

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
st_adata = adata_anterior_subset.copy()
Expand All @@ -609,8 +612,6 @@ SpatialStereoscope.setup_anndata(st_adata, layer="counts")
```

```{python}
#| eval: false
# this chunk has issues and therefore not evaluated
train=True
if train:
Expand All @@ -626,8 +627,6 @@ else:
Get the results from the model, also put them in the .obs slot.

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
st_adata.obsm["deconvolution"] = spatial_model.get_proportions()
Expand All @@ -639,8 +638,6 @@ for ct in st_adata.obsm["deconvolution"].columns:
We are then able to explore how cell types in the scRNA-seq dataset are predicted onto the visium dataset. Let's first visualize the neurons cortical layers.

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
sc.pl.spatial(
st_adata,
Expand All @@ -655,8 +652,6 @@ sc.pl.spatial(
We can go ahead an visualize astrocytes and oligodendrocytes as well.

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
sc.pl.spatial(
st_adata, img_key="hires", color=["Oligo", "Astro"], size=1.5, library_id=lib_a
Expand All @@ -666,8 +661,6 @@ sc.pl.spatial(
{{< meta st_2 >}}

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
sc.pl.violin(st_adata, ["L2/3 IT", "L6 CT","Oligo","Astro"],
jitter=0.4, groupby = 'clusters', rotation= 45)
Expand All @@ -677,8 +670,6 @@ sc.pl.violin(st_adata, ["L2/3 IT", "L6 CT","Oligo","Astro"],
{{< meta st_3 >}}

```{python}
# | eval: false
# this chunk has issues and therefore not evaluated
lib_p = "V1_Mouse_Brain_Sagittal_Posterior"
Expand Down
Loading

0 comments on commit 0d58017

Please sign in to comment.