Merge pull request #16 from gustaveroussy/dev

Dev
gustaveroussy · Jan 25, 2024 · 67c877c
2 parents 3577f9d + a348bc6
commit 67c877c

Showing 14 changed files with 171 additions and 73 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,9 @@
+## [1.0.x] - tbd
+
+### Added
+- The `phenocycler` reader can now also read `.tif` files (not just `.qptiff`)
+- Added missing legend in the HTML report under the "Channels" section (#15)
+
 ## [1.0.2] - 2024-01-15
 
 ### Fix

diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@
 [![License](https://img.shields.io/pypi/l/sopa.svg)](https://github.com/gustaveroussy/sopa/blob/master/LICENSE)
 [![Imports: isort](https://img.shields.io/badge/imports-isort-blueviolet)](https://pycqa.github.io/isort/)
 
-Built on top of [SpatialData](https://github.com/scverse/spatialdata), Sopa enables processing and analyses of image-based spatial-omics using a standard data structure and output. We currently support the following technologies: Xenium, MERSCOPE, CosMX, PhenoCycler, MACSIMA, Hyperion. Sopa was designed for generability and low memory consumption on large images (scales to `1TB+` images).
+Built on top of [SpatialData](https://github.com/scverse/spatialdata), Sopa enables processing and analyses of image-based spatial-omics using a standard data structure and output. We currently support the following technologies: Xenium, MERSCOPE, CosMX, PhenoCycler, MACSima, Hyperion. Sopa was designed for generalizability and low memory consumption on large images (scales to `1TB+` images).
 
 The pipeline outputs contain: (i) Xenium Explorer files for interactive visualization, (ii) an HTML report for quick quality controls, and (iii) a SpatialData `.zarr` directory for further analyses.
 

diff --git a/docs/api/io.md b/docs/api/io.md
@@ -24,10 +24,6 @@
     options:
       show_root_heading: true
 
-::: sopa.io.qptiff
-    options:
-      show_root_heading: true
-
 ::: sopa.io.hyperion
     options:
       show_root_heading: true

diff --git a/docs/cli.md b/docs/cli.md
@@ -413,7 +413,7 @@ $ sopa read [OPTIONS] DATA_PATH
 
 **Options**:
 
-* `--technology TEXT`: Name of the technology used to collected the data (`xenium`/`merfish`/`cosmx`/`phenocycler`/`macsima`/`qptiff`/`hyperion`)
+* `--technology TEXT`: Name of the technology used to collect the data (`xenium`/`merscope`/`cosmx`/`phenocycler`/`macsima`/`hyperion`)
 * `--sdata-path TEXT`: Optional path to write the SpatialData object. If not provided, will write to the `{data_path}.zarr` directory
 * `--config-path TEXT`: Path to the snakemake config. This can be useful in order not to provide the `--technology` and the `--kwargs` arguments
 * `--kwargs TEXT`: Dictionary provided to the reader function as kwargs  [default: {}]

diff --git a/docs/faq.md b/docs/faq.md
@@ -6,30 +6,31 @@ You need the raw inputs of your machine, that is:
 
 - Optionally, a file of transcript location, usually a `.csv` or `.parquet` file
 
-Our tutorials use `data_path` to denote the path to your raw data. Select the correct tab below to understand what is the right path to your raw data:
+In this documentation, `data_path` denotes the path to your raw data. Select the tab for your technology below to see what `data_path` should point to:
 
 === "Xenium"
     `data_path` is the directory containing the following files: `morphology.ome.tif` and `transcripts.parquet`
 === "MERSCOPE"
-    `data_path` is the "region" directory containing a `detected_transcripts.csv` file and an `image` directory
+    `data_path` is the "region" directory containing a `detected_transcripts.csv` file and an `image` directory. For instance, the directory can be called `region_0`.
 === "CosMX"
-    (More details coming soon)
+    (The CosMX data requires stitching the FOVs. Support for this will be added soon, see [this issue](https://github.com/gustaveroussy/sopa/issues/5))
 === "MACSima"
     `data_path` is the directory containing multiple `.ome.tif` files (one file per channel)
 === "PhenoCycler"
-    `data_path` corresponds to the path to one `.qptiff` file
+    `data_path` corresponds to the path to one `.qptiff` file, or one `.tif` file (if exported from QuPath)
 === "Hyperion"
     `data_path` is the directory containing multiple `.ome.tiff` files (one file per channel)
 
 ## Cellpose is not segmenting enough cells; what should I do?
 
+- The main Cellpose parameter to check is `diameter`, i.e. a typical cell diameter **in pixels**. Note that this is highly specific to the technology you're using since the micron-to-pixel ratio can differ. We advise you to start with the default parameter for your technology of interest (see the `diameter` parameter inside our config files [here](https://github.com/gustaveroussy/sopa/tree/master/workflow/config)).
 - Maybe `min_area` is too high, and all the cells are filtered because they are smaller than this area. Remember that, when using Cellpose, the areas are expressed in pixels^2.
 - This can be due to a low image quality. If the image is too pixelated, consider increasing `gaussian_sigma` (e.g., `2`) under the cellpose parameters of our config. If the image has a low contrast, consider increasing `clip_limit` (e.g., `0.3`). These parameters are detailed in [this example config](https://github.com/gustaveroussy/sopa/blob/master/workflow/config/example_commented.yaml).
 - Consider updating the official Cellpose parameters. In particular, try `cellprob_threshold=-6` and `flow_threshold=2`.
 
 ## Can I use Nextflow instead of Snakemake?
 
-Nextflow is not supported yet, but we are working on it. You can also help re-write our Snakemake pipeline for Nextflow.
+Nextflow is not supported yet, but we are working on it. You can also help re-write our Snakemake pipeline for Nextflow (see issue [#7](https://github.com/gustaveroussy/sopa/issues/7)).
 
 ## I have another issue; how do I fix it?
 

diff --git a/docs/tutorials/api_usage.ipynb b/docs/tutorials/api_usage.ipynb
@@ -19,7 +19,7 @@
     "\n",
     "For this tutorial, we use a generated dataset. The command below will generate and save it on disk (you can change the path `tuto.zarr` to save it somewhere else).\n",
     "\n",
-    "See [here](`../../api/io`) for details on how to use your own technology."
+    "See the commented lines below to load your own data, or see the [`sopa.io` API](../../api/io)."
    ]
   },
   {
@@ -55,9 +55,12 @@
     }
    ],
    "source": [
+    "# The line below creates a toy dataset for this tutorial\n",
+    "# Instead, use sopa.io to read your own data as a SpatialData object: see https://gustaveroussy.github.io/sopa/api/io/\n",
+    "# For instance, if you have MERSCOPE data, you can do `sdata = sopa.io.merscope(\"/path/to/region_0\")`\n",
     "sdata = uniform()\n",
-    "sdata.write(\"tuto.zarr\", overwrite=True)\n",
     "\n",
+    "sdata.write(\"tuto.zarr\", overwrite=True)\n",
     "sdata"
    ]
   },

diff --git a/docs/tutorials/cli_usage.md b/docs/tutorials/cli_usage.md
@@ -2,15 +2,50 @@ Here, we provide a minimal example of command line usage. For more details and t
 
 ## Save the `SpatialData` object
 
-For this tutorial, we use a generated dataset. The command below will generate and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). See [here](`../../cli/#sopa-read`) for details on how to use your own technology.
+For this tutorial, we use a generated dataset. The command below will generate and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). If you want to load your own data, choose the right panel below, or see the [`sopa read` CLI documentation](../../cli/#sopa-read).
+
+=== "Tutorial"
+    ```sh
+    # it will generate a 'tuto.zarr' directory
+    sopa read . --sdata-path tuto.zarr --technology uniform
+    ```
+=== "Xenium"
+    ```sh
+    # it will generate a '/path/to/sample/directory.zarr' directory
+    sopa read /path/to/sample/directory --technology xenium
+    ```
+=== "MERSCOPE"
+    ```sh
+    # it will generate a '/path/to/sample/directory.zarr' directory
+    sopa read /path/to/sample/directory --technology merscope
+    ```
+=== "CosMX"
+    ```sh
+    # it will generate a '/path/to/sample/directory.zarr' directory
+    sopa read /path/to/sample/directory --technology cosmx
+    ```
+
+    !!! warning
+        The CosMX data requires stitching the FOVs. Support for this will be added soon, see [this issue](https://github.com/gustaveroussy/sopa/issues/5).
+=== "PhenoCycler"
+    ```sh
+    # it will generate a '/path/to/sample.zarr' directory
+    sopa read /path/to/sample.qptiff --technology phenocycler
+    ```
+=== "MACSima"
+    ```sh
+    # it will generate a '/path/to/sample/directory.zarr' directory
+    sopa read /path/to/sample/directory --technology macsima
+    ```
+=== "Hyperion"
+    ```sh
+    # it will generate a '/path/to/sample/directory.zarr' directory
+    sopa read /path/to/sample/directory --technology hyperion
+    ```
 
-```sh
-# this generates a 'tuto.zarr' directory
-sopa read . --sdata-path tuto.zarr --technology uniform
-```
 
 !!! info
-    This generates a `.zarr` directory corresponding to a [`SpatialData` object](https://github.com/scverse/spatialdata).
+    This creates a `.zarr` directory that stores a [`SpatialData` object](https://github.com/scverse/spatialdata) corresponding to your data sample. You can choose the location of the `.zarr` directory using the `--sdata-path` command line argument.
 
 ## (Optional) ROI selection
 

diff --git a/sopa/annotation/tangram/run.py b/sopa/annotation/tangram/run.py
@@ -158,6 +158,10 @@ def pp_adata(self, ad_sp_: AnnData, ad_sc_: AnnData, split: np.ndarray) -> AnnDa
             set(ad_sp_split.var_names[ad_sp_split.var.counts > 0])
             & set(ad_sc_.var_names[ad_sc_.var.counts > 0])
         )
+
+        assert len(
+            selection
+        ), "No gene in common between the reference and the spatial adata object. Have you run transcript aggregation?"
         log.info(f"Keeping {len(selection)} shared genes")
 
         for ad_ in [ad_sp_split, ad_sc_]:
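
Note on the hunk above: the new assertion guards against an empty intersection between the genes expressed in the spatial data and those expressed in the reference. The idea can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `shared_genes` and the dictionary inputs are hypothetical stand-ins for the `AnnData` objects, sorting the result is a choice made here for reproducibility, and the sketch raises `ValueError` rather than using `assert`, since assertions are skipped when Python runs with `-O`.

```python
from __future__ import annotations


def shared_genes(spatial_counts: dict[str, int], reference_counts: dict[str, int]) -> list[str]:
    """Return the genes with a non-zero count in both datasets (hypothetical helper)."""
    # Keep only the genes that are actually expressed in each dataset
    expressed_spatial = {gene for gene, count in spatial_counts.items() if count > 0}
    expressed_reference = {gene for gene, count in reference_counts.items() if count > 0}

    # Sorted so that the result does not depend on set iteration order
    selection = sorted(expressed_spatial & expressed_reference)

    # Fail early with an actionable message instead of continuing with no genes
    if not selection:
        raise ValueError(
            "No gene in common between the reference and the spatial data. "
            "Have you run transcript aggregation?"
        )
    return selection


# Toy counts: "B" is not expressed in the spatial data, so it is dropped
selection = shared_genes({"A": 3, "B": 0, "C": 1}, {"A": 1, "B": 5, "C": 2})
```

Failing at this point keeps the error close to its likely cause (missing transcript aggregation) rather than surfacing later as an obscure failure on an empty gene list.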

diff --git a/sopa/cli/app.py b/sopa/cli/app.py
@@ -48,7 +48,7 @@ def read(
     ),
     technology: str = typer.Option(
         None,
-        help="Name of the technology used to collected the data (`xenium`/`merfish`/`cosmx`/`phenocycler`/`macsima`/`qptiff`/`hyperion`)",
+        help="Name of the technology used to collect the data (`xenium`/`merscope`/`cosmx`/`phenocycler`/`macsima`/`hyperion`)",
     ),
     sdata_path: str = typer.Option(
         None,
@@ -91,7 +91,7 @@ def read(
 
     assert hasattr(
         io, technology
-    ), f"Technology {technology} unknown. Currently available: xenium, merscope, cosmx, phenocycler, hyperion, macsima, qptiff"
+    ), f"Technology {technology} unknown. Currently available: xenium, merscope, cosmx, phenocycler, hyperion, macsima"
 
     sdata = getattr(io, technology)(data_path, **kwargs)
     io.write_standardized(sdata, sdata_path, delete_table=True)
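
Note on the hunk above: `read` resolves the reader function by name with `hasattr`/`getattr`, which is why removing `qptiff` from `sopa/io/__init__.py` is enough to retire that technology name. A minimal, self-contained sketch of this dispatch pattern is given below. It assumes nothing about sopa itself: `fake_io`, `get_reader` and the lambda readers are hypothetical stand-ins, and the list of available technologies is derived from the namespace here instead of being hard-coded in the message.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the `sopa.io` module: each attribute is a reader function
fake_io = SimpleNamespace(
    xenium=lambda path, **kwargs: f"xenium:{path}",
    merscope=lambda path, **kwargs: f"merscope:{path}",
)


def get_reader(io_module, technology: str):
    """Look up the reader named `technology`, or raise a descriptive error."""
    reader = getattr(io_module, technology, None)
    if not callable(reader):
        # Build the list of known readers from the namespace itself
        available = ", ".join(sorted(vars(io_module)))
        raise ValueError(f"Technology {technology} unknown. Currently available: {available}")
    return reader


# Dispatch by name, then call the reader with the data path
reader = get_reader(fake_io, "merscope")
result = reader("/path/to/region_0")
```

Deriving the "currently available" list from the module avoids the kind of drift this commit fixes by hand, where a removed reader name has to be deleted from the help text and the error message separately.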

diff --git a/sopa/cli/explorer.py b/sopa/cli/explorer.py
@@ -116,7 +116,7 @@ def add_aligned(
     from sopa.io.explorer.images import align
 
     sdata = spatialdata.read_zarr(sdata_path)
-    image = io.imaging.ome_tif(image_path)
+    image = io.imaging.ome_tif(image_path, as_image=True)
 
     align(
         sdata, image, transformation_matrix_path, overwrite=overwrite, image_key=original_image_key

diff --git a/sopa/io/__init__.py b/sopa/io/__init__.py
@@ -1,4 +1,4 @@
-from .imaging import qptiff, macsima, phenocycler, hyperion, ome_tif
+from .imaging import macsima, phenocycler, hyperion, ome_tif
 from .explorer import write
 from .standardize import write_standardized
 from .transcriptomics import merscope, xenium, cosmx