Merge pull request #50 from gustaveroussy/dev

Dev
gustaveroussy · Apr 8, 2024 · acabcc7
2 parents 97597d1 + 02f2109
commit acabcc7

Showing 41 changed files with 1,249 additions and 739 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,15 @@
+## [1.0.10] - 2024-04-08
+
+### Added
+- CosMX reader with image stitching (experimental)
+
+### Changed
+- Default `min_transcripts` set in snakemake configs
+- Minimum number of transcripts per patch set to 4000 (#41)
+- Config files refactoring (configs added or renamed)
+- Readers refactoring
+- Report sections that fail during report generation are no longer displayed (instead of throwing an error)
+
 ## [1.0.9] - 2024-04-03
 
 ### Added:

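The last `Changed` entry above says that a report section that errors is now hidden instead of making the whole report fail. The following is a minimal sketch of that fail-soft pattern. It assumes each section is a `(title, render_fn)` pair. `build_report`, `sections` and `render_fn` are hypothetical names used only for illustration, and this is not Sopa's report code.

```python
import logging

log = logging.getLogger("report")


def build_report(sections):
    """Render each (title, render_fn) pair and skip sections that raise.

    Hypothetical sketch: `sections` and `render_fn` are illustrative names,
    not Sopa's report API.
    """
    rendered = []
    for title, render_fn in sections:
        try:
            rendered.append((title, render_fn()))
        except Exception as err:  # a failure hides this section only
            log.warning("Skipping report section %r: %s", title, err)
    return rendered


def ok():
    return "<p>cell count</p>"


def broken():
    raise KeyError("missing table")


report = build_report([("Cells", ok), ("Transcripts", broken)])
print([title for title, _ in report])  # ['Cells']
```

The failing section is logged and dropped, and every other section is still rendered.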
diff --git a/docs/api/io.md b/docs/api/io.md
@@ -35,3 +35,11 @@
 ::: sopa.io.wsi
     options:
       show_root_heading: true
+
+::: sopa.io.uniform
+    options:
+      show_root_heading: true
+
+::: sopa.io.blobs
+    options:
+      show_root_heading: true
diff --git a/docs/api/utils/data.md b/docs/api/utils/data.md
diff --git a/docs/faq.md b/docs/faq.md
@@ -13,14 +13,20 @@ In this documentation, `data_path` denotes the path to your raw data. Select the
 === "MERSCOPE"
     `data_path` is the "region" directory containing a `detected_transcripts.csv` file and an `image` directory. For instance, the directory can be called `region_0`.
 === "CosMX"
-    (The CosMX data requires stitching the FOVs. It will be added soon, see [this issue](https://github.com/gustaveroussy/sopa/issues/5))
+    `data_path` is the directory containing (i) the transcript file (ending with `_tx_file.csv` or `_tx_file.csv.gz`), (ii) the FOV locations file, and (iii) a `Morphology2D` directory containing the images.
 === "MACSima"
     `data_path` is the directory containing multiple `.ome.tif` files (one file per channel)
 === "PhenoCycler"
     `data_path` corresponds to the path to one `.qptiff` file, or one `.tif` file (if exported from QuPath)
 === "Hyperion"
     `data_path` is the directory containing multiple `.ome.tiff` files (one file per channel)
 
+## I have small artifact cells; how do I remove them?
+
+You may have small cells that were segmented but that should be removed. For that, `Sopa` offers three filtering approaches: using their area, their transcript count, or their fluorescence intensity. Refer to the following config parameters from this [example config](https://github.com/gustaveroussy/sopa/blob/master/workflow/config/example_commented.yaml): `min_area`, `min_transcripts`, and `min_intensity_ratio`.
+
+If using the CLI, `--min-area` can be provided to `sopa segmentation cellpose` or `sopa resolve baysor`, and `--min-transcripts`/`--min-intensity-ratio` can be provided to `sopa aggregate`.
+
 ## Cellpose is not segmenting enough cells; what should I do?
 
 - The main Cellpose parameter to check is `diameter`, i.e. a typical cell diameter **in pixels**. Note that this is highly specific to the technology you're using since the micron-to-pixel ratio can differ. We advise you to start with the default parameter for your technology of interest (see the `diameter` parameter inside our config files [here](https://github.com/gustaveroussy/sopa/tree/master/workflow/config)).

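The FAQ entry above describes the expected CosMX `data_path` layout. The following is a minimal pre-flight check of that layout before running `sopa read --technology cosmx`. `check_cosmx_dir` is a hypothetical helper and is not part of Sopa. The FOV-locations file pattern (`*fov_positions_file.csv*`) is an assumption, because the FAQ does not name that file.

```python
import tempfile
from pathlib import Path


def check_cosmx_dir(data_path):
    """Return a list of layout problems (empty if `data_path` looks valid).

    Hypothetical helper, not part of Sopa. It only checks the layout from the FAQ.
    """
    path = Path(data_path)
    if not path.is_dir():
        return ["not a directory"]
    problems = []
    # (i) exactly one transcript file ending with _tx_file.csv or _tx_file.csv.gz
    tx_files = [p for p in path.iterdir()
                if p.name.endswith(("_tx_file.csv", "_tx_file.csv.gz"))]
    if len(tx_files) != 1:
        problems.append(f"expected 1 transcript file, found {len(tx_files)}")
    # (ii) FOV locations file: the name pattern is an ASSUMPTION
    if not list(path.glob("*fov_positions_file.csv*")):
        problems.append("no FOV locations file found")
    # (iii) directory holding the images
    if not (path / "Morphology2D").is_dir():
        problems.append("missing Morphology2D directory")
    return problems


# Usage on a fake directory built in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Run1_tx_file.csv.gz").touch()
    (root / "Run1_fov_positions_file.csv").touch()
    (root / "Morphology2D").mkdir()
    result = check_cosmx_dir(root)
    print(result)  # []
```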
diff --git a/docs/tutorials/api_usage.ipynb b/docs/tutorials/api_usage.ipynb
@@ -6,7 +6,6 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from sopa.utils.data import uniform\n",
     "import sopa.segmentation\n",
     "import sopa.io"
    ]
@@ -60,9 +59,9 @@
    ],
    "source": [
     "# The line below creates a toy dataset for this tutorial\n",
-    "# Instead, use sopa.io to read your own data as a SpatialData object: see https://gustaveroussy.github.io/sopa/api/io/\n",
-    "# For instance, if you have MERSCOPE data, you can do `sdata = sopa.io.merscope(\"/path/to/region_0\")`\n",
-    "sdata = uniform()\n",
+    "# To load your own data, such as MERSCOPE data, you can do `sdata = sopa.io.merscope(\"/path/to/region_0\")`\n",
+    "# For more details, see https://gustaveroussy.github.io/sopa/api/io/\n",
+    "sdata = sopa.io.uniform()\n",
     "\n",
     "sdata.write(\"tuto.zarr\", overwrite=True)\n",
     "sdata"

diff --git a/docs/tutorials/cli_usage.md b/docs/tutorials/cli_usage.md
@@ -7,7 +7,7 @@ Here, we provide a minimal example of command line usage. For more details and t
 
 For this tutorial, we use a generated dataset. You can expect a total runtime of a few minutes.
 
-The command below will generate and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). If you want to load your own data: choose the right panel below, or see the [`sopa read` CLI documentation](`../../cli/#sopa-read`).
+The command below will generate the dataset and save it on disk (you can change the path `tuto.zarr` to save it somewhere else). If you want to load your own data, choose the right panel below. For more information, refer to this [FAQ](../../faq/#what-kind-of-inputs-do-i-need-to-run-sopa) describing which data input you need, or see the [`sopa read`](../../cli/#sopa-read) documentation.
 
 === "Tutorial"
     ```sh
@@ -29,9 +29,6 @@ The command below will generate and save it on disk (you can change the path `tu
     # it will generate a '/path/to/sample/directory.zarr' directory
     sopa read /path/to/sample/directory --technology cosmx
     ```
-
-    !!! warning
-        The CosMX data requires stitching the FOVs. It will be added soon, see [this issue](https://github.com/gustaveroussy/sopa/issues/5).
 === "PhenoCycler"
     ```sh
     # it will generate a '/path/to/sample.zarr' directory
@@ -149,7 +146,7 @@ For this tutorial, we will use the config below. Save this in a `config.toml` fi
 ```toml
 [data]
 force_2d = true
-min_molecules_per_cell = 10   # min number of transcripts per cell
+min_molecules_per_cell = 10
 x = "x"
 y = "y"
 z = "z"
@@ -229,11 +226,11 @@ This **mandatory** step turns the data into an `AnnData` object. We can count th
 
 === "Count transcripts + average intensities"
     ```sh
-    sopa aggregate tuto.zarr --gene-column genes --average-intensities
+    sopa aggregate tuto.zarr --gene-column genes --average-intensities --min-transcripts 10
     ```
 === "Count transcripts"
     ```sh
-    sopa aggregate tuto.zarr --gene-column genes
+    sopa aggregate tuto.zarr --gene-column genes --min-transcripts 10
     ```
 === "Average intensities"
     ```sh

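The FAQ entry on artifact cells and the new `--min-transcripts 10` option of `sopa aggregate` both filter cells by thresholds. The following is a minimal sketch of area and transcript-count filtering on plain Python objects. It is illustrative only. The `Cell` class and `filter_cells` are hypothetical, and this is not Sopa's implementation. Treating a cell exactly at the threshold as kept (`>=`) is an assumption. The `min_intensity_ratio` filter is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    cell_id: str
    area: float          # in the segmentation's units (assumption)
    n_transcripts: int


def filter_cells(cells, min_area=0.0, min_transcripts=0):
    """Keep cells passing both thresholds (hypothetical sketch, not Sopa's code)."""
    return [c for c in cells
            if c.area >= min_area and c.n_transcripts >= min_transcripts]


cells = [Cell("a", 120.0, 35), Cell("b", 8.0, 40), Cell("c", 150.0, 3)]
kept = filter_cells(cells, min_area=20.0, min_transcripts=10)
print([c.cell_id for c in kept])  # ['a']
```

Cell `b` is removed by the area threshold and cell `c` by the transcript threshold, which mirrors combining `min_area` with `min_transcripts`.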
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -37,7 +37,6 @@ nav:
       - sopa.annotation.tangram: api/annotation/tangram.md
       - sopa.annotation.fluorescence: api/annotation/fluorescence.md
     - sopa.utils:
-      - sopa.utils.data: api/utils/data.md
       - sopa.utils.image: api/utils/image.md
       - sopa.utils.polygon_crop: api/utils/polygon_crop.md
     - sopa.embedding: api/embedding.md