Normalization and gene selection by analytical Pearson residuals #1715

Merged 106 commits on Mar 29, 2022
Commits
12e751d
adding core functions and documentation for pearson residual normaliz…
jlause Mar 2, 2021
5d57961
adding Pearson residual+PCA bundles, minor bug fixes
jlause Mar 3, 2021
fced3f2
some style cleanup, minor fixes
jlause Mar 3, 2021
977b6cf
adapting _normalize_pearson_residuals() to cleaned-up _normalized_tot…
jlause Mar 5, 2021
d8d724c
updating layer management as in #1667 for _highly_variable_pearson_re…
jlause Mar 5, 2021
e23ea6c
slight performance improvement for sparse input
jlause Mar 5, 2021
fc49c25
style cleanup
jlause Mar 10, 2021
f91f2fe
fixing import issue, fixing docstring style, adding check_values para…
jlause Mar 12, 2021
60de21d
fixed small NameError, simplified clip argument
jlause Mar 12, 2021
1f86989
remove pd.categorical()
jlause Mar 12, 2021
95ec0e5
adding check_values to docstrings and remaining pearson residual func…
jlause Mar 12, 2021
ff82290
np.empty instead of np.nan
jlause Mar 12, 2021
f7f7dbd
add references to docstrings, add HVG details to docstring
jlause Mar 15, 2021
af0a825
exposing pca keyword arguments to the user for the bundle/recipe func…
jlause Mar 15, 2021
142eaca
removed unneeded reversal in hvg, fix kwargs_pca bug, consistent defa…
jlause Mar 15, 2021
541b252
fixing handling of `inplace` and `subset` arguments (see issue #1886)…
jlause Jun 11, 2021
fdd500b
renaming output fields for consistency, fixing minor bug
jlause Jun 11, 2021
c6dfc1d
renaming output fields for consistency
jlause Jun 11, 2021
dc27c9f
adding function that prepares testdata (used for pearson residual tests)
jlause Jun 11, 2021
aef44d8
adding tests for all pearson residual functions
jlause Jun 11, 2021
e76cf7b
fix precommit high_var_genes
giovp Jun 28, 2021
bdb7ce2
try to get precommit to work
giovp Jun 28, 2021
65edcf3
Merge branch 'master' into pearson_residuals_1.7
giovp Jun 28, 2021
6cea040
try to get precommit to work
giovp Jun 28, 2021
d7e63f7
fix recipes
giovp Jun 28, 2021
0b5a02b
fix normalization
giovp Jun 28, 2021
6779d23
remove relative imports
giovp Jun 28, 2021
237e7cd
fix docstrings
giovp Jun 28, 2021
d75aa36
retry to build docs
giovp Jun 28, 2021
301190a
Merge branch 'master' into pearson_residuals_1.7
giovp Jun 29, 2021
293b47d
fix highvar docstring
giovp Jun 29, 2021
a61496b
more fixing docstrings
giovp Jun 29, 2021
7afb94f
docs build locally ? :hammer:
giovp Jun 29, 2021
e3e5045
minor cleanup test normalization
giovp Jul 5, 2021
e368b57
more minor cleanups
giovp Jul 5, 2021
bfbd484
final cleanup normalization
giovp Jul 5, 2021
a55e677
fixes high var
giovp Jul 5, 2021
4f47c11
init experimental module
giovp Jul 5, 2021
c32eafc
fix column ordering for batch case
jlause Jul 14, 2021
f6d4286
moving to experimental, minor fix for experimental version of hvg sel…
jlause Jul 14, 2021
dd16140
linking tests to new experimental submodule, style cleanup
jlause Jul 14, 2021
a19f90e
adapt input arguments and docstring for experimental version of hvg s…
jlause Jul 14, 2021
db0d9a5
Merge branch 'master' into pearson_residuals_1.7
giovp Aug 1, 2021
2c93996
Merge branch 'pearson_residuals_1.7' of github.com:jlause/scanpy into…
giovp Aug 1, 2021
659da16
add recipes
giovp Aug 1, 2021
bf0bb8e
fix docs
giovp Aug 1, 2021
191c449
add correct module docs
giovp Aug 1, 2021
7f3d6ed
fix recipe docstrings
giovp Aug 1, 2021
87bf425
try fix indentation
giovp Aug 1, 2021
0b8ba5f
fix indentation
giovp Aug 1, 2021
88bf93a
fix
giovp Aug 1, 2021
ef81b72
new indentation
giovp Aug 1, 2021
900c12c
add space
giovp Aug 2, 2021
b00a0b6
fixing typo in docstring
jlause Aug 2, 2021
617aff1
renaming pca output fields
jlause Aug 2, 2021
4dabfcd
adapting tests to new output fieldname
jlause Aug 2, 2021
58ac8e0
fix docs :hammer:
giovp Aug 6, 2021
8ae8338
update docs
giovp Aug 6, 2021
535129c
fix test :hammer:
giovp Aug 6, 2021
3addbe7
ensure argument and docstring consistency
jlause Aug 10, 2021
9215983
update citation year
jlause Aug 10, 2021
37695a9
cleaning imports in `preprocessing` functions
jlause Aug 17, 2021
f42f4b8
making inputcheck tests specific to error/warning messages
jlause Aug 17, 2021
1e20c3b
making inputcheck tests specific to error/warning messages
jlause Aug 17, 2021
1f02e2c
resolve HVGs across batches more cleanly, fix dtype issue
jlause Aug 20, 2021
0add1b7
renaming pca input arguments
jlause Aug 20, 2021
2a2b98a
renaming pca input arguments
jlause Aug 20, 2021
0150057
_pca bundle: more efficient copy handling, added input check. both _p…
jlause Aug 20, 2021
e9c0b89
move repeated inputcheck code to helpers
jlause Aug 22, 2021
3e02b05
merging tests *_values and *_general
jlause Aug 23, 2021
720578d
condense code in pearson hvg selection test, smaller test data for sp…
jlause Aug 23, 2021
83b7338
condensing code in normalization tests
jlause Aug 23, 2021
a616419
add asteriks for keyword
giovp Aug 31, 2021
62660a2
updating refs to Genome Biology publication
jlause Sep 14, 2021
02b091a
Merge branch 'pearson_residuals_1.7' of github.com:jlause/scanpy into…
jlause Sep 14, 2021
75c8fcc
Merge branch 'master' into pearson_residuals_1.7
ivirshup Oct 26, 2021
b5cb3aa
cleanup helpers.py
jlause Dec 24, 2021
aa9037f
cleanup main files as requested by @ivirshup
jlause Dec 24, 2021
e972daf
revert unneeded settingWithCopy fix
jlause Dec 24, 2021
47bd877
cache data
giovp Feb 23, 2022
13a44be
use doc_params for doc
giovp Feb 23, 2022
0e4711d
fix doc_params var
giovp Feb 23, 2022
14f207f
Merge branch 'master' into pearson_residuals_1.7
giovp Feb 24, 2022
aa55183
finalize docs
giovp Feb 24, 2022
8e9b07b
fix param doc
giovp Feb 24, 2022
dce90b2
wrong var still
giovp Feb 24, 2022
ca65af5
add cached datasets module and test on high_var_genes tests
giovp Feb 28, 2022
d3a07cb
use new cache dataset module for tests
giovp Feb 28, 2022
1ebea68
Merge branch 'master' into pearson_residuals_1.7
giovp Feb 28, 2022
bdd37cd
fix precommit
giovp Feb 28, 2022
aba3906
fix docs
giovp Feb 28, 2022
c9dbf48
fix reference and add notebook to tutorials
giovp Mar 9, 2022
e335966
add release note
giovp Mar 9, 2022
bf7fb25
add release note
giovp Mar 9, 2022
1045d98
fix release note
giovp Mar 9, 2022
f7d4c49
typo
giovp Mar 9, 2022
5f76cdf
remove duplicate reference
giovp Mar 9, 2022
19b018c
fixing black flake etc requirements
jlause Mar 12, 2022
ce9ee43
add _pca function to release note
jlause Mar 12, 2022
7ffdec3
last edits to docs
jlause Mar 13, 2022
a0aaf96
fix release and tutorial image
giovp Mar 16, 2022
e7e92a1
Merge branch 'master' into pearson_residuals_1.7
giovp Mar 16, 2022
ad81e29
try fix pre-commit
giovp Mar 16, 2022
d74a0e6
minor docs
giovp Mar 16, 2022
92d675c
Merge branch 'master' into pearson_residuals_1.7
ivirshup Mar 29, 2022
970b0fa
Remove accidentally included files from merge
ivirshup Mar 29, 2022
18 changes: 18 additions & 0 deletions docs/api.rst
@@ -369,6 +369,24 @@ Collections of useful measurements for evaluating results.
metrics.morans_i


Experimental
------------

.. module:: scanpy.experimental
.. currentmodule:: scanpy

New methods that are in early development and not (yet)
integrated into Scanpy core.

.. autosummary::
   :toctree: generated/

   experimental.pp.normalize_pearson_residuals
   experimental.pp.normalize_pearson_residuals_pca
   experimental.pp.highly_variable_genes
   experimental.pp.recipe_pearson_residuals


Classes
-------

4 changes: 4 additions & 0 deletions docs/references.rst
@@ -119,6 +119,10 @@ References
   *Laplacian Dynamics and Multiscale Modular Structure in Networks*
   `arXiv <https://arxiv.org/abs/0812.1770>`__.

.. [Lause21] Lause *et al.* (2021),
   *Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data*
   `Genome Biology <https://doi.org/10.1186/s13059-021-02451-7>`__.

.. [Leek12] Leek *et al.* (2012),
   *sva: Surrogate Variable Analysis. R package*
   `Bioconductor <https://doi.org/10.18129/B9.bioc.sva>`__.
10 changes: 10 additions & 0 deletions docs/release-notes/1.9.0.rst
@@ -6,3 +6,13 @@
- :func:`~scanpy.tl.filter_rank_genes_groups` now allows filtering by absolute values of log fold change :pr:`1649` :smaller:`S Rybakov`
- :func:`~scanpy.pl.embedding_density` now allows more than 10 groups :pr:`1936` :smaller:`A Wolf`
- :func:`~scanpy.logging.print_versions` now uses `session_info` :pr:`2089` :smaller:`P Angerer` :smaller:`I Virshup`

.. rubric:: Experimental module

- Added :mod:`scanpy.experimental` module!

- Added :func:`scanpy.experimental.pp.normalize_pearson_residuals` for Pearson residuals normalization :pr:`1715` :smaller:`J Lause, G Palla, I Virshup`
- Added :func:`scanpy.experimental.pp.highly_variable_genes` for HVG selection with Pearson residuals :pr:`1715` :smaller:`J Lause, G Palla, I Virshup`
- Added :func:`scanpy.experimental.pp.normalize_pearson_residuals_pca` for Pearson residuals normalization and dimensionality reduction with PCA :pr:`1715` :smaller:`J Lause, G Palla, I Virshup`
- Added :func:`scanpy.experimental.pp.recipe_pearson_residuals` for Pearson residuals normalization, HVG selection, and dimensionality reduction with PCA :pr:`1715` :smaller:`J Lause, G Palla, I Virshup`
6 changes: 5 additions & 1 deletion docs/tutorials.rst
@@ -97,11 +97,15 @@ See the `cell cycle`_ notebook.

.. _cell cycle: https://nbviewer.jupyter.org/github/theislab/scanpy_usage/blob/master/180209_cell_cycle/cell_cycle.ipynb

Normalization with Pearson Residuals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Normalization of scRNA-seq data with Pearson residuals, from [Lause21]_: :tutorial:`tutorial_pearson_residuals`

.. image:: _static/img/tutorials/170522_visualizing_one_million_cells/tsne_1.3M.png
   :width: 120px
   :align: right


Scaling Computations
~~~~~~~~~~~~~~~~~~~~

2 changes: 1 addition & 1 deletion scanpy/__init__.py
@@ -14,7 +14,7 @@
from . import tools as tl
from . import preprocessing as pp
from . import plotting as pl
-from . import datasets, logging, queries, external, get, metrics
+from . import datasets, logging, queries, external, get, metrics, experimental

from anndata import AnnData, concat
from anndata import (
1 change: 1 addition & 0 deletions scanpy/experimental/__init__.py
@@ -0,0 +1 @@
from . import pp
79 changes: 79 additions & 0 deletions scanpy/experimental/_docs.py
@@ -0,0 +1,79 @@
"""Shared docstrings for experimental function parameters.
"""

doc_adata = """\
adata
    The annotated data matrix of shape `n_obs` × `n_vars`.
    Rows correspond to cells and columns to genes.
"""

doc_dist_params = """\
theta
    The negative binomial overdispersion parameter `theta` for Pearson residuals.
    Higher values correspond to less overdispersion \
    (`var = mean + mean^2/theta`), and `theta=np.inf` corresponds to a Poisson model.
clip
    Determines if and how residuals are clipped:

    * If `None`, residuals are clipped to the interval \
    `[-sqrt(n_obs), sqrt(n_obs)]`, where `n_obs` is the number of cells in the dataset (default behavior).
    * If any scalar `c`, residuals are clipped to the interval `[-c, c]`. Set \
    `clip=np.inf` for no clipping.
"""

doc_check_values = """\
check_values
    If `True`, checks whether counts in the selected layer are integers, as expected
    by this function, and issues a warning if non-integers are found. Otherwise,
    proceeds without checking. Setting this to `False` can speed up code for large
    datasets.
"""

doc_layer = """\
layer
    Layer to use as input instead of `X`. If `None`, `X` is used.
"""

doc_subset = """\
subset
    If `True`, subset `adata` in place to the highly-variable genes; otherwise,
    merely indicate the highly-variable genes.
"""

doc_genes_batch_chunk = """\
n_top_genes
    Number of highly-variable genes to keep. Mandatory if `flavor='seurat_v3'` or
    `flavor='pearson_residuals'`.
batch_key
    If specified, highly-variable genes are selected within each batch separately
    and merged. This simple process avoids the selection of batch-specific genes
    and acts as a lightweight batch correction method. Genes are first sorted by
    the number of batches in which they are an HVG. If `flavor='pearson_residuals'`,
    ties are broken by the median rank (across batches) based on within-batch
    residual variance.
chunksize
    If `flavor='pearson_residuals'`, this determines how many genes are processed at
    once while computing the residual variance. Choosing a smaller value will reduce
    the required memory.
"""

doc_pca_chunk = """\
n_comps
    Number of principal components to compute in the PCA step.
random_state
    Random seed for setting the initial states for the optimization in the PCA step.
kwargs_pca
    Dictionary of further keyword arguments passed on to `scanpy.pp.pca()`.
"""

doc_inplace = """\
inplace
    If `True`, update `adata` with results. Otherwise, return results. See below for
    details of what is returned.
"""

doc_copy = """\
copy
    If `True`, the function runs on a copy of the input object and returns the
    modified copy. Otherwise, the input object is modified directly. Not compatible
    with `inplace=False`.
"""
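The `theta`, `clip` and `chunksize` parameters documented in this file all refer to analytic Pearson residuals (Lause *et al.*, 2021). As a rough illustration only, the NumPy sketch below computes them for a dense cells × genes count matrix: expected counts are `mu_cg = n_c * n_g / N` (cell total times gene total over the grand total), residuals are `(x_cg - mu_cg) / sqrt(mu_cg + mu_cg^2 / theta)`, and they are clipped to `[-sqrt(n_obs), sqrt(n_obs)]` when `clip=None`. The helper names `pearson_residuals` and `residual_variance`, the default `theta=100`, and the assumption that no cell or gene has zero total counts are ours; this is not scanpy's implementation.

```python
import numpy as np


def _residual_block(counts, cell_sums, gene_sums, total, theta, clip):
    """Clipped analytic Pearson residuals for one block of genes (columns)."""
    # Expected counts under a rank-one model: mu_cg = n_c * n_g / N
    mu = np.outer(cell_sums, gene_sums) / total
    # Negative binomial variance mu + mu^2 / theta; theta=np.inf gives Poisson
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    return np.clip(residuals, -clip, clip)


def _totals(counts, clip):
    """Convert to float and compute the global totals every gene block needs."""
    counts = np.asarray(counts, dtype=float)
    if clip is None:
        # Default clipping bound: sqrt(n_obs), with n_obs the number of cells
        clip = np.sqrt(counts.shape[0])
    return counts, counts.sum(axis=1), counts.sum(axis=0), counts.sum(), clip


def pearson_residuals(counts, theta=100.0, clip=None):
    """Residuals for a dense cells x genes count matrix (hypothetical helper)."""
    counts, cell_sums, gene_sums, total, clip = _totals(counts, clip)
    return _residual_block(counts, cell_sums, gene_sums, total, theta, clip)


def residual_variance(counts, theta=100.0, clip=None, chunksize=1000):
    """Per-gene variance of the residuals, computed `chunksize` genes at a time."""
    counts, cell_sums, gene_sums, total, clip = _totals(counts, clip)
    n_vars = counts.shape[1]
    variances = np.empty(n_vars)
    for start in range(0, n_vars, chunksize):
        stop = min(start + chunksize, n_vars)
        # Only a cells x chunksize block of residuals exists at any one time
        block = _residual_block(
            counts[:, start:stop], cell_sums, gene_sums[start:stop], total, theta, clip
        )
        variances[start:stop] = block.var(axis=0)
    return variances
```

Chunking over genes only bounds the size of the residual block held in memory; the per-gene variances are unchanged because each gene's residuals depend only on the global cell and gene totals, which are computed once up front.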
8 changes: 8 additions & 0 deletions scanpy/experimental/pp/__init__.py
@@ -0,0 +1,8 @@
from scanpy.experimental.pp._normalization import (
    normalize_pearson_residuals,
    normalize_pearson_residuals_pca,
)

from scanpy.experimental.pp._highly_variable_genes import highly_variable_genes

from scanpy.experimental.pp._recipes import recipe_pearson_residuals
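The `batch_key` behavior described in `doc_genes_batch_chunk` (count the batches in which each gene is an HVG, then break ties by the median within-batch rank of residual variance) can be sketched as follows. `merge_batch_hvgs` is a hypothetical helper that takes precomputed per-batch residual variances; how scanpy ranks genes and aggregates those ranks internally may differ, so treat the details as assumptions.

```python
import numpy as np


def merge_batch_hvgs(batch_variances, n_top_genes):
    """Merge per-batch HVG selections (hypothetical helper).

    batch_variances: array of shape (n_batches, n_genes) holding each gene's
    residual variance within each batch.
    """
    batch_variances = np.asarray(batch_variances, dtype=float)
    n_batches, n_genes = batch_variances.shape
    # Rank genes within each batch: rank 0 = highest residual variance
    order = np.argsort(-batch_variances, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n_batches)[:, None], order] = np.arange(n_genes)
    # A gene is an HVG in a batch if it is among that batch's top n_top_genes
    n_batches_hvg = (ranks < n_top_genes).sum(axis=0)
    median_rank = np.median(ranks, axis=0)
    # np.lexsort uses the last key as primary: sort by number of batches
    # (descending), then break ties by median rank (ascending)
    merged_order = np.lexsort((median_rank, -n_batches_hvg))
    return merged_order[:n_top_genes], n_batches_hvg, median_rank
```

For example, with three batches and `n_top_genes=2`, a gene that is in the top two of two batches outranks a gene that is in the top two of only one batch, regardless of its median rank.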