'Series' object has no attribute 'nonzero' in get_pseudobulk #166

Open
maltekuehl opened this issue Jan 28, 2025 · 0 comments · May be fixed by #167
Labels
bug Something isn't working

maltekuehl commented Jan 28, 2025

Describe the bug
Calling get_pseudobulk results in an error: AttributeError: 'Series' object has no attribute 'nonzero'

To Reproduce
I run decoupler as part of a pseudobulk analysis with Snakemake and conda environments. My pipeline used to work, but after I reinstalled it, an unpinned dependency of one of the packages in this environment seems to have broken decoupler.

Environment:

channels:
  - conda-forge
  - bioconda
dependencies:
  - conda-forge::anndata = 0.11.3
  - conda-forge::decoupler-py = 1.8.0
  - conda-forge::matplotlib = 3.9.1
  - conda-forge::pandas = 2.1.1
  - bioconda::pydeseq2 = 0.4.10
  - conda-forge::seaborn = 0.13.2

Code:

"""Perform differential gene expression analysis using pydeseq2."""

import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import scanpy as sc
import decoupler as dc
from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats
from snakemake.script import snakemake

adata = ad.read_h5ad(snakemake.input["counts"])
adata_raw = adata.raw.to_adata()
adata_raw.layers["counts"] = adata_raw.X
label_key = snakemake.params["label_key"]

print(adata_raw)

pdata = dc.get_pseudobulk(
    adata_raw,
    sample_col="patient",
    groups_col=label_key,
    layer="counts",
    mode="sum",
    min_cells=0,
    min_counts=0,
)

I have confirmed that the object does indeed have the counts layer.

Error trace:

Traceback (most recent call last):
  File "/home/jovyan/work/.snakemake/scripts/tmpb0323egv.differential_gene_expression.py", line 24, in <module>
    pdata = dc.get_pseudobulk(
            ^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 381, in get_pseudobulk
    psbulk, ncells, counts, props = compute_psbulk(n_rows, n_cols, X, sample_col, groups_col, smples, groups, obs,
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 263, in compute_psbulk
    profile = X[(obs[sample_col] == smp) & (obs[groups_col] == grp)]
              ~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/scipy/sparse/_index.py", line 30, in __getitem__
    index, new_shape = self._validate_indices(key)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/scipy/sparse/_index.py", line 269, in _validate_indices
    index.extend(ix.nonzero())
                 ^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/pandas/core/generic.py", line 6204, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'nonzero'
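
The last decoupler frame in the traceback indexes X with a boolean pandas Series, and SciPy then calls .nonzero() on that Series. A possible workaround, assuming X is a SciPy sparse matrix, is to convert the mask to a plain NumPy boolean array before indexing. The toy matrix and the obs columns below are made up for illustration; this is a sketch of the idea, not decoupler's actual code or fix.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy data (illustrative only): 4 cells x 3 genes, stored sparse
# like the counts layer described in this report.
X = csr_matrix(np.arange(12).reshape(4, 3))
obs = pd.DataFrame(
    {"patient": ["p1", "p1", "p2", "p2"], "label": ["a", "b", "a", "a"]}
)

# Boolean pandas Series, built the same way as in the failing line.
mask = (obs["patient"] == "p2") & (obs["label"] == "a")

# Convert the Series to a NumPy boolean array before indexing the sparse matrix.
profile = X[mask.to_numpy()]

print(profile.shape)
```

Passing a NumPy array avoids relying on how a given SciPy release treats pandas objects as indices.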

I have tried pinning various versions of numpy and pandas in the environment by hand, without success. Could you please tell me a combination of numpy and pandas versions that is known to work with decoupler?
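
The environment file above does not pin numpy or scipy, so it can help to record which versions were actually resolved. A minimal sketch using only the standard library follows. The distribution names in the list are assumptions (for example, that the conda package decoupler-py is registered as "decoupler"); a name that is not found is reported as not installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumptions; adjust them to match the environment.
packages = ["anndata", "decoupler", "numpy", "pandas", "scipy"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(packages)
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```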

Expected behavior
No error message, a pseudobulk AnnData object, and clearly pinned requirements that prevent changes in other dependencies from breaking decoupler code.

System

  • OS: Ubuntu 22.04, running the jupyter/datascience-notebook:latest container with podman
  • Python version: 3.12.8
  • Versions of libraries involved: See environment list.

Additional context
I also tried downgrading scipy to 1.14.1, which solves the error above but results in a different error:

Traceback (most recent call last):
  File "/home/jovyan/work/.snakemake/scripts/tmpmmo33mnn.differential_gene_expression.py", line 25, in <module>
    pdata = dc.get_pseudobulk(
            ^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 381, in get_pseudobulk
    psbulk, ncells, counts, props = compute_psbulk(n_rows, n_cols, X, sample_col, groups_col, smples, groups, obs,
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 260, in compute_psbulk
    new_obs.loc[index, :] = tmp
    ~~~~~~~~~~~^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/indexing.py", line 885, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/indexing.py", line 1895, in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/indexing.py", line 2138, in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 399, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 354, in apply
    applied = getattr(b, f)(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/internals/blocks.py", line 1182, in setitem
    values[indexer] = casted
    ~~~~~~^^^^^^^^^
ValueError: could not broadcast input array from shape (2,8) into shape (8,)
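
For reference, NumPy raises this kind of ValueError when a two-row array is assigned into a slot that holds a single row. The sketch below reproduces the error class with plain NumPy arrays. It only illustrates the shape mismatch; it does not show why tmp has two rows inside compute_psbulk, which this report does not establish.

```python
import numpy as np

target = np.zeros((3, 8))  # three rows of eight columns
value = np.ones((2, 8))    # two rows where a single row is expected

try:
    # A (2, 8) array cannot be broadcast into one row of shape (8,).
    target[0] = value
    message = None
except ValueError as err:
    message = str(err)

print(message)
```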

Update: I narrowed it down further. An environment containing only conda-forge::decoupler-py = 1.8.0, with no other pinned dependencies, produces this error too.

Update 2: One problem is that extract_psbulk_inputs can return X with different data types. For example, my counts layer was a csr_matrix, which led to the nonzero error. When I manually convert it to a NumPy array, I no longer get the nonzero error but get the could not broadcast error instead.
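
The manual conversion mentioned in Update 2 can be sketched as below. The helper name to_dense is hypothetical and not part of decoupler; it only checks whether the input is sparse and densifies it if so. Note that densifying a large count matrix can use a lot of memory.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def to_dense(X):
    """Hypothetical helper: return X as a NumPy array whether it is sparse or dense."""
    if issparse(X):
        return X.toarray()
    return np.asarray(X)


# Illustrative input: a small sparse matrix standing in for a counts layer.
sparse_counts = csr_matrix(np.eye(3))
dense_counts = to_dense(sparse_counts)

print(type(dense_counts).__name__, dense_counts.shape)
```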

maltekuehl added the bug label on Jan 28, 2025