'Series' object has no attribute 'nonzero' in `get_pseudobulk` #166

maltekuehl · 2025-01-28T17:46:03Z

Describe the bug
Calling get_pseudobulk results in an error: AttributeError: 'Series' object has no attribute 'nonzero'

To Reproduce
I run decoupler as part of a pseudobulk analysis with Snakemake and conda environments. My pipeline used to work, but after I reinstalled it, an unpinned dependency of one of the packages in this environment seems to have broken decoupler.

Environment:

channels:
  - conda-forge
  - bioconda
dependencies:
  - conda-forge::anndata = 0.11.3
  - conda-forge::decoupler-py = 1.8.0
  - conda-forge::matplotlib = 3.9.1
  - conda-forge::pandas = 2.1.1
  - bioconda::pydeseq2 = 0.4.10
  - conda-forge::seaborn = 0.13.2

Code:

"""Perform differential gene expression analysis using pydeseq2."""

import anndata as ad
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import scanpy as sc
import decoupler as dc
from pydeseq2.dds import DeseqDataSet
from pydeseq2.default_inference import DefaultInference
from pydeseq2.ds import DeseqStats
from snakemake.script import snakemake

adata = ad.read_h5ad(snakemake.input["counts"])
adata_raw = adata.raw.to_adata()
adata_raw.layers["counts"] = adata_raw.X
label_key = snakemake.params["label_key"]

print(adata_raw)

pdata = dc.get_pseudobulk(
    adata_raw,
    sample_col="patient",
    groups_col=label_key,
    layer="counts",
    mode="sum",
    min_cells=0,
    min_counts=0,
)

I have confirmed that the object does indeed have the counts layer.

Error trace:

Traceback (most recent call last):
  File "/home/jovyan/work/.snakemake/scripts/tmpb0323egv.differential_gene_expression.py", line 24, in <module>
    pdata = dc.get_pseudobulk(
            ^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 381, in get_pseudobulk
    psbulk, ncells, counts, props = compute_psbulk(n_rows, n_cols, X, sample_col, groups_col, smples, groups, obs,
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 263, in compute_psbulk
    profile = X[(obs[sample_col] == smp) & (obs[groups_col] == grp)]
              ~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/scipy/sparse/_index.py", line 30, in __getitem__
    index, new_shape = self._validate_indices(key)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/scipy/sparse/_index.py", line 269, in _validate_indices
    index.extend(ix.nonzero())
                 ^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/147963e4f2a433ccb39a372f832106ba_/lib/python3.12/site-packages/pandas/core/generic.py", line 6204, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Series' object has no attribute 'nonzero'
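The last frames show SciPy's sparse indexing calling .nonzero() on the boolean mask that compute_psbulk builds from the obs columns, and a pandas Series does not provide that method. Below is a minimal sketch of the difference, using made-up toy data (the 4x3 matrix and the cell_type column are illustrative only). Converting the mask to a plain NumPy array with Series.to_numpy() before indexing means SciPy never calls .nonzero() on a Series. Whether X[mask] with the Series itself fails depends on the installed SciPy release, so the sketch only shows the converted form.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Toy data: 4 cells x 3 genes stored as CSR, like the "counts" layer in this report.
X = sp.csr_matrix(np.array([[1, 0, 2], [0, 3, 0], [4, 0, 0], [0, 0, 5]]))
obs = pd.DataFrame({"patient": ["p1", "p1", "p2", "p2"],
                    "cell_type": ["T", "B", "T", "T"]})

# Boolean pandas Series, built the same way as in the traceback above.
mask = (obs["patient"] == "p2") & (obs["cell_type"] == "T")

# X[mask] can raise the AttributeError on some SciPy releases; a NumPy bool
# array is a plain boolean row index that sparse matrices accept.
profile = X[mask.to_numpy()]

# Sum the selected cells into one pseudobulk profile (1-D array).
summed = np.asarray(profile.sum(axis=0)).ravel()
print(summed)
```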

I have tried pinning various versions of numpy and pandas in the environment by hand, without success. Could you please tell me which combination of numpy and pandas is known to work with decoupler?

Expected behavior
No error message, a pseudobulk AnnData object, and clearly pinned requirements that prevent changes in other dependencies from breaking decoupler code.

System

OS: Ubuntu 22.04, running the jupyter/datascience-notebook:latest container with podman
Python version: 3.12.8
Versions of libraries involved: See environment list.
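Because the environment file above pins only top-level packages, transitive dependencies such as scipy and numpy can change between reinstalls. A small sketch for reporting the versions that were actually resolved, using only the standard library's importlib.metadata. The package list is an assumption, and it assumes decoupler-py is installed under the distribution name decoupler.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names to report (assumed); scipy and numpy are included because
# the environment file does not pin them, so they arrive as transitive deps.
PACKAGES = ["decoupler", "anndata", "pandas", "numpy", "scipy", "pydeseq2"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```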

Additional context
I also tried downgrading scipy to 1.14.1, which fixes the error above but produces a different one:

Traceback (most recent call last):
  File "/home/jovyan/work/.snakemake/scripts/tmpmmo33mnn.differential_gene_expression.py", line 25, in <module>
    pdata = dc.get_pseudobulk(
            ^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 381, in get_pseudobulk
    psbulk, ncells, counts, props = compute_psbulk(n_rows, n_cols, X, sample_col, groups_col, smples, groups, obs,
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/decoupler/utils_anndata.py", line 260, in compute_psbulk
    new_obs.loc[index, :] = tmp
    ~~~~~~~~~~~^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/indexing.py", line 885, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/indexing.py", line 1895, in _setitem_with_indexer
    self._setitem_single_block(indexer, value, name)
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/indexing.py", line 2138, in _setitem_single_block
    self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 399, in setitem
    return self.apply("setitem", indexer=indexer, value=value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 354, in apply
    applied = getattr(b, f)(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/work/.snakemake/conda/73cef712095d88cddf34f35b0b1cc46c_/lib/python3.12/site-packages/pandas/core/internals/blocks.py", line 1182, in setitem
    values[indexer] = casted
    ~~~~~~^^^^^^^^^
ValueError: could not broadcast input array from shape (2,8) into shape (8,)
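The shapes in this message suggest that the metadata written for one pseudobulk row (8 obs columns) came back as two rows instead of one. A minimal NumPy-only sketch of the same failure, independent of decoupler: writing a (2, 8) block into a single row of length 8 cannot broadcast, while a single (8,) row fits. The helper try_assign is hypothetical and exists only for this illustration.

```python
import numpy as np


def try_assign(target, value):
    """Write `value` into every element of `target`; return the error text or None."""
    try:
        target[:] = value
        return None
    except ValueError as err:
        return str(err)


# One output row with 8 metadata columns.
row = np.empty(8)
# Metadata that unexpectedly holds two rows for the same sample/group pair.
two_rows = np.zeros((2, 8))

print(try_assign(row, two_rows))     # a "could not broadcast" ValueError message
print(try_assign(row, two_rows[0]))  # None: a single row of 8 values fits
```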

Update: I narrowed it down further: an environment with only conda-forge::decoupler-py = 1.8.0 and no other dependencies produces this error, too.

Update 2: One problem is that extract_psbulk_inputs can return X with different data types. For example, my counts layer was a csr_matrix, which triggers the error. When I manually convert it to a NumPy array, the nonzero error goes away, but I get the could not broadcast error instead.
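A minimal sketch of the manual conversion described in Update 2, assuming the layer fits in memory as a dense array (densifying a large single-cell matrix can use a lot of RAM). The helper name to_dense is hypothetical. As noted above, this only sidesteps the nonzero error; the broadcast error still occurs afterwards.

```python
import numpy as np
import scipy.sparse as sp


def to_dense(X):
    """Return X as a dense NumPy array, converting SciPy sparse matrices."""
    if sp.issparse(X):
        return X.toarray()
    return np.asarray(X)


# Hypothetical usage in the script above, before calling dc.get_pseudobulk:
# adata_raw.layers["counts"] = to_dense(adata_raw.X)
```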


maltekuehl added the bug (Something isn't working) label on Jan 28, 2025

maltekuehl linked a pull request on Jan 28, 2025 that will close this issue: Update utils_anndata.py to fix csr_matrix subsetting incompatibility #167 (Open)
