
error when running pyscenic on jupyter #199

Closed
honghh2018 opened this issue Aug 12, 2020 · 7 comments
Labels
results Question about pySCENIC results

Comments

@honghh2018

The error showed up when running pyscenic (step one) on Jupyter with Python 3.8.2.
The details are posted below.
#Version
scanpy==1.4.4.post1 anndata==0.6.22.post1 umap==0.4.3 numpy==1.17.4 scipy==1.4.1 pandas==0.25.3 scikit-learn==0.23.1 statsmodels==0.11.1 python-igraph==0.8.2 pyscenic==0.10.0 seaborn==0.10.1 dask==2.17.2
#Input matrix
The input matrix was transposed so that columns hold gene symbols and rows hold cell identifiers.
#pre-process
def process_gse103322(fname):
    # Load the TSV file, skipping the metadata rows.
    mtx = pd.read_csv(fname, sep='\t', index_col=0, skiprows=[1, 2, 3, 4, 5])

    # Strip surrounding quotes to extract the gene symbol.
    mtx.index = list(map(lambda g: g[1:-1], mtx.index))

    # Remove duplicate gene symbols: sort rows by row sum in ascending order
    # and keep the last occurrence (i.e. the row with the highest total counts).
    mtx = mtx.iloc[mtx.sum(axis=1).argsort()]
    mtx = mtx[~mtx.index.duplicated(keep='last')]

    return mtx

df_mtx = process_gse103322(ALL_FNAME)
df_mtx.to_csv(EXP_MTX_FNAME, index=True)
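The deduplication trick in process_gse103322 can be checked on a toy frame (a small illustrative sketch, not part of the original post): sorting by row sum in ascending order and then keeping the last duplicate retains, for each gene symbol, the row with the highest total counts.

```python
import pandas as pd

# Toy matrix with a duplicated gene symbol ('ACTB').
mtx = pd.DataFrame(
    [[1, 0], [5, 3], [2, 2]],
    index=["ACTB", "ACTB", "GAPDH"],
    columns=["cell1", "cell2"],
)

# Same two lines as in process_gse103322: sort rows by total counts,
# then drop earlier (lower-count) duplicates.
mtx = mtx.iloc[mtx.sum(axis=1).argsort()]
mtx = mtx[~mtx.index.duplicated(keep="last")]

print(mtx)  # 'ACTB' keeps the [5, 3] row, its highest-count version
```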

adata = sc.AnnData(X=df_mtx.T.sort_index())
df_obs = df_metadata.set_index('cell_id').sort_index()
adata.obs = df_obs
adata.var_names_make_unique()
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata.raw = adata  # Store non-log-transformed data as raw. This data can be used via the use_raw parameter available for many functions.
sc.pp.log1p(adata)
adata
The input matrix looks like this:
[screenshot of the input matrix]

#The Jupyter command
!pyscenic grn {EXP_MTX_QC_FNAME} {HUMAN_TFS_FNAME} -o {ADJACENCIES_FNAME} --num_workers 32
The error details:
2020-08-12 10:15:00,885 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2020-08-12 10:16:26,107 - pyscenic.cli.pyscenic - INFO - Inferring regulatory networks.
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 101, in loads
msg = loads_msgpack(small_header, small_payload)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 193, in loads_msgpack
return msgpack.loads(payload, use_list=False, **msgpack_opts)
File "msgpack/_unpacker.pyx", line 161, in msgpack._unpacker.unpackb
TypeError: unpackb() got an unexpected keyword argument 'strict_map_key'
distributed.core - ERROR - unpackb() got an unexpected keyword argument 'strict_map_key'
Traceback (most recent call last):
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/core.py", line 346, in handle_comm
msg = await comm.read()
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/comm/tcp.py", line 211, in read
msg = await from_frames(
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/comm/utils.py", line 75, in from_frames
res = _from_frames()
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/comm/utils.py", line 60, in _from_frames
return protocol.loads(
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 101, in loads
msg = loads_msgpack(small_header, small_payload)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 193, in loads_msgpack
return msgpack.loads(payload, use_list=False, **msgpack_opts)
File "msgpack/_unpacker.pyx", line 161, in msgpack._unpacker.unpackb
TypeError: unpackb() got an unexpected keyword argument 'strict_map_key'
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 101, in loads
msg = loads_msgpack(small_header, small_payload)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 193, in loads_msgpack
return msgpack.loads(payload, use_list=False, **msgpack_opts)
File "msgpack/_unpacker.pyx", line 161, in msgpack._unpacker.unpackb
TypeError: unpackb() got an unexpected keyword argument 'strict_map_key'
Traceback (most recent call last):
File "/home/YXBio/miniconda3/envs/ScCancer/bin/pyscenic", line 8, in <module>
sys.exit(main())
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 420, in main
args.func(args)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 69, in find_adjacencies_command
client, shutdown_callback = _prepare_client(args.client_or_address, num_workers=args.num_workers)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/pyscenic/prune.py", line 62, in _prepare_client
local_cluster = LocalCluster(n_workers=num_workers,
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/local.py", line 204, in __init__
super(LocalCluster, self).__init__(
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/spec.py", line 256, in __init__
self.sync(self._start)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 160, in sync
return sync(self.loop, func, *args, **kwargs)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/utils.py", line 348, in sync
raise exc.with_traceback(tb)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/utils.py", line 332, in f
result[0] = yield future
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/spec.py", line 289, in _start
await super()._start()
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 59, in _start
self.scheduler_info = await comm.read()
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/comm/tcp.py", line 211, in read
msg = await from_frames(
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/comm/utils.py", line 75, in from_frames
res = _from_frames()
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/comm/utils.py", line 60, in _from_frames
return protocol.loads(
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 101, in loads
msg = loads_msgpack(small_header, small_payload)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/protocol/core.py", line 193, in loads_msgpack
return msgpack.loads(payload, use_list=False, **msgpack_opts)
File "msgpack/_unpacker.pyx", line 161, in msgpack._unpacker.unpackb
TypeError: unpackb() got an unexpected keyword argument 'strict_map_key'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/utils.py", line 200, in ignoring
yield
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/spec.py", line 607, in close_clusters
cluster.close(timeout=10)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 81, in close
return self.sync(self._close, callback_timeout=timeout)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 160, in sync
return sync(self.loop, func, *args, **kwargs)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/utils.py", line 348, in sync
raise exc.with_traceback(tb)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/utils.py", line 332, in f
result[0] = yield future
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/spec.py", line 380, in _close
self.scale(0)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/deploy/spec.py", line 444, in scale
v["name"] for v in self.scheduler_info["workers"].values()
KeyError: 'workers'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "/home/YXBio/miniconda3/envs/ScCancer/lib/python3.8/site-packages/distributed/utils.py", line 201, in ignoring
except exceptions as e:
TypeError: catching classes that do not inherit from BaseException is not allowed

Any advice would be appreciated
Best,
hanhuihong

@honghh2018 honghh2018 added the results Question about pySCENIC results label Aug 12, 2020
@cflerin
Contributor

cflerin commented Aug 17, 2020

Hi @honghh2018 ,

A few suggestions:

  • Upgrade to the latest pySCENIC version (currently 0.10.3).
  • Try the alternate GRN inference method (see here and #163, "Possible solutions for GRNBoost2/GENIE3 Dask issues"), although you might have issues using this with Python 3.8 (#183, "gene_names in arboreto is not defined").
  • Reinstall pySCENIC in a clean conda environment (you have Dask 2.17.2, which should be 1.0.0 according to the pySCENIC requirements).
  • Use the Scanpy steps for preprocessing instead of the process_gse103322 function (I can't tell what this function does).
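As a quick way to act on the version-mismatch suggestion, the installed versions can be listed before re-running. A small diagnostic sketch (the package list is an assumption based on the traceback above, not part of the original answer):

```python
import importlib

def report_versions(packages):
    """Return {package: __version__, or None if not importable}."""
    versions = {}
    for pkg in packages:
        try:
            mod = importlib.import_module(pkg)
            versions[pkg] = getattr(mod, "__version__", None)
        except ImportError:
            versions[pkg] = None
    return versions

# Packages implicated in the 'strict_map_key' error above.
print(report_versions(["dask", "distributed", "msgpack", "pyscenic"]))
```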

@honghh2018
Author

Hi @cflerin ,
Thanks for the suggestion, it works for me.
I ran into another issue when running the derive_regulons function.
The motifs file was generated by the pyscenic ctx step; the file is attached below:
osteoblasts.motifs.csv
So I used df_motifs = load_motifs(MOTIFS_FNAME) to load the motifs file; the loaded format is shown below:

[screenshot of the loaded motifs dataframe]
But an error was raised when running the derive_regulons function; it is posted below:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 regulons = derive_regulons(df_motifs)

in derive_regulons(motifs, db_names)
4
5 print(motifs.columns)
----> 6 motifs.columns = motifs.columns.droplevel(0)
7 print(motifs.columns)
8

/home/sy/miniconda3/envs/jupyter20200827/lib/python3.7/site-packages/pandas/core/indexes/base.py in droplevel(self, level)
1655 "Cannot remove {} levels from an index with {} "
1656 "levels: at least one level must be "
-> 1657 "left.".format(len(level), self.nlevels)
1658 )
1659 # The two checks above guarantee that here self is a MultiIndex

ValueError: Cannot remove 1 levels from an index with 1 levels: at least one level must be left.

How can I fix this issue?
Any advice would be appreciated.
Regards,
hanhuihong
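The droplevel(0) call is the line raising this ValueError: it fails whenever the columns are a flat Index rather than the two-level MultiIndex that load_motifs normally produces. A minimal defensive guard, using only pandas (the name flatten_columns is illustrative, not a pySCENIC function):

```python
import pandas as pd

def flatten_columns(motifs):
    """Drop the first column level only when the columns are a MultiIndex."""
    if isinstance(motifs.columns, pd.MultiIndex):
        motifs = motifs.copy()
        motifs.columns = motifs.columns.droplevel(0)
    return motifs

# A flat-columned frame passes through unchanged instead of raising.
flat = pd.DataFrame([[1, 2]], columns=["TF", "NES"])
print(flatten_columns(flat).columns.tolist())  # ['TF', 'NES']
```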

@honghh2018
Author

honghh2018 commented Aug 28, 2020

Hi @cflerin ,
The error I posted above has been resolved. However, I ran into another issue when filtering motifs into regulons.
I am using the function you posted, in Jupyter, as below.
The function:
def derive_regulons(motifs, db_names=('hg19-tss-centered-10kb-10species',
                                      'hg19-500bp-upstream-10species',
                                      'hg19-tss-centered-5kb-10species')):
    motifs.columns = motifs.columns.droplevel(0)

    def contains(*elems):
        def f(context):
            return any(elem in context for elem in elems)
        return f

    # For the creation of regulons we only keep the 10-species databases and the activating modules. We also remove the
    # enriched motifs for the modules that were created using the method 'weight>50.0%' (because these modules are not part
    # of the default settings of modules_from_adjacencies anymore).
    motifs = motifs[
        np.fromiter(map(compose(op.not_, contains('weight>50.0%')), motifs.Context), dtype=np.bool) & \
        np.fromiter(map(contains(*db_names), motifs.Context), dtype=np.bool) & \
        np.fromiter(map(contains('activating'), motifs.Context), dtype=np.bool)]

    # We build regulons only using enriched motifs with a NES of 3.0 or higher; we take only directly annotated TFs or TFs annotated
    # for an orthologous gene into account; and we only keep regulons with at least 10 genes.
    regulons = list(filter(lambda r: len(r) >= 10, df2regulons(motifs[(motifs['NES'] >= 3.0)
                                                                      & ((motifs['Annotation'] == 'gene is directly annotated')
                                                                         | (motifs['Annotation'].str.startswith('gene is orthologous to')
                                                                            & motifs['Annotation'].str.endswith('which is directly annotated for motif')))
                                                                     ])))

    # Rename regulons, i.e. remove suffix.
    return list(map(lambda r: r.rename(r.transcription_factor), regulons))

The error is shown in the screenshot below:
[screenshot of the error]
Additionally, I wonder how I can transform the motifs .tsv file into a regulons file. Is there a function that can do this?
Best,
hanhuihong

@cflerin
Contributor

cflerin commented Aug 28, 2020

Hi @honghh2018 ,

I guess you're following the examples in this notebook. It's not strictly necessary to run this derive_regulons function, this was an example to show some specific filters applied to the regulons in this dataset. If you're using a new dataset (and not trying to replicate the results from the notebook), I would recommend skipping this step on a first pass and going on to AUCell. You could then come back and include some refinements later on if necessary.

But to answer your question, I would guess that something went wrong when loading your regulons (generated with pyscenic ctx ...). You could try load_signatures, which is a little more robust:

from pyscenic.cli.utils import load_signatures
sig = load_signatures('reg.csv')

This should give you a list of regulons (and this should be equivalent to df2regulons(load_motifs('reg.csv'))).

@cflerin cflerin closed this as completed Aug 28, 2020
@honghh2018
Author

honghh2018 commented Aug 31, 2020

Hi @cflerin ,
Thanks for the answer, it was a great help in solving the problem.
But another question confuses me:
Which matrix should I choose as input for pyscenic grn and ctx: normalized data or raw counts?
I also wonder how to remove batch effects when more than two samples are merged into one matrix for the pySCENIC analysis, and whether batch effects should be removed at all.
Any advice would be appreciated.
Best,
hanhuihong

@cflerin
Contributor

cflerin commented Sep 1, 2020

Hi @honghh2018 ,

You can use just about any processing method for the expression matrix. I usually use raw counts (including for runs with multiple samples), but normalized and batch-corrected data are valid as well.

@hyjforesight

hyjforesight commented Feb 19, 2022

If

regulons = derive_regulons(df_motifs)
# ValueError: Cannot remove 1 levels from an index with 1 levels: at least one level must be left.
# AssertionError: signatures dataframe is empty

fails with the errors shown above, add

from pyscenic.cli.utils import load_signatures
regulons = load_signatures(MOTIFS_FNAME)

after regulons = derive_regulons(df_motifs) and before:

# Pickle these regulons.
with open(REGULONS_DAT_FNAME, 'wb') as f:
    pickle.dump(regulons, f)
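For completeness, the pickled regulons can be restored in a later session with pickle.load. A self-contained round-trip sketch (a temp file and placeholder strings stand in for the real REGULONS_DAT_FNAME path and Regulon objects):

```python
import os
import pickle
import tempfile

# Stand-ins for REGULONS_DAT_FNAME and the real regulon list.
REGULONS_DAT_FNAME = os.path.join(tempfile.mkdtemp(), "regulons.dat")
regulons = ["Regulon(SOX2)", "Regulon(PAX6)"]

# Pickle these regulons (as in the snippet above)...
with open(REGULONS_DAT_FNAME, "wb") as f:
    pickle.dump(regulons, f)

# ...and restore them later.
with open(REGULONS_DAT_FNAME, "rb") as f:
    restored = pickle.load(f)

print(restored == regulons)  # True
```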
