I am writing this to preserve, for the record, some details discovered while trying to fix a problem that is not yet solved. The problem is that pwstack runs correctly in serial mode, but when run with dask it aborts with this error message:
```
Traceback (most recent call last):
  File "/home/pavlis/data/copy_pwmigtest/runpwstack.py", line 28, in <module>
    pwstack_return = pwstack(db,pf,slowness_grid_tag='Slowness_Grid_Definition',
  File "/home/pavlis/.local/lib/python3.8/site-packages/pwmigpy/pwmig/pwstack.py", line 432, in pwstack
    mybag.compute()
  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/base.py", line 288, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/base.py", line 570, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/multiprocessing.py", line 219, in get
    result = get_async(
  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/local.py", line 505, in get_async
    fire_tasks(chunksize)
  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/local.py", line 487, in fire_tasks
    dumps((dsk[key], data)),
  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'PyCapsule' object
```
From what I glean from the web, a PyCapsule is Python's version of an opaque pointer (commonly a void * in C). Such objects wrap pointers into C/C++ extension code and cannot be serialized by pickle.
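The error itself is easy to reproduce outside of dask. A minimal sketch, using the PyCapsule that the standard library datetime module happens to expose for its C API:

```python
import pickle
import datetime

# datetime.datetime_CAPI is a PyCapsule wrapping the module's C API.
# Anything that holds a PyCapsule, directly or inside a container or
# attribute, fails to serialize with exactly the error seen above.
print(type(datetime.datetime_CAPI))   # <class 'PyCapsule'>
pickle.dumps(datetime.datetime_CAPI)  # TypeError: cannot pickle 'PyCapsule' object
```

So something in the payload dask is serializing must be carrying a pointer into extension code. The pwstack code that is creating this error is this: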
```python
allqueries = list()
for sid in source_id_list:
    for rids in staids:
        # build_wfquery returns a dict with lat, lon,
        # i, j, and a query string.  That is a simple
        # construct so don't think it will be a bottleneck.
        # May have been better done with a dataframe
        # q = dask.delayed(build_wfquery)(sid, rids)
        q = build_wfquery(sid, rids)
        # debug
        # print(q['ix1'], q['ix2'], q['fold'])
        allqueries.append(q)
mybag = dask.bag.from_sequence(allqueries)
```
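For reference, based on the comments and the debug print in that loop, each element of allqueries should be a plain dict of simple types, something like the following hypothetical example (the field names are taken from the comment and the print statement; the values and the exact query form are invented for illustration):

```python
# Hypothetical element of allqueries -- names from the comments above,
# values invented.  Every entry is a basic Python type, all of which
# are trivially picklable.
q = {
    "lat": 36.5,                 # pseudostation latitude
    "lon": -89.9,                # pseudostation longitude
    "ix1": 12,                   # grid index (printed in the debug line)
    "ix2": 40,                   # grid index (printed in the debug line)
    "fold": 25,                  # printed in the debug line
    "query": {"source_id": 42},  # MongoDB query (exact form invented)
}
```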
Note that allqueries, the list passed to from_sequence, is nothing more than a list of MongoDB queries that are known to work in the serial implementation. Unless I'm reading the debug data incorrectly, that is the data dask is handling when the error is thrown.
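If that is true, each element should serialize on its own. A diagnostic sketch, assuming the same allqueries list as above, that tries to pin down which element (or which field of an element) carries the unpicklable object:

```python
import cloudpickle

# Try to serialize each query dict individually to isolate any element
# that carries a PyCapsule, then report the type of each of its fields.
for i, q in enumerate(allqueries):
    try:
        cloudpickle.dumps(q)
    except TypeError as err:
        print(f"element {i} failed to pickle: {err}")
        for key, value in q.items():
            print(f"  {key}: {type(value)}")
```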
The error is occurring in this tiny function in cloudpickle_fast.py:
```python
def dumps(obj, protocol=None, buffer_callback=None):
    """Serialize obj as a string of bytes allocated in memory

    protocol defaults to cloudpickle.DEFAULT_PROTOCOL which is an alias to
    pickle.HIGHEST_PROTOCOL. This setting favors maximum communication
    speed between processes running the same Python version.

    Set protocol=pickle.DEFAULT_PROTOCOL instead if you need to ensure
    compatibility with older versions of Python.
    """
    with io.BytesIO() as file:
        cp = CloudPickler(
            file, protocol=protocol, buffer_callback=buffer_callback
        )
        cp.dump(obj)
        return file.getvalue()
```
in the call cp.dump(obj), which, as the traceback shows, is called from fire_tasks in dask's local.py.
Note that, using spyder, I can see that data is a tuple of the query dicts created from allqueries. I do not know what is causing this error, but this at least preserves some of the data for discussion.
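One more data point that may help the discussion: dask only pickles task payloads when it has to ship them to worker processes. A sketch of a diagnostic run with the threaded scheduler, which shares memory and skips that serialization step:

```python
# The threaded scheduler runs in a single process and never pickles
# the bag elements.  If this run succeeds, the failure is confined to
# serialization for the multiprocessing scheduler.
result = mybag.compute(scheduler="threads")
```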