PyCapsule error from cloudpickle #2

Open
pavlis opened this issue Jan 16, 2022 · 0 comments

I am writing this to record a problem that is not yet solved, along with some details I discovered while trying to fix it. The problem is that pwstack runs in serial mode, but when run with dask it aborts with this error message:

Traceback (most recent call last):

  File "/home/pavlis/data/copy_pwmigtest/runpwstack.py", line 28, in <module>
    pwstack_return = pwstack(db,pf,slowness_grid_tag='Slowness_Grid_Definition',

  File "/home/pavlis/.local/lib/python3.8/site-packages/pwmigpy/pwmig/pwstack.py", line 432, in pwstack
    mybag.compute()

  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/base.py", line 288, in compute
    (result,) = compute(self, traverse=False, **kwargs)

  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/base.py", line 570, in compute
    results = schedule(dsk, keys, **kwargs)

  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/multiprocessing.py", line 219, in get
    result = get_async(

  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/local.py", line 505, in get_async
    fire_tasks(chunksize)

  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/dask/local.py", line 487, in fire_tasks
    dumps((dsk[key], data)),

  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)

  File "/home/pavlis/anaconda3/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)

TypeError: cannot pickle 'PyCapsule' object

From what I glean from the web, a PyCapsule is Python's version of an opaque pointer (commonly void * in C). The pwstack code that triggers this error is this:

    allqueries=list()
    for sid in source_id_list:
        for rids in staids:
            # build_wfquery returns a dict with lat, lon,
            # i, j, and a query string  That is a simple
            # construct so don't think it will be a bottleneck.
            # May have been better done with a dataframe
            #q=dask.delayed(build_wfquery)(sid,rids)
            q=build_wfquery(sid,rids)
            # debug
            #print(q['ix1'],q['ix2'],q['fold'])
            allqueries.append(q)
    mybag=dask.bag.from_sequence(allqueries)

Note that allqueries, which is passed to from_sequence, is nothing more than a list of dicts holding MongoDB queries that are known to work in the serial implementation. Unless I'm reading the debug data incorrectly, that is the data dask is handling when the error is thrown.
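To make the PyCapsule point concrete, here is a minimal sketch that reproduces the same kind of TypeError using only the standard library. It assumes CPython, where datetime.datetime_CAPI happens to be a PyCapsule; that object is only a stand-in and has nothing to do with pwstack. The query dict below is a made-up simplification of what build_wfquery returns.

```python
import datetime
import pickle

# On CPython, datetime.datetime_CAPI is a PyCapsule exposed for C extensions.
# It is used here only as a convenient stand-in for any opaque C pointer.
capsule = datetime.datetime_CAPI
capsule_type_name = type(capsule).__name__
print(capsule_type_name)

# A plain dict of numbers (hypothetical, simplified query record)
# survives a pickle round trip without trouble.
query = {"lat": 26.68, "lon": -120.17, "ix1": 0, "ix2": 0, "fold": 0}
restored = pickle.loads(pickle.dumps(query))

# Burying a capsule anywhere inside a container makes the whole dump fail.
try:
    pickle.dumps({"query": query, "handle": capsule})
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)
```

The point of the sketch is that the failure does not have to come from the top-level object: one capsule nested anywhere in what gets serialized is enough to abort the dump.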

The error is occurring in this tiny function in cloudpickle_fast.py:

    def dumps(obj, protocol=None, buffer_callback=None):
        """Serialize obj as a string of bytes allocated in memory

        protocol defaults to cloudpickle.DEFAULT_PROTOCOL which is an alias to
        pickle.HIGHEST_PROTOCOL. This setting favors maximum communication
        speed between processes running the same Python version.

        Set protocol=pickle.DEFAULT_PROTOCOL instead if you need to ensure
        compatibility with older versions of Python.
        """
        with io.BytesIO() as file:
            cp = CloudPickler(
                file, protocol=protocol, buffer_callback=buffer_callback
            )
            cp.dump(obj)
            return file.getvalue()

in the call cp.dump(obj), which is called from here in local.py:

                    args.append(
                        (
                            key,
                            dumps((dsk[key], data)),
                            dumps,
                            loads,
                            get_id,
                            pack_exception,
                        )
                    )

Using Spyder, I can see that data holds the query dicts created from allqueries. Here is output showing some of the content of obj:

IPdb [12]: type(obj)
<class 'tuple'>

IPdb [13]: len(obj)
2

IPdb [15]: print(type(obj[0]),type(obj[1]))
<class 'tuple'> <class 'dict'>

IPdb [18]: for i in range(len(obj[0])):
    print(obj[0][i])

<function reify at 0x7fc0086a68b0>
(<function map_chunk at 0x7fc0086a6ca0>, <function pwstack.<locals>.<lambda> at 0x7fbfed6e2c10>, [(<function map_chunk at 0x7fc0086a6ca0>, <function pwstack.<locals>.<lambda> at 0x7fbfed6e2310>, [(<function map_chunk at 0x7fc0086a6ca0>, <function pwstack.<locals>.<lambda> at 0x7fbfed6e2790>, [('from_sequence-df528f943b159f5def469711bbbf417f', 0)], None, {})], None, {})], None, {})

IPdb [20]: for k in obj[1].keys():
    print(obj[1][k])

[{'idlist': [], 'lat': 26.684617348933415, 'lon': -120.17056382988999, 'ix1': 0, 'ix2': 0, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 26.985231366733434, 'lon': -120.27554915749876, 'ix1': 0, 'ix2': 1, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 27.285767182188337, 'lon': -120.38109688239506, 'ix1': 0, 'ix2': 2, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 27.586223351037315, 'lon': -120.48721724067066, 'ix1': 0, 'ix2': 3, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 27.886598404882246, 'lon': -120.59392063363019, 'ix1': 0, 'ix2': 4, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 28.186890850577715, 'lon': -120.70121763183276, 'ix1': 0, 'ix2': 5, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 28.487099169604786, 'lon': -120.80911897923677, 'ix1': 0, 'ix2': 6, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 28.787221817427742, 'lon': -120.91763559745056, 'ix1': 0, 'ix2': 7, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 29.087257222833454, 'lon': -121.02677859009232, 'ix1': 0, 'ix2': 8, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 29.38720378725275, 'lon': -121.13655924726278, 'ix1': 0, 'ix2': 9, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 
29.687059884062982, 'lon': -121.2469890501338, 'ix1': 0, 'ix2': 10, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 29.986823857871677, 'lon': -121.35807967565665, 'ix1': 0, 'ix2': 11, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 30.28649402378003, 'lon': -121.46984300139401, 'ix1': 0, 'ix2': 12, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 30.58606866662599, 'lon': -121.58229111047888, 'ix1': 0, 'ix2': 13, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 30.885546040206176, 'lon': -121.69543629670474, 'ix1': 0, 'ix2': 14, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [], 'lat': 31.184924366475716, 'lon': -121.80929106975123, 'ix1': 0, 'ix2': 15, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': []}}, 'fold': 0}, {'idlist': [ObjectId('61c0c17bce6d049b82cc9dd9')], 'lat': 31.484201834725656, 'lon': -121.92386816054906, 'ix1': 0, 'ix2': 16, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd9')]}}, 'fold': 1}, {'idlist': [ObjectId('61c0c17bce6d049b82cc9dd9')], 'lat': 31.78337660073671, 'lon': -122.03918052678902, 'ix1': 0, 'ix2': 17, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd9')]}}, 'fold': 1}, {'idlist': [ObjectId('61c0c17bce6d049b82cc9dd9')], 'lat': 32.08244678590895, 'lon': -122.15524135857922, 'ix1': 0, 'ix2': 18, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd9')]}}, 'fold': 1}, {'idlist': 
[ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3')], 'lat': 32.381410476366355, 'lon': -122.27206408425538, 'ix1': 0, 'ix2': 19, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3')]}}, 'fold': 2}, {'idlist': [ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3')], 'lat': 32.680265722035664, 'lon': -122.38966237634922, 'ix1': 0, 'ix2': 20, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3')]}}, 'fold': 2}, {'idlist': [ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3')], 'lat': 32.9790105356982, 'lon': -122.50805015771924, 'ix1': 0, 'ix2': 21, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3')]}}, 'fold': 2}, {'idlist': [ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3'), ObjectId('61c0c17bce6d049b82cc9dce')], 'lat': 33.27764289201428, 'lon': -122.62724160784991, 'ix1': 0, 'ix2': 22, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd9'), ObjectId('61c0c17bce6d049b82cc9dd3'), ObjectId('61c0c17bce6d049b82cc9dce')]}}, 'fold': 3}, {'idlist': [ObjectId('61c0c17bce6d049b82cc9dd3'), ObjectId('61c0c17bce6d049b82cc9dce'), ObjectId('61c0c17bce6d049b82cc9e0c'), ObjectId('61c0c17bce6d049b82cc9d8c')], 'lat': 33.57616072651876, 'lon': -122.74725116932387, 'ix1': 0, 'ix2': 23, 'query': {'source_id': {'$eq': ObjectId('61c0c10cce6d049b82cc9bd5')}, 'site_id': {'$in': [ObjectId('61c0c17bce6d049b82cc9dd3'), ObjectId('61c0c17bce6d049b82cc9dce'), ObjectId('61c0c17bce6d049b82cc9e0c'), ObjectId('61c0c17bce6d049b82cc9d8c')]}}, 'fold': 4}, {'idlist':

.... the actual output is longer than shown above, with more entries ...

I do not know what is causing this error, but this at least preserves some of the data for discussion.
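One way to narrow this down is to test the pieces of obj separately. The debug output shows that obj[0] contains pwstack.<locals>.<lambda> functions alongside the data, and a pickler also has to serialize whatever such lambdas capture from the enclosing function. Whether that is the cause here is not established; the sketch below only shows how captured variables could be checked one at a time. The names unpicklable_captures, make_task, handle, and cutoff are all hypothetical, and datetime.datetime_CAPI is again just a stand-in capsule (CPython assumed).

```python
import inspect

try:
    # Prefer cloudpickle when installed, since that is what dask calls.
    import cloudpickle as pickler
except ImportError:
    # Fallback: stdlib pickle is stricter (it also rejects captured lambdas).
    import pickle as pickler


def unpicklable_captures(func):
    """Return {name: error} for closure variables of func that fail to pickle."""
    failures = {}
    for name, value in inspect.getclosurevars(func).nonlocals.items():
        try:
            pickler.dumps(value)
        except Exception as err:
            failures[name] = repr(err)
    return failures


def make_task():
    """Hypothetical stand-in for a function that builds a lambda for bag.map."""
    import datetime

    handle = datetime.datetime_CAPI  # stand-in for an object wrapping a C pointer
    cutoff = 5
    # The lambda captures both handle and cutoff from this enclosing scope.
    return lambda d: (d["fold"] > cutoff, handle)


task = make_task()
failures = unpicklable_captures(task)
print(failures)
```

Applied to the real case, the same check could be run in the debugger on each lambda found in obj[0], and pickler.dumps could be tried on obj[0] and obj[1] separately to see which half raises.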
