Parallel debugging strategies #493
Replies: 4 comments
-
Small addendum to this. This wouldn't be an issue were it not for a fundamental limitation of the debug capabilities of Jupyter Lab. As far as I can tell, it won't allow stepping into code you import; it only lets you inspect things inside the notebook. In a parallel workflow that is nearly useless because of delayed/lazy computation: almost all the action happens outside the notebook.
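To make the lazy-computation point concrete, here is a pure-Python sketch (a hypothetical stand-in using `map`; no dask required). Building the pipeline succeeds, and the error only surfaces when the deferred computation finally runs, deep inside code a notebook-scoped debugger cannot step into:

```python
def broken(x):
    return 1 / (x - 2)      # stands in for imported library code

lazy = map(broken, [1, 2, 3])   # like a dask graph: nothing has run yet
caught = None
try:
    list(lazy)                  # the error appears only at "compute" time
except ZeroDivisionError as exc:
    caught = exc
print("raised at compute time:", caught)
```

With dask the gap is even wider: the cell that builds the graph and the `compute()` call that raises can be far apart, and the traceback points into scheduler internals rather than the notebook.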
-
I don't think there is a better way of debugging in such cases. For apptainer on HPC, copying modules to a local directory to debug is probably the best option. In general, we should avoid debugging inside containers, as the isolated environment becomes a barrier. I mentioned this in the meeting this week, but maybe we should think of a local deployment method instead of relying on containers for everything.
-
So, if that is the only alternative for now, I'll need to describe that approach in the "debugging strategies" (or something like that) section I have drafted for the revised user's manual for release 2.0.

@wangyinz how difficult would it be to build a container that doesn't use Jupyter notebook at all but would still launch MongoDB and enable dask and spark? You could do a lot with pdb and ipython outside the Jupyter environment that would make problems like this much easier to solve. Although Jupyter has great merit as a tool for reproducibility, for many people it is one more thing to learn; I've already seen that issue arise. If there were a clean way to run a workflow as a straight python script, it would be much easier to run pdb. We could even build the C++ code in the dev container with symbol tables so that at least experienced developers could use gdb to debug C++-related problems.

The particular error I'm seeing is actually being thrown by a C++ function, but I don't think that function is the problem; it is just being fed something wrong, and I can't track it inside the parallel workflow due to lazy computation. In fact, come to think of it, can I just use the dev container to run the workflow with pdb? I would still need the hack of copying some modules to the working directory, but that would solve the issues with the Jupyter debugger. Is that right?
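One pdb pattern that works well in a plain terminal is post-mortem debugging: run the workflow script, and on failure drop into the debugger at the exact frame that raised. A minimal sketch, where `run_workflow` is a hypothetical stand-in for a real workflow entry point:

```python
import sys
import traceback

def run_workflow():
    # hypothetical stand-in for a real workflow entry point
    raise ValueError("simulated failure deep in the workflow")

tb = None
try:
    run_workflow()
except Exception:
    traceback.print_exc()
    tb = sys.exc_info()[2]
    # In an interactive terminal you would now drop into the debugger:
    # import pdb; pdb.post_mortem(tb)
print("captured traceback for post-mortem:", tb is not None)
```

Alternatively, `python -m pdb workflow.py` runs the whole script under the debugger from the start, which is impractical inside a notebook but straightforward in a terminal session.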
-
The current containers should work just fine for that. Although Jupyter is launched at startup, you can simply open a terminal tab inside Jupyter and then run gdb or ipython there; using Jupyter itself is never required in our containers. That said, we could modify our startup script to support different launch modes. It should be easy to add.
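From such a terminal session, pdb can step *into* imported code directly, which is exactly what the notebook debugger refuses to do. A small sketch (`imported_function` is a hypothetical stand-in for a function from an installed module under /opt/conda):

```python
import pdb  # used interactively; see the comment below

def imported_function(x):
    # stands in for a function from an installed module under /opt/conda
    return x * 2

# In an interactive ipython/python session you would type:
#   pdb.runcall(imported_function, 21)
# then use 's' (step) and 'n' (next) to walk through the callee line by
# line -- something the notebook-scoped Jupyter debugger does not allow.
result = imported_function(21)
print(result)
```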
-
This could have been done through email, but I thought preserving the responses I hope to get might help other users.
I'm stuck on a debugging problem that I'm unsure how to get around. The background is this:
Now, I've debugged problems like this before by inserting print statements in the python code. When I'm using docker, a nice way to do that without corrupting anything is to edit the mspass python code in the /opt/conda/lib tree, adding print statements where I suspect issues. The problem, however, is that I'm working on an HPC system running the container with apptainer (aka singularity). With apptainer I do not have write permission to edit the mspass python files under /opt/conda. The only solution I can find is to copy the modules I need to my working directory, edit those copies to add the needed print statements, and then hack the imports to pick up the local module. That can get ugly fast if, as in this case, the source of the error is not clear. Do any of you have other ideas for dealing with this problem?
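For reference, the import hack described above can usually be done without editing every import statement: putting the working directory at the front of `sys.path` makes Python resolve a locally copied (and edited) module before the read-only installed copy. A sketch, where `somemodule` is a hypothetical placeholder for the module being patched:

```python
import os
import sys

# Local copies in the working directory now shadow the read-only
# installed copies under /opt/conda/lib.
sys.path.insert(0, os.getcwd())

# e.g. after copying somemodule.py from the install tree and editing it:
# import somemodule   # now resolves to ./somemodule.py
local_first = (sys.path[0] == os.getcwd())
print("local copies shadow installed ones:", local_first)
```

Note this only works cleanly for top-level modules; for a module buried inside a package, the whole package (or at least its `__init__.py` chain) generally has to be copied.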