Parallel debugging strategies #493
Replies: 4 comments
-
Small addendum to this. This wouldn't be an issue were it not for a fundamental limitation of the debug capabilities of Jupyter Lab. As far as I can tell, it won't allow stepping into code you import; it only lets you inspect things inside the notebook. In a parallel workflow that is nearly useless because of delayed/lazy computation: almost all the action happens outside the notebook.
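To make the lazy-computation point concrete, here is a pure-Python sketch (a hypothetical stand-in using `map`; no dask required). Building the pipeline succeeds, and the error only surfaces when the deferred computation finally runs, deep inside code a notebook-scoped debugger cannot step into:

```python
def broken(x):
    return 1 / (x - 2)      # stands in for imported library code

lazy = map(broken, [1, 2, 3])   # like a dask graph: nothing has run yet
caught = None
try:
    list(lazy)                  # the error appears only at "compute" time
except ZeroDivisionError as exc:
    caught = exc
print("raised at compute time:", caught)
```

With dask the gap is even wider: the cell that builds the graph and the `compute()` call that raises can be far apart, and the traceback points into scheduler internals rather than the notebook.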
-
I don't think there is a better way of debugging in such cases. For apptainer on HPC, copying modules to a local directory to debug is probably the best option. In general, we should avoid debugging inside containers, as the isolated environment becomes a barrier. I mentioned this in the meeting this week, but maybe we should think of a local deployment method instead of relying on containers for everything.
-
So, if that is the only alternative for now, I'll need to describe that approach in the "debugging strategies" (or something like that) section I have drafted for the revised user's manual for release 2.0.

@wangyinz how difficult would it be to build a container that doesn't use Jupyter notebook at all but would still launch MongoDB and enable dask and spark? You could do a lot with pdb and ipython outside the Jupyter environment that would make problems like this much easier to solve. Although Jupyter has great merit as a tool for reproducibility, for many people it is one more thing to learn; I've already seen that issue arise. If there were a clean way to run a workflow as a straight python script, it would be much easier to run pdb. We could even build the C++ code in the dev container with symbol tables so that at least experienced developers could use gdb to debug C++-related problems.

The particular error I'm seeing is actually being thrown by a C++ function, but I don't think that function is the problem; it is just being fed something wrong, and I can't track it inside the parallel workflow due to lazy computation. In fact, come to think of it, can I just use the dev container to run the workflow with pdb? I would still need the hack of copying some modules to the working directory, but that would solve the issues with the Jupyter debugger. Is that right?
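One pdb pattern that works well in a plain terminal is post-mortem debugging: run the workflow script, and on failure drop into the debugger at the exact frame that raised. A minimal sketch, where `run_workflow` is a hypothetical stand-in for a real workflow entry point:

```python
import sys
import traceback

def run_workflow():
    # hypothetical stand-in for a real workflow entry point
    raise ValueError("simulated failure deep in the workflow")

tb = None
try:
    run_workflow()
except Exception:
    traceback.print_exc()
    tb = sys.exc_info()[2]
    # In an interactive terminal you would now drop into the debugger:
    # import pdb; pdb.post_mortem(tb)
print("captured traceback for post-mortem:", tb is not None)
```

Alternatively, `python -m pdb workflow.py` runs the whole script under the debugger from the start, which is impractical inside a notebook but straightforward in a terminal session.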
-
The current containers should work just fine for that. Although Jupyter is launched at startup, you can simply open a terminal tab inside Jupyter and then run gdb or ipython there; using Jupyter itself is never required in our containers. That said, we could modify our startup script to support different launch modes. It should be easy to add.
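From such a terminal session, pdb can step *into* imported code directly, which is exactly what the notebook debugger refuses to do. A small sketch (`imported_function` is a hypothetical stand-in for a function from an installed module under /opt/conda):

```python
import pdb  # used interactively; see the comment below

def imported_function(x):
    # stands in for a function from an installed module under /opt/conda
    return x * 2

# In an interactive ipython/python session you would type:
#   pdb.runcall(imported_function, 21)
# then use 's' (step) and 'n' (next) to walk through the callee line by
# line -- something the notebook-scoped Jupyter debugger does not allow.
result = imported_function(21)
print(result)
```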
-
This could have been done through email, but I thought preserving the responses I hope to get might help other users.
I'm stuck on a debugging problem that I'm unsure how to get around. The background is this:
Now, I've debugged problems like this before by inserting print statements in the python code. When I'm using docker, a nice way to do that without corrupting anything is to edit the mspass python code in the /opt/conda/lib tree, adding print statements where I suspect issues. The problem, however, is that I'm working on an HPC system running the container with apptainer (aka singularity). With apptainer I do not have write permission to edit the mspass python files under /opt/conda. The only solution I can find is to copy the modules I need to my working directory, edit those copies to add the needed print statements, and then hack the imports to pick up the local module. That can get ugly fast if, as in this case, the source of the error is not clear. Do any of you have other ideas for dealing with this problem?
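For reference, the import hack described above can usually be done without editing every import statement: putting the working directory at the front of `sys.path` makes Python resolve a locally copied (and edited) module before the read-only installed copy. A sketch, where `somemodule` is a hypothetical placeholder for the module being patched:

```python
import os
import sys

# Local copies in the working directory now shadow the read-only
# installed copies under /opt/conda/lib.
sys.path.insert(0, os.getcwd())

# e.g. after copying somemodule.py from the install tree and editing it:
# import somemodule   # now resolves to ./somemodule.py
local_first = (sys.path[0] == os.getcwd())
print("local copies shadow installed ones:", local_first)
```

Note this only works cleanly for top-level modules; for a module buried inside a package, the whole package (or at least its `__init__.py` chain) generally has to be copied.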