Aries dangling long-running process when used through the UP #133
Comments
Hi Nicola, when used through UP, there is a process for aries (of the executable you mentioned) started for each planning request.
It is extremely simple: it creates the process on creation and kills it when garbage collected. I am on Linux and I have been very pleasantly surprised by how reliable this has been. I don't remember seeing the process escape the watch recently (and on my machine that is easy to spot, as the fans go crazy after a few minutes of such CPU load). This is typically the kind of setting where small differences between operating systems may cause subtle bugs. Having no access to macOS machines, I cannot really test it or directly help :/ Just to be clear, is it something that happens on a regular basis, or did it happen just once? |
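For readers unfamiliar with the pattern described above (start the process when the wrapper is created, kill it when the wrapper is garbage collected), here is a minimal Python sketch. It is not the actual up-aries code: `ServerHandle` and `_stop` are hypothetical names, and a sleeping Python interpreter stands in for the Aries executable.

```python
import subprocess
import sys
import weakref


def _stop(process):
    # Kill the child if it is still running, then reap it.
    if process.poll() is None:
        process.kill()
    process.wait()


class ServerHandle:
    """Hypothetical wrapper that owns one child process for its lifetime."""

    def __init__(self, argv):
        # The child is started as soon as the wrapper is created.
        self.process = subprocess.Popen(argv)
        # The callback runs when the wrapper is garbage collected, when
        # close() is called, or at normal interpreter exit.
        # It receives the Popen object, not self, so it does not keep
        # the wrapper alive.
        self._finalizer = weakref.finalize(self, _stop, self.process)

    def close(self):
        # Explicit shutdown; calling the finalizer twice is harmless.
        self._finalizer()


def demo():
    handle = ServerHandle([sys.executable, "-c", "import time; time.sleep(60)"])
    child = handle.process
    handle.close()
    return child.returncode
```

One limitation worth keeping in mind: finalizers only run if the interpreter gets a chance to shut down cleanly. If the Python process is killed by `SIGKILL`, or by a `SIGTERM` for which no handler is installed, the callback never runs and the child keeps going.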
Hi @arbimo, thanks for the quick answer! It only happened once in the last few days; I'll keep an eye out to see if it happens again. I've looked at the code and it is indeed quite straightforward. I have three possible solutions in mind:
The number 2 seems to me the easiest and most effective (it's only an additional option to Let me know if you need anything. |
I've checked the doc better, the option to give to the I can prepare a PR if you are interested. |
I think (but am unsure) that things are a bit more complex than that. Essentially, what we want to enforce is that the created subprocess may not outlive the parent one. This is something I did not find a proper way to enforce in a cross-platform way (which was really surprising to me). What you propose is a bit different, I think. Creating a process group would allow sending signals to the group. This does not mean that all signals are shared with the child. E.g. if a SIGKILL/SIGTERM is sent to the parent process, it would not be sent to the planner (unless you explicitly send it to the group). I think that Also I don't think SIGINT is our main problem, as it does appear to be handled properly (otherwise you would have a zombie process for any interrupted invocation of the planner). Perhaps it would be good if the problem could be better characterized and scoped, as it is still a bit unclear under which circumstances this may happen. |
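To make the point about process groups concrete, here is a small POSIX-only Python sketch. The thread does not show which option was proposed; `start_new_session=True` is used here only as one standard way to put a child in its own group, and `spawn_in_own_group` and `terminate_group` are hypothetical helper names. The signal reaches the child only because it is addressed to the group explicitly.

```python
import os
import signal
import subprocess
import sys


def spawn_in_own_group(argv):
    # start_new_session=True makes the child call setsid(), so it becomes
    # the leader of a new session and a new process group (POSIX only).
    return subprocess.Popen(argv, start_new_session=True)


def terminate_group(process):
    # A signal sent to the parent's PID is NOT forwarded to this group.
    # Someone has to address the group explicitly, as done here.
    os.killpg(os.getpgid(process.pid), signal.SIGTERM)
    return process.wait()
```

Nothing in this sketch runs when the parent itself is killed, so it does not by itself guarantee that the child cannot outlive the parent.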
Yes, I agree, the process group was not a good idea. Looking around there are some Linux-specific solutions (e.g. https://stackoverflow.com/a/36945270/3206471), but I cannot find anything about macOS, let alone Windows. Maybe some more proactive approach is needed, where the two processes exchange some beacons periodically and if the parent dies and does not reply to the beacons anymore, the child quits itself. |
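A rough Python sketch of the "child quits itself" idea follows. It is a simplified variant rather than a timed beacon protocol: the parent keeps the write end of a pipe connected to the child's stdin, and the child exits when that pipe reports end-of-file, which the OS does once the parent is gone for any reason. `CHILD_SOURCE` and `spawn_watched_child` are hypothetical names, and the sleeping loop stands in for the planner server.

```python
import subprocess
import sys

# Source of the child process (a stand-in for the long-running server).
CHILD_SOURCE = r"""
import os, sys, threading, time

def watch_parent():
    # read(1) returns b"" once every write end of the pipe is closed,
    # which happens when the parent exits, however it was killed.
    while sys.stdin.buffer.read(1):
        pass  # any byte received counts as a beacon from the parent
    os._exit(0)

threading.Thread(target=watch_parent, daemon=True).start()
time.sleep(60)  # placeholder for the real long-running work
"""


def spawn_watched_child():
    # The parent holds the only write end of the child's stdin pipe.
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
    )
```

This variant detects a parent that has died, but not one that is alive and hung; that would need periodic beacons with a timeout on the child side. It also assumes no other process inherits the write end of the pipe.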
A better handling on the server side could go quite a long way, namely:
|
I apologize for the basic question, but I need to better understand your architecture: is this client/server part specific to the UP integration or is it exposed by Aries generally? |
It is specific to the UP integration. The However, even though the interface is defined in the UP, all the client and server code (notably the way to launch the server and connect to it) is entirely done on the |
Cool. So this means you have complete control of the client code as well, which is on the UP side, right? |
Btw, digging deeper in this problem scares me a bit about the complexity of some details of Unix behavior... See here for someone claiming to have a portable solution to the problem: https://groups.google.com/g/comp.unix.programmer/c/CVATHnIVNv0 |
I'm using the Aries engine of the UP framework, and to test what I'm developing I kill my code most of the time (Ctrl-C, so `SIGINT`, not anything too violent).

In the last few days I noticed my laptop (Apple MBP 14" M1 Pro) was draining its battery very quickly and heating up quite a lot. I was worried about a hardware problem, until I noticed a `up-aries_macos_arm64` process running in the background, constantly using up to 300% of CPU.

So this was probably left dangling by one of the instances of Aries launched by the UP during testing of my code, and was not killed properly when the Python process stopped.

Unfortunately, I tried but could not reproduce what happened. I'm filing the issue anyway because it may be easy to spot for somebody who knows the source and how the service process is supposed to be killed in anomalous situations.

The Python code itself is also difficult to post, but it encodes a high-level problem into a hierarchical problem and then launches Aries with an `AnytimePlanner` instance in a rather standard way.

Edit: The Python package versions I'm using are `unified-planning==1.0.0.289.dev1` and `up-aries==0.3.2`.

Let me know if I can do anything to help you debug this.