Hi,
I have written a PR including some changes I found necessary for our use case:
We've been evaluating 100,000+ models. We first ran them all on a compute cluster; for reasons we're not quite sure of, some of them failed and needed rerunning. This isn't kuenm_ceval's fault, but it wasn't easy to see in advance whether those models had run correctly or not.
What was difficult with kuenm_ceval was the lack of any debug output, so it wasn't possible to tell whether it was just busy or had stalled waiting for some unreported model to become valid.
So I added a "silent" option to kuenm_ceval which, when FALSE, prints which model it is working on and reports when it is having to wait. I found this essential: I could call with silent=FALSE, read the output in the cluster log (the Sys.sleep(0.1) flushes stdout, I think) to see where and why it had stalled, and then rerun the models that had previously failed. Possibly this would be relevant to issue #7 and issue #19.
I also added a catch for when ku_enm_eval comes through entirely NA, in which case the log plots fail and our cluster jobs exited prematurely.
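To illustrate the idea (this is only a rough sketch, not the code in the PR or in kuenm itself), the two changes could look something like the following. The helper names `wait_for_model` and `all_results_na`, and the `poll` and `timeout` arguments, are hypothetical and made up for this example; I'm also assuming that "waiting for a model" means polling until an output file exists.

```r
# Hypothetical helper (not part of kuenm): wait for a model's output file,
# reporting progress unless silent = TRUE.
wait_for_model <- function(model_file, silent = FALSE, poll = 0.1, timeout = 5) {
  if (!silent) {
    message("Evaluating model: ", basename(model_file))
  }
  waited <- 0
  reported <- FALSE
  while (!file.exists(model_file)) {
    if (waited >= timeout) {
      # Give up instead of stalling forever on a model that never finished.
      if (!silent) message("Gave up waiting for: ", model_file)
      return(FALSE)
    }
    if (!silent && !reported) {
      # Report the wait once, so the log shows which model is holding things up.
      message("Waiting for: ", model_file)
      reported <- TRUE
    }
    Sys.sleep(poll)
    waited <- waited + poll
  }
  TRUE
}

# Hypothetical guard: TRUE when every value in the evaluation table is NA,
# in which case plotting should be skipped rather than allowed to fail.
all_results_na <- function(results) {
  all(is.na(unlist(results)))
}
```

Note that `message()` writes to stderr rather than stdout, so whether it shows up in the same cluster log depends on how the job captures its output.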
Hope that's useful, see #38
Thanks,
Wes