So I have been playing around with options for dealing with the large amount of data we need for the pypam and the env notebooks. I have several proposals, but I got a bit stuck on some of them. I list them here:
1. Use less data for pypam in the mybinder version (one month per station?) and let people download the full dataset when running it themselves. Pros: easy and already implemented. Cons: we don't get the full-year analysis, which is a bit of a pity for the pypam plots, but we'll survive. To make it even faster, it would be better to have the environment notebook and pypam's use the same data.
2. Use the postBuild script, which means mybinder downloads the data BEFORE the image is created. That produces a HUGE image, but once loaded it should work. Pros: it's an elegant solution. Cons: I am not sure a 20 GB image will load for everyone. See the pypam/binder_docs branch: https://github.com/ioos/soundcoop/tree/pypam/binder_docs. To make it less heavy, it would again be better to have the environment notebook and pypam's use the same data. (A sketch of a postBuild script is shown after this list.)
3. Download the files and remove them afterwards. I tried this both with a modified version of open_mfdataset and by simply removing all files after a certain number has been downloaded (and stored in memory). This should be a feasible solution, but for some reason my kernel keeps dying when the files are removed. Help here? (A sketch of this approach is also shown after this list.)
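
For option 2, postBuild only needs to be an executable script at the repo root; below is a minimal sketch in Python (a bash script with wget would work just as well). The URLs and paths are placeholders, and the actual pypam/binder_docs branch may do this differently.

```python
#!/usr/bin/env python
# postBuild: repo2docker runs this executable script after the environment is
# built, so anything downloaded here gets baked into the Binder image.
import urllib.request
from pathlib import Path

# Placeholder file list; the real one would cover every station/month the
# notebooks need (ideally shared between the env and pypam notebooks).
DATA_FILES = {
    "data/station_A_2021.nc": "https://example.org/data/station_A_2021.nc",
}

for local_path, url in DATA_FILES.items():
    dest = Path(local_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
```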
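
For option 3, here is a minimal sketch of the download-process-delete loop, assuming the files are netCDF reachable over HTTPS and that each per-file result is small. `FILE_URLS` and `process()` are placeholders, not the notebook's real code. The key detail is loading the result into memory and closing the dataset before the file is removed, since xarray reads lazily by default.

```python
import os
import tempfile
import urllib.request

import xarray as xr

# Placeholder URLs; the real list depends on station and time range.
FILE_URLS = [
    "https://example.org/data/station_A/2021-01.nc",
    "https://example.org/data/station_A/2021-02.nc",
]


def process(ds: xr.Dataset) -> xr.Dataset:
    """Placeholder for the per-file computation (e.g. the pypam summaries)."""
    return ds.mean(dim="time")


results = []
for url in FILE_URLS:
    # Download one file at a time so only one file sits on disk at any moment.
    fd, local_path = tempfile.mkstemp(suffix=".nc")
    os.close(fd)
    urllib.request.urlretrieve(url, local_path)

    try:
        with xr.open_dataset(local_path) as ds:
            # .load() pulls the needed values into memory, so nothing still
            # references the file once we delete it (xarray reads lazily).
            results.append(process(ds).load())
    finally:
        os.remove(local_path)

combined = xr.concat(results, dim="file")
```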
Let me know which solution you prefer, or if anyone has improvements/suggestions for any of the options.
@cparcerisas, my vote would be for 1), with the stated caveat that it is due to limitations in the binder environment.
Strategy 3) is ultimately the more robust one (no particular environment needed). The kernel may be dying because of memory limitations, but I need to understand the processing code in more detail to say for sure. Closing any open datasets will force the memory to be freed.
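
As a rough illustration of that last point for the open_mfdataset variant (placeholder paths and reduction, not the actual notebook code): realize the result and let the multi-file dataset close before deleting the files it points to.

```python
import glob
import os

import xarray as xr

paths = sorted(glob.glob("downloads/*.nc"))  # placeholder pattern for one downloaded batch

# open_mfdataset (which needs dask) keeps lazy references to every file,
# so realize the result and close the dataset before touching the files.
with xr.open_mfdataset(paths, combine="by_coords") as ds:
    stats = ds.mean(dim="time").load()

# Only now is it safe to delete the batch and move on to the next one.
for p in paths:
    os.remove(p)
```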
@carriecwall @carueda @danellecline @KarinaKh