So I have been playing around with options for dealing with the large amount of data we need for the pypam and the env notebooks. I have several proposals, but I got a bit stuck on some of them. I list them here:
1. Use less data for pypam in the mybinder version (one month per station?) and let people download the full dataset when running it themselves. Pros: easy and already implemented. Cons: we don't get the full-year analysis, which is a bit of a pity for the pypam plots, but we'll survive. To make it even faster, it would be better to have the environment notebook and pypam's use the same data.
2. Use the postBuild script, which means mybinder downloads the data BEFORE the image is created. That produces a HUGE image, but once loaded it should work. Pros: it's an elegant solution. Cons: I am not sure a 20 GB image will load for everyone. See the pypam/binder_docs branch: https://github.com/ioos/soundcoop/tree/pypam/binder_docs. To make it less heavy, it would again be better to have the environment notebook and pypam's use the same data. (A sketch of a postBuild script is shown after this list.)
3. Download the files and remove them afterwards. I tried this both with a modified version of open_mfdataset and by simply removing all files after a certain number has been downloaded (and stored in memory). This should be a feasible solution, but for some reason my kernel keeps dying when the files are removed. Help here? (A sketch of this approach is also shown after this list.)
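
For option 2, postBuild only needs to be an executable script at the repo root; below is a minimal sketch in Python (a bash script with wget would work just as well). The URLs and paths are placeholders, and the actual pypam/binder_docs branch may do this differently.

```python
#!/usr/bin/env python
# postBuild: repo2docker runs this executable script after the environment is
# built, so anything downloaded here gets baked into the Binder image.
import urllib.request
from pathlib import Path

# Placeholder file list; the real one would cover every station/month the
# notebooks need (ideally shared between the env and pypam notebooks).
DATA_FILES = {
    "data/station_A_2021.nc": "https://example.org/data/station_A_2021.nc",
}

for local_path, url in DATA_FILES.items():
    dest = Path(local_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
```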
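
For option 3, here is a minimal sketch of the download-process-delete loop, assuming the files are netCDF reachable over HTTPS and that each per-file result is small. `FILE_URLS` and `process()` are placeholders, not the notebook's real code. The key detail is loading the result into memory and closing the dataset before the file is removed, since xarray reads lazily by default.

```python
import os
import tempfile
import urllib.request

import xarray as xr

# Placeholder URLs; the real list depends on station and time range.
FILE_URLS = [
    "https://example.org/data/station_A/2021-01.nc",
    "https://example.org/data/station_A/2021-02.nc",
]


def process(ds: xr.Dataset) -> xr.Dataset:
    """Placeholder for the per-file computation (e.g. the pypam summaries)."""
    return ds.mean(dim="time")


results = []
for url in FILE_URLS:
    # Download one file at a time so only one file sits on disk at any moment.
    fd, local_path = tempfile.mkstemp(suffix=".nc")
    os.close(fd)
    urllib.request.urlretrieve(url, local_path)

    try:
        with xr.open_dataset(local_path) as ds:
            # .load() pulls the needed values into memory, so nothing still
            # references the file once we delete it (xarray reads lazily).
            results.append(process(ds).load())
    finally:
        os.remove(local_path)

combined = xr.concat(results, dim="file")
```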
Let me know which solution you prefer, or if anyone has improvements/suggestions for any of the options.
@cparcerisas, my vote would be for 1), with the stated caveat that it is due to limitations in the binder environment.
Strategy 3) is ultimately the more robust one (no particular environment needed). The kernel may be dying because of memory limitations, but I need to understand the processing code in more detail to say for sure. Closing any open datasets will force the memory to be freed.
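
As a rough illustration of that last point for the open_mfdataset variant (placeholder paths and reduction, not the actual notebook code): realize the result and let the multi-file dataset close before deleting the files it points to.

```python
import glob
import os

import xarray as xr

paths = sorted(glob.glob("downloads/*.nc"))  # placeholder pattern for one downloaded batch

# open_mfdataset (which needs dask) keeps lazy references to every file,
# so realize the result and close the dataset before touching the files.
with xr.open_mfdataset(paths, combine="by_coords") as ds:
    stats = ds.mean(dim="time").load()

# Only now is it safe to delete the batch and move on to the next one.
for p in paths:
    os.remove(p)
```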
@carriecwall @carueda @danellecline @KarinaKh