Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap bathymetry download in a try/catch #219

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

glwagner
Copy link
Member

@glwagner glwagner commented Nov 5, 2024

@navidcy navidcy added the build docs Add this label to built the docs in a PR label Nov 5, 2024
@navidcy navidcy self-requested a review November 5, 2024 05:33
@navidcy
Copy link
Collaborator

navidcy commented Nov 5, 2024

when you suggested a try-catch I thought you meant in the main module... then I notice there is already something there...

try
Downloads.download(fileurl, filepath; progress=download_progress, verbose=true)
catch
cmd = `wget --no-check-certificate -O $filepath $fileurl`
run(cmd)
end

wouldn't it be more robust if we had the _try iterator there?

@simone-silvestri
Copy link
Collaborator

simone-silvestri commented Nov 5, 2024

I am not sure this is a solution. I don't think retrying a download quickly after it failed might make it work. We could try to catch the error and figure out how to work around it.

@glwagner
Copy link
Member Author

glwagner commented Nov 5, 2024

I am not sure this is a solution. I don't think retrying a download quickly after it failed might make it work. We could try to catch the error and figure out how to work around it.

Umm no the issue is not the download itself, but rather when we try to load the file after a partial download has occurred. If the download fails, a corrupted file will be present. We need to redownload in that case.

This is what caused the docs load to fail, for some reason the download failed on the first time. Then we tried to re-run in the same buildkite build and there was a corrupted file in the scratch depot

@simone-silvestri
Copy link
Collaborator

The error that Navid was referring to is really because of the server being down.
I tried the download link manually on a browser and it was not working.
https://buildkite.com/clima/climaocean-ci/builds/1606#0192f697-062c-4cd3-a49e-6a98149a9af7
This error happens at download, not at file loading.

@glwagner
Copy link
Member Author

glwagner commented Nov 5, 2024

The error that Navid was referring to is really because of the server being down. I tried the download link manually on a browser and it was not working. https://buildkite.com/clima/climaocean-ci/builds/1606#0192f697-062c-4cd3-a49e-6a98149a9af7 This error happens at download, not at file loading.

There were two errors.

The first was because the server was down.

The second error came from NCDatasets, stating that the file was invalid.

I have also observed locally the same issue.

Are you claiming that data corruption cannot occur?

@simone-silvestri
Copy link
Collaborator

Nono, data corruption can definitely occur. I was pointing out that the error highlighted by Navid was a downloading issue.

@simone-silvestri
Copy link
Collaborator

Whoops, maybe I am wrong, the error is because of data loading

@glwagner
Copy link
Member Author

glwagner commented Nov 5, 2024

Here's the second error:

https://buildkite.com/clima/climaocean-examples/builds/766#0192f3b5-7707-4829-b955-922fa0f1539d/188-313

it says

ERROR: LoadError: LoadError: NetCDF error: Opening path /central/scratch/esm/slurm-buildkite/climaocean-examples/766/depot/default/scratchspaces/0376089a-ecfe-4b0e-a64f-9c555d74d754/Bathymetry/ETOPO_2022_v1_60s_N90W180_surface.nc: NetCDF: Unknown file format (NetCDF error code: -51)

that occurs on the second try. On the first try, we got a server / download error:

https://buildkite.com/clima/climaocean-examples/builds/766#0192ef56-76dc-4e0a-9294-ba3598531c61/188-1072

ERROR: LoadError: LoadError: failed process: Process(`wget --no-check-certificate -O /central/scratch/esm/slurm-buildkite/climaocean-examples/766/depot/default/scratchspaces/0376089a-ecfe-4b0e-a64f-9c555d74d754/Bathymetry/ETOPO_2022_v1_60s_N90W180_surface.nc https://www.ngdc.noaa.gov/thredds/fileServer/global/ETOPO2022/60s/60s_surface_elev_netcdf/ETOPO_2022_v1_60s_N90W180_surface.nc`, ProcessExited(8)) [8]

Note that regrid_bathymetry does check if there is a file:

if isfile(filepath)
@info "Regridding bathymetry from existing file $filepath."

however as far as I can tell it does not check if the file is valid / uncorrupted. In other words, isfile(filepath) may return true, but we still should have redownloaded the data.

This is the problem I am trying to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build docs Add this label to built the docs in a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants