Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial draft of TEMPO virtual dataset tutorial #924

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

danielfromearth
Copy link
Collaborator

@danielfromearth danielfromearth commented Jan 8, 2025

Pull Request (PR) draft checklist - click to expand
  • Please review our
    contributing documentation
    before getting started.
  • Populate a descriptive title. For example, instead of "Updated README.md", use a
    title such as "Add testing details to the contributor section of the README".
    Example PRs: #763
  • Populate the body of the pull request with:
  • Update CHANGELOG.md with details about your change in a section titled
    ## Unreleased. If such a section does not exist, please create one. Follow
    Common Changelog for your additions.
    Example PRs: #763
  • Update the documentation and/or the README.md with details of changes to the
    earthaccess interface, if any. Consider new environment variables, function names,
    decorators, etc.

Click the "Ready for review" button at the bottom of the "Conversation" tab in GitHub
once these requirements are fulfilled. Don't worry if you see any test failures in
GitHub at this point!

Pull Request (PR) merge checklist - click to expand

Please do your best to complete these requirements! If you need help with any of these
requirements, you can ping the @nsidc/earthaccess-support team in a comment and we
will help you out!

  • Add unit tests for any new features.
  • Apply formatting and linting autofixes. You can add a GitHub comment in this Pull
    Request containing "pre-commit.ci autofix" to automate this.
  • Ensure all automated PR checks (seen at the bottom of the "conversation" tab) pass.
  • Get at least one approving review.

📚 Documentation preview 📚: https://earthaccess--924.org.readthedocs.build/en/924/

@danielfromearth danielfromearth linked an issue Jan 8, 2025 that may be closed by this pull request
Copy link

github-actions bot commented Jan 8, 2025

Binder 👈 Launch a binder notebook on this branch for commit 63bae71

I will automatically update this comment whenever this PR is modified

@betolink
Copy link
Member

@danielfromearth #923 was just merged just in case is a blocker here.

@danielfromearth
Copy link
Collaborator Author

@danielfromearth #923 was just merged just in case is a blocker here.

Hey @betolink, seems I'm still getting the same error, though I'm not sure if I'm getting the latest version set up correctly in Jupyterhub — since it looks like main has the #923 updates but it's not yet in the Jupyterhub environment (right?). If I do %pip install git+https://github.com/nsidc/earthaccess.git@main in the first line of a notebook, should I restart or no, and is there anything else I should do?

@danielfromearth danielfromearth added the impact: documentation Improvements or additions to documentation label Jan 24, 2025
@danielfromearth danielfromearth self-assigned this Jan 24, 2025
@betolink
Copy link
Member

@danielfromearth #923 was just merged just in case is a blocker here.

Hey @betolink, seems I'm still getting the same error, though I'm not sure if I'm getting the latest version set up correctly in Jupyterhub — since it looks like main has the #923 updates but it's not yet in the Jupyterhub environment (right?). If I do %pip install git+https://github.com/nsidc/earthaccess.git@main in the first line of a notebook, should I restart or no, and is there anything else I should do?

correct, we haven't released yet. I think we need to mamba uninstall -y earthaccess && pip install git+https://github.com/nsidc/earthaccess.git@main and then restart the kernel.

@danielfromearth
Copy link
Collaborator Author

Hmm. I just ran that. But then I still get this error:

File [~/earthaccess/earthaccess/api.py:419](https://openscapes.2i2c.cloud/user/danielfromearth/lab/tree/earthaccess/earthaccess/api.py#line=418), in get_s3_filesystem(daac, provider, results)
    417 provider = _normalize_location(provider)
    418 if results is not None:
--> 419     endpoint = results[0].get_s3_credentials_endpoint()
    420     if endpoint is not None:
    421         session = earthaccess.__store__.get_s3_filesystem(endpoint=endpoint)

KeyError: 0

Here's the relevant code:

results = earthaccess.search_data(
    short_name="TEMPO_NO2_L2",
    version="V03",
    temporal=("2024-01-11 12:00", "2024-01-18 12:00"),
)

open_options = {
    "access": "direct", 
    "load": True,
    "preprocess": preprocess_root_group,
    "concat_dim": "date_and_scan_num",  
    "coords": 'all', 
    "compat": 'override',
    "combine_attrs": 'drop_conflicts',
    "parallel": True
}

result_root = earthaccess.open_virtual_mfdataset(granules=results, **open_options)

@betolink, @ayushnag, other ideas to troubleshoot?

@ayushnag
Copy link
Collaborator

@danielfromearth I'm not quite sure why this error is occuring but I tried running the same code on my cloud instance and it was able load the data. Maybe check that you are installing the optional dep group for virtualizarr?: pip install 'git+https://github.com/nsidc/earthaccess.git@main#egg=earthaccess[virtualizarr]'

@danielfromearth
Copy link
Collaborator Author

The issue persists

Hmm, no difference with that. But it looks like it's still using the wrong version, because I see

File [~/earthaccess/earthaccess/dmrpp_zarr.py:95](https://openscapes.2i2c.cloud/user/danielfromearth/lab/tree/earthaccess/earthaccess/dmrpp_zarr.py#line=94), in open_virtual_mfdataset(granules, group, access, load, preprocess, parallel, **xr_combine_nested_kwargs)
     92 import xarray as xr
     94 if access == "direct":
---> 95     fs = earthaccess.get_s3_filesystem(results=granules[0])
     96     fs.storage_options["anon"] = False  # type: ignore
     97 else:

in the traceback, but that looks different than the same line 95 in PR 923.

What I'm using

mamba uninstall -y earthaccess
pip install 'earthaccess[virtualizarr] @ git+https://github.com/nsidc/earthaccess.git@main'

and then restarting the kernel. (I first tried the exact syntax you provided @ayushnag , but it gave a deprecation warning, so then I switched it to the earthaccess[virtualizarr] in front of the @ git ...)

Troubleshooting further...

@betolink, I don't understand the background warnings of the Jupyterhub environment, so I'm not sure what to try next to modify the environment.

Below is an image of what I am seeing when I click on the kernel selector ("Python 3 (ipykernel)") in the upper right corner. Should it show something different? Is there any other trick to restarting the kernel (I am hitting the "Restart Kernel..." button in the "Kernel" dropdown menu)?

temp

@betolink
Copy link
Member

Sorry was on a meeting and then lunch. I don't think you need the [virtualizarr] because it is already in the base environment.
I think it would be easier to do it from a terminal session in Jupyterlab:

mamba uninstall -y earthaccess

it should uninstall 3 packages that use earthaccess. Then if you do mamba list | grep "earthaccess" nothing should come back.
Now we can just install pip install git+https://github.com/nsidc/earthaccess.git@main
Once we have done that we just open the notebook (or restart the kernel from the kernel menu) and we should have the right version from git.

image

@danielfromearth
Copy link
Collaborator Author

version issue resolved

@betolink and I did some more troubleshooting and discovered that the issue was simpler than it seemed. I was running the notebook from within an earthaccess local directory, and so the import statement kept picking up the local package.

To resolve it, we moved the notebook out of the directory, ran it, and the error is no longer being raised, woohoo! (Thanks @betolink and @ayushnag for your assistance!)

new data-related issue

Beyond that error is a more data-related error, due to the structure of TEMPO data and concatenation of arrays with different shapes:

ValueError: cannot reindex or align along dimension 'mirror_step' because of conflicting dimension sizes: {131, 132, 116, 117, 125, 126}

This will require some reworking of the preprocessing logic, or concatenation along different dimensions.

Note: Since I won't be able to work on this again until next Friday, don't hold up any new earthaccess release on account of this Pull Request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact: documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tutorial notebook for open_virtual_dataset
3 participants