Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

global attributes #23

Draft
wants to merge 43 commits into
base: main
Choose a base branch
from
Draft

global attributes #23

wants to merge 43 commits into from

Conversation

siligam
Copy link
Contributor

@siligam siligam commented Jul 25, 2024

addresses issue #16

@pgierz pgierz linked an issue Jul 31, 2024 that may be closed by this pull request
5 tasks
@siligam siligam marked this pull request as ready for review January 10, 2025 13:19
@pgierz pgierz self-requested a review January 10, 2025 13:23
Copy link
Member

@pgierz pgierz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start. There are a few things major changes and several minor ones I think would be good to address:

Major Changes

  1. At the moment, you are accessing a module-level dictionary, data. I would prefer that this comes from the user somehow, so you would need to integrate it into the CMORizer object and pass the that to the Rule. We already have assign_data_request_to_rule (or something like that) so you can copy that behaviour.
  2. I would, where possible, move the logic to extract global level attributes that you get from the DataRequestVariable directly to the DataRequestVariable object. In the end, something like this would be nice:
>>> ds = xr.Dataset(...)
>>> rule = Rule(...)
>>> rule.data_request_variable
<DataRequestVariable object at ...>
>>> drv = rule.data_request_variable
>>> ds.attrs.update(drv.get_global_attributes())

Minor Changes

  1. Docstrings for many of the functions are missing
  2. You committed some files with your working notes. We don't want those in the repository at the end, or, they should be included in the user handbook.
  3. Some changes in the YAML files where you changed a key (e.g. source_id), still have the old value as a comment. This makes the diffs unclean, so I would rather you just directly change things.
  4. The inclusion in the standard pipeline should happen before manual checkpointing. The manual checkpoint trigger should be directly before computation is called.

src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
examples/sample.yaml Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
@pgierz pgierz self-requested a review January 20, 2025 09:16
Copy link
Member

@pgierz pgierz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on the right track, I have some improvement suggestions.

There were some comments from my last review that need to be addressed still, too.

Comment on lines +346 to +352
def get_global_attributes(self):
return self.ga.get_global_attributes(self.rule_attrs, self.table_header)

def set_global_attributes(self, cv, rule_attrs):
self.ga = GlobalAttributes(cv)
self.rule_attrs = rule_attrs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about for CMIP7DataRequestVariable and the generic DataRequestVariable stubs? At the moment, this is only implemented for CMIP6, so I would at least like to see some placeholders in CMIP7.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick look at CMIP7 CVs (https://github.com/WCRP-CMIP/CMIP7_CVs/), it is unclear at the moment how the CVs are exposed to the user. At this stage, the structure of json files and how the data is organised looks different from CMIP6 CVs and so the code that reads and loads CIMP6 CVs does not work CMIP7.

Coming back to the your question, for CMIP7 I could add get_global_attributes and set_global_attributes methods in CMIP7DataRequestVariable which invoked will raise NotImplemented error.

d = {name: int(val) for name, val in d.groupdict().items()}
return d

def _source_id_related(self, rule: dict) -> dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename the argument rule to be something more specific here to avoid confusion. You're passing in the dictionary of extracted attributes from the rule, not the rule itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this attributed is renamed to attrs_map_on_rule to avoid confusion.

src/pymorize/global_attributes.py Outdated Show resolved Hide resolved
Comment on lines +234 to +252
def set_global_attributes(self, ds, rule):
"""
Set global attributes on a dataset based on the given rule.

Parameters
----------
ds : xr.Dataset
The dataset to set the global attributes on.
rule : DataRequestRule
The data request rule to use to set the attributes.

Returns
-------
ds : xr.Dataset
The dataset with the global attributes set.
"""
d = self.get_global_attributes(rule)
ds.attrs.update(d)
return ds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place for this. After all, steps are just regular functions...here you have a method of the GlobalAttributes class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this function needs to go away especially after the integration of global attributes into the framework. I will remove it. To replace this functionality, I guess there must be another piece of code as a part of this module or another module that should be used in the pipeline to set the global attributes

d = {name: int(val) for name, val in d.groupdict().items()}
return d

def _source_id_related(self, rule: dict) -> dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a doc-string and some examples here, this is a big enough function to deserve some testing.

assert model_component in model_components
else:
raise ValueError("Missing required attribute 'model_component'")
grid = model_components[model_component]["description"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about this one? grid is the model component's description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit unclear what should be the value for this from the documentation. This needs to be cross-checked with previously generated output from (maybe) seamore tool.

The `next(iter(thing))` is to get the first element from the thing. The simpler way to get the first element is simply index accessing `thing[0]` but over time my style of programming has moved from away from index access to iter based thing basically to avoid hardcoding 0 or 1 or 2. May be it is matter of taste but more importantly I like the way it is re-written sounds more meaningful to follow the code and I like it.

Co-authored-by: Paul Gierz <[email protected]>
else:
if len(inst_id) > 1:
# No user-provided institution_id; infer it
if len(cv_institution_ids) > 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can make it abundantly clear here and do something like:

Suggested change
if len(cv_institution_ids) > 1:
if isinstance(inst_id, list) and len(cv_institution_ids) > 1:

Lower down, you could also do an instance check for list. That would avoid problems with you accidentally getting a string, or something like that.

...just an idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cv_institution_ids is the institution_ids from CMIP6 CVs. To do a isinstance check for list in other terms mean checking the schema defined in json for this field. This can be done for all the fields but it steers the focus on a different direction. What make sense is to have module that does schema checking but should that be a part of pymorize. Data source schema validation. plz comment

@pgierz pgierz marked this pull request as draft January 22, 2025 11:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Global attributes definition
2 participants