global attributes #23

siligam · 2024-07-25T08:49:31Z

addresses issue #16

pgierz

This is a good start. There are a few major changes and several minor ones that I think would be good to address:

Major Changes

At the moment, you are accessing a module-level dictionary, data. I would prefer that this come from the user somehow, so you would need to integrate it into the CMORizer object and pass that to the Rule. We already have assign_data_request_to_rule (or something like that), so you can copy that behaviour.
I would, where possible, move the logic to extract global level attributes that you get from the DataRequestVariable directly to the DataRequestVariable object. In the end, something like this would be nice:

>>> ds = xr.Dataset(...)
>>> rule = Rule(...)
>>> rule.data_request_variable
<DataRequestVariable object at ...>
>>> drv = rule.data_request_variable
>>> ds.attrs.update(drv.get_global_attributes())
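
A minimal sketch of the wiring described in the first point, assuming the user-supplied controlled vocabularies are held by the CMORizer and handed to each rule. The class bodies, the `controlled_vocabularies` attribute, and the `_assign_controlled_vocabularies_to_rules` helper are hypothetical stand-ins, not pymorize's actual API:

```python
class Rule:
    """Hypothetical stand-in for pymorize's Rule."""

    def __init__(self, name):
        self.name = name
        # Filled in by the CMORizer; no module-level dictionary involved.
        self.controlled_vocabularies = None


class CMORizer:
    """Hypothetical stand-in: owns user-supplied data and passes it to rules."""

    def __init__(self, rules, controlled_vocabularies):
        self.rules = rules
        self.controlled_vocabularies = controlled_vocabularies
        self._assign_controlled_vocabularies_to_rules()

    def _assign_controlled_vocabularies_to_rules(self):
        # Mirrors the idea of assign_data_request_to_rule: every rule gets
        # a reference to the same user-provided object.
        for rule in self.rules:
            rule.controlled_vocabularies = self.controlled_vocabularies


# Placeholder content; a real CV would be loaded from the user's configuration.
user_cv = {"institution_id": {}}
cmorizer = CMORizer([Rule("rule_a"), Rule("rule_b")], user_cv)
```

With this shape, the data reaches the Rule through the CMORizer rather than through an import-time global, which should also make it easier to swap in test data.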

Minor Changes

Docstrings for many of the functions are missing
You committed some files with your working notes. We don't want those in the repository in the end; alternatively, they should be included in the user handbook.
In some YAML files where you changed a key (e.g. source_id), the old value is still there as a comment. This makes the diffs unclean, so I would rather you just change things directly.
The inclusion in the standard pipeline should happen before manual checkpointing. The manual checkpoint trigger should be directly before computation is called.

pgierz

This is on the right track; I have some suggestions for improvement.

There were some comments from my last review that need to be addressed still, too.

pgierz · 2025-01-20T09:20:58Z

src/pymorize/data_request/variable.py

+    def get_global_attributes(self):
+        return self.ga.get_global_attributes(self.rule_attrs, self.table_header)
+
+    def set_global_attributes(self, cv, rule_attrs):
+        self.ga = GlobalAttributes(cv)
+        self.rule_attrs = rule_attrs
+


What about for CMIP7DataRequestVariable and the generic DataRequestVariable stubs? At the moment, this is only implemented for CMIP6, so I would at least like to see some placeholders in CMIP7.

From a quick look at the CMIP7 CVs (https://github.com/WCRP-CMIP/CMIP7_CVs/), it is unclear at the moment how the CVs are exposed to the user. At this stage, the structure of the JSON files and the way the data is organised look different from the CMIP6 CVs, so the code that reads and loads the CMIP6 CVs does not work for CMIP7.

Coming back to your question: for CMIP7, I could add get_global_attributes and set_global_attributes methods to CMIP7DataRequestVariable that raise NotImplementedError when invoked.
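
A rough sketch of what such placeholders could look like. The base class here is a bare stand-in (the real DataRequestVariable carries much more), and the error messages are only illustrative:

```python
class DataRequestVariable:
    """Hypothetical minimal base class, standing in for pymorize's own."""


class CMIP7DataRequestVariable(DataRequestVariable):
    """Placeholder methods until the CMIP7 CV layout is settled."""

    def get_global_attributes(self):
        raise NotImplementedError(
            "Global attributes are not yet implemented for CMIP7."
        )

    def set_global_attributes(self, cv, rule_attrs):
        # Same signature as the CMIP6 method shown in the diff above.
        raise NotImplementedError(
            "Global attributes are not yet implemented for CMIP7."
        )
```

Raising `NotImplementedError` (the exception class) rather than returning `NotImplemented` (the sentinel used for binary operators) makes the gap fail loudly at call time.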

pgierz · 2025-01-20T09:23:22Z

src/pymorize/global_attributes.py

+        d = {name: int(val) for name, val in d.groupdict().items()}
+        return d
+
+    def _source_id_related(self, rule: dict) -> dict:


I would rename the argument rule to be something more specific here to avoid confusion. You're passing in the dictionary of extracted attributes from the rule, not the rule itself.

This argument has been renamed to attrs_map_on_rule to avoid confusion.

pgierz · 2025-01-20T09:59:37Z

src/pymorize/global_attributes.py

+    def set_global_attributes(self, ds, rule):
+        """
+        Set global attributes on a dataset based on the given rule.
+
+        Parameters
+        ----------
+        ds : xr.Dataset
+            The dataset to set the global attributes on.
+        rule : DataRequestRule
+            The data request rule to use to set the attributes.
+
+        Returns
+        -------
+        ds : xr.Dataset
+            The dataset with the global attributes set.
+        """
+        d = self.get_global_attributes(rule)
+        ds.attrs.update(d)
+        return ds


I don't think this is the right place for this. After all, steps are just regular functions...here you have a method of the GlobalAttributes class?

I agree, this function needs to go away, especially after the integration of global attributes into the framework. I will remove it. To replace this functionality, I guess there must be another piece of code, in this module or another one, that the pipeline uses to set the global attributes.
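
One possible shape for that replacement, sketched as a plain step function following the `drv.get_global_attributes()` pattern from the first review. The `(ds, rule)` signature is an assumption about how steps are called, and the objects below are stand-ins so the snippet runs without xarray or pymorize:

```python
from types import SimpleNamespace


def set_global_attributes(ds, rule):
    """Pipeline step: copy the rule's global attributes onto the dataset."""
    drv = rule.data_request_variable
    ds.attrs.update(drv.get_global_attributes())
    return ds


class _FakeDataRequestVariable:
    """Stand-in for a DataRequestVariable; returns placeholder attributes."""

    def get_global_attributes(self):
        return {"mip_era": "CMIP6"}


# Anything with a dict-like .attrs works here, e.g. an xr.Dataset.
ds = SimpleNamespace(attrs={"title": "demo"})
rule = SimpleNamespace(data_request_variable=_FakeDataRequestVariable())
ds = set_global_attributes(ds, rule)
```

Keeping the step as a module-level function means GlobalAttributes only has to compute the dictionary, and the pipeline decides when to apply it.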

pgierz · 2025-01-20T10:00:22Z

src/pymorize/global_attributes.py

+        d = {name: int(val) for name, val in d.groupdict().items()}
+        return d
+
+    def _source_id_related(self, rule: dict) -> dict:


Please add a docstring and some examples here; this is a big enough function to deserve some testing.
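
As an illustration of the docstring-plus-doctest style being asked for, here is a sketch on a smaller helper. The function name and regular expression are hypothetical; they are modelled on the `groupdict()` line in the hunk above and on the CMIP6 `r<k>i<l>p<m>f<n>` variant label convention, not copied from the PR:

```python
import re

# CMIP6 variant labels have the form r<k>i<l>p<m>f<n>.
_VARIANT_LABEL = re.compile(
    r"r(?P<realization_index>\d+)"
    r"i(?P<initialization_index>\d+)"
    r"p(?P<physics_index>\d+)"
    r"f(?P<forcing_index>\d+)"
)


def parse_variant_label(label):
    """
    Split a variant label into its integer indices.

    Examples
    --------
    >>> parse_variant_label('r1i2p3f4')
    {'realization_index': 1, 'initialization_index': 2, 'physics_index': 3, 'forcing_index': 4}
    """
    match = _VARIANT_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"Invalid variant_label: {label!r}")
    return {name: int(val) for name, val in match.groupdict().items()}
```

The example in the docstring doubles as a test when the module is run through `doctest`.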

pgierz · 2025-01-20T10:02:29Z

src/pymorize/global_attributes.py

+            assert model_component in model_components
+        else:
+            raise ValueError("Missing required attribute 'model_component'")
+        grid = model_components[model_component]["description"]


Are you sure about this one? grid is the model component's description?

It is a bit unclear from the documentation what the value for this should be. This needs to be cross-checked against previously generated output, perhaps from the seamore tool.

The `next(iter(thing))` idiom gets the first element of thing. The simpler way to get the first element is index access, `thing[0]`, but over time my programming style has moved away from index access towards iterator-based access, mainly to avoid hardcoding 0, 1 or 2. It may be a matter of taste, but more importantly, the rewritten version is more meaningful to follow, and I like it.
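
A short illustration of where the two approaches differ, using made-up data. `next(iter(...))` works on any iterable, including dictionaries, where `thing[0]` would be a key lookup rather than a positional access:

```python
# Placeholder data, loosely shaped like a model_components mapping.
model_components = {
    "ocean": {"description": "placeholder ocean description"},
    "atmos": {"description": "placeholder atmosphere description"},
}

# Dictionaries iterate over keys in insertion order (Python 3.7+).
first_key = next(iter(model_components))  # 'ocean'

# model_components[0] would raise KeyError: 0 is treated as a key.

# On a list, both forms give the same element.
labels = ["gn", "gr"]
same = next(iter(labels)) == labels[0]  # True

# A default avoids StopIteration on an empty iterable.
nothing = next(iter([]), None)  # None
```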

pgierz · 2025-01-21T11:23:27Z

src/pymorize/global_attributes.py

        else:
-            if len(inst_id) > 1:
+            # No user-provided institution_id; infer it
+            if len(cv_institution_ids) > 1:


Maybe we can make it abundantly clear here and do something like:

Suggested change

-            if len(cv_institution_ids) > 1:
+            if isinstance(inst_id, list) and len(cv_institution_ids) > 1:

Lower down, you could also do an instance check for list. That would avoid problems with you accidentally getting a string, or something like that.

...just an idea.

cv_institution_ids holds the institution_ids from the CMIP6 CVs. Doing an isinstance check for list here amounts to checking the schema defined in the JSON for this field. This could be done for all the fields, but it steers the focus in a different direction. What would make sense is a module that does schema checking (data source schema validation), but should that be part of pymorize? Please comment.
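
For discussion, a sketch of what a narrow type guard at this spot could look like, short of a full schema-validation module. The function name, its signature, and the error behaviour are assumptions made for illustration; the PR's actual inference logic is only partly visible in the hunk above:

```python
def infer_institution_id(cv_institution_ids, user_institution_id=None):
    """Return the institution_id, inferring it from the CV entry if possible."""
    # Guard first: a plain string would otherwise pass len() and `in` checks
    # character by character.
    if not isinstance(cv_institution_ids, list):
        raise TypeError(
            f"Expected a list of institution ids, got {type(cv_institution_ids).__name__}"
        )
    if user_institution_id is not None:
        if user_institution_id not in cv_institution_ids:
            raise ValueError(f"Unknown institution_id: {user_institution_id!r}")
        return user_institution_id
    # No user-provided institution_id; infer it only if unambiguous.
    if len(cv_institution_ids) != 1:
        raise ValueError("Cannot infer institution_id; please provide one.")
    return next(iter(cv_institution_ids))
```

A guard like this catches one field at a time; validating the whole CV file against a schema would be a separate concern, as noted above.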

siligam and others added 2 commits July 25, 2024 03:27

global attributes

46556d9

example input yaml (not finialized design yet)

9f97adb

pgierz linked an issue Jul 31, 2024 that may be closed by this pull request

Global attributes definition #16

Open

5 tasks

siligam added 5 commits August 1, 2024 13:36

handle variant_label

873235b

added license to global attributes

5084e20

Merge branch 'gattrs_new' into gattrs

0de78fe

merged main branch

040847e

added set_global_attributes funtion

6f37879

pgierz assigned siligam Oct 9, 2024

siligam and others added 21 commits November 4, 2024 09:36

changed attribute name (maintainer_url to further_info_url) to match …

410b3ea

…the name in CV

gattrs checklist

49e2721

Merge branch 'main' into gattrs

636e89c

wip

20b45fe

wip 123

554fea9

feat: global attributes step

c811937

wip refatored global attrs

bc7fd97

(wip) global attrs integration

cd17097

wip: try to fix bad test path

0499dd4

wip

0cd482d

Merge branch 'main' into gattrs

f8940a0

Merge branch 'main' into gattrs

1cf1dc3

wip..

c4eabc8

wip... added model_component

f1f36c5

wip.. corrected typo

4c7c369

FESOM is not valid source_id

3524152

wip.. mark grid_label as required attribute in rule (i.e., in validat…

f3bd8e2

…e.py)

converting "data_spec_version" back to string before writing it to ne…

2694c43

…tcdf.

version_attribute

18def11

fix: ensure data_specs_version is a string when attached to the datas…

cfcc3df

…et attributes

fix isort

112a177

siligam added 4 commits January 9, 2025 14:31

unused imports clean up

d2362e3

yamllint removed blank space

6976f3f

cleanup.. removed version_attribute

536dd96

Merge branch 'main' into gattrs

cab9655

siligam marked this pull request as ready for review January 10, 2025 13:19

pgierz self-requested a review January 10, 2025 13:23

pgierz requested changes Jan 10, 2025

siligam and others added 6 commits January 10, 2025 18:12

Update src/pymorize/global_attributes.py

84bc4b8

Co-authored-by: Paul Gierz <[email protected]>

wip integrate global.attrs into framework

2669504

wip cleanup unused imports

141ed47

wip cleanup

90494ac

wip set subset of rule attributes on drv object

3165602

global attributes collected from rule object as dict object

56e8be3

pgierz self-requested a review January 20, 2025 09:16

fix doc test: it prefers single quote

0eb6eaa

pgierz requested changes Jan 20, 2025

pgierz reviewed Jan 21, 2025

refactored source_id_related function

a5a0590

pgierz marked this pull request as draft January 22, 2025 11:57

siligam added 2 commits January 24, 2025 09:23

added uuid for tracking_id and creation_date of inputs

c12c25a

added documentation

a279cbc
