Add a decoding mask option to only include a subset of grid nodes in m2g #34

Open
wants to merge 50 commits into main

Conversation

@joeloskarsson (Contributor) commented Oct 21, 2024

Describe your changes

For LAM forecasting it is reasonable to not always predict values in nodes that are considered part of the boundary forcing. In particular, this is a change planned for neural-lam. In terms of the graphs involved, this means that the m2g edges should only connect to a subset of the grid nodes.

This PR introduces an option decode_mask that accepts an Iterable of booleans (e.g. a numpy array) specifying which of the grid nodes should be included in the decoding part of the graph (m2g). In the LAM case this allows specifying such a mask with True for the inner-region nodes, as sketched below.
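
For illustration, a minimal sketch of how such a mask might be built (the coordinates and the inner-region criterion here are hypothetical, not part of this PR):

import numpy as np

# Hypothetical (N, 2) array of grid-node x/y coordinates in [0, 1]
coords = np.random.rand(1000, 2)

# True for inner-region nodes, False for boundary nodes that should be
# excluded from m2g
decode_mask = (
    (coords[:, 0] > 0.1) & (coords[:, 0] < 0.9)
    & (coords[:, 1] > 0.1) & (coords[:, 1] < 0.9)
)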

This builds on #32, which should be merged first. Here is a diff for only this PR in the meantime: joeloskarsson/weather-model-graphs@general_coordinates...decoding_mask

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not, update your fork with the changes from the target branch (use pull with the --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe their purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@sadamov (Contributor) left a comment

Looks good! Agree that this is required. Testing for "less than" seems sufficient 👍

More general comment: I am slightly worried about the mask being generated in mllam-data-prep and then being used here and in neural-lam. We probably need some solid checks in neural-lam about the dimensions of the graph, the data-tensors and the mask.

@sadamov added the enhancement (New feature or request) label Oct 22, 2024
@leifdenby (Member) commented

I just had a quick skim using the diff-to-#32 link you provided (thanks!). In general this looks great. I will do a proper review once #32 is merged. I hope that's ok!

@joeloskarsson (Contributor, Author) commented

There is one small problem currently, which is one reason why a6b6137 was introduced.

When using the decoding mask (maybe this could happen also without it, I'm not sure), not all mesh nodes might be connected to the grid in m2g, i.e. there may be mesh nodes that are not endpoints of any edge in m2g. This is desirable. However, when you split up a graph using

def split_graph_by_edge_attribute(graph, attr):

only nodes that are an endpoint of some edge will be included. Node subsetting happens only implicitly through the edge sub-graphing:

subgraphs[edge_value] = graph.copy().edge_subgraph(
    # Subgraphs only contain edges that actually have the attribute,
    # and where it has the correct value
    [
        edge
        for edge in graph.edges
        if attr in graph.edges[edge] and graph.edges[edge][attr] == edge_value
    ]
)

This means that something like

graph_components = wmg.split_graph_by_edge_attribute(graph=graph, attr="component")

will implicitly filter the mesh nodes in m2g to just the ones that are connected to the grid. This messes up the indexing of mesh nodes, and makes graphs saved this way incompatible with neural-lam.

This is why the option to return separate components was introduced, since it is unnecessary to merge everything and then split it up again (especially as splitting now breaks the node indices). Do you have any thoughts about a good way to fix this @leifdenby? Or do we want to fix it at all? I'm not sure it has to be fixed in this PR, since in a sense it's a more general issue with the edge-attribute-based sub-graphing.

@joeloskarsson (Contributor, Author) commented Nov 29, 2024

Now all changes from #32 are merged into main and into this branch, so the diff here is readable.

@leifdenby (Member) commented Dec 5, 2024

This messes up the indexing of mesh nodes, and makes graphs saved this way incompatible with neural-lam.

Ok, that sounds like an issue. I don't fully understand why yet, but I believe you. So rather than merging to create a graph that represents the whole encode-process-decode graph of nodes and edges, you introduced return_components in a6b6137#diff-72daea2494478edf48e319a44c42b839b9a6175a03dd1b2fa97ca8db212a79caR184 to instead return g2m, m2m and m2g separately. Is that right?

This is why the option to return separate components was introduced, since it is unnecessary to merge everything and then split it up again (especially as splitting now breaks the node indices). Do you have any thoughts about a good way to fix this @leifdenby? Or do we want to fix it at all? I'm not sure it has to be fixed in this PR, since in a sense it's a more general issue with the edge-attribute-based sub-graphing.

The reason why I implemented the graph creation by merging into one big graph is so that each node could be given a globally (as in across the whole graph) unique ID. I assumed that this ID could then be used for constructing the adjacency matrices for the different components (g2m, m2m, m2g say), which are then saved to individual files. Are you saying that this idea of having a global index doesn't work when masking out which grid nodes are used in the m2g component? Or is it because the grid index is actually not a global index, but instead an index that only runs over grid points?

I might have to create a little toy example to understand the issue properly

@leifdenby (Member) commented

I might have to create a little toy example to understand the issue properly

Maybe if you have time you could add a test/notebook @joeloskarsson with an assert that checks for the indexing that you expect? I assume it is the .pt-file that represents the m2g adjacency matrix that is incorrect.

I tried creating a notebook just now, but I am not quite sure about what the correct answer would be...

@joeloskarsson (Contributor, Author) commented

Yes, the issue does appear when you split the combined graph. But the problem is not that the indices are wrong per se; it is that you lose some nodes (which makes the indices wrong on the remaining nodes). So your toy example could be (a runnable sketch follows the list):

  1. Create a graph where not all mesh nodes are connected to the grid in m2g, i.e. there are no edges in m2g that start in these mesh nodes.
  2. Split this graph as in
    graph_components = wmg.split_graph_by_edge_attribute(graph=graph, attr="component")
  3. Now the m2g component will no longer contain the mesh nodes that are not connected to the grid, so e.g. g2m and m2g will have a different number of mesh nodes.
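
A minimal runnable sketch of these three steps (hypothetical string node labels; the splitting helper mirrors the edge_subgraph logic of split_graph_by_edge_attribute, not the exact library code):

import networkx as nx

graph = nx.DiGraph()
# Two grid nodes and two mesh nodes; mesh node "m1" has a g2m edge
# but no m2g edge (e.g. because its grid node is masked out)
graph.add_edge("g0", "m0", component="g2m")
graph.add_edge("g1", "m1", component="g2m")
graph.add_edge("m0", "g0", component="m2g")

def split_by_edge_attr(graph, attr):
    values = {graph.edges[e][attr] for e in graph.edges}
    return {
        v: graph.copy().edge_subgraph(
            [e for e in graph.edges if graph.edges[e].get(attr) == v]
        )
        for v in values
    }

components = split_by_edge_attr(graph, "component")
mesh_nodes = lambda g: {n for n in g.nodes if n.startswith("m")}
assert mesh_nodes(components["g2m"]) == {"m0", "m1"}
assert mesh_nodes(components["m2g"]) == {"m0"}  # "m1" was silently dropped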

@leifdenby (Member) commented

I have dug a bit deeper @joeloskarsson and I have realised where the issue is. I was under the impression that torch_geometric.utils.convert.from_networkx() would use the node labels as the indices in the adjacency matrix (which is why I set the node labels to a globally unique index in https://github.com/mllam/weather-model-graphs/blob/main/src/weather_model_graphs/create/base.py#L176). But I realise now that is not the case; instead the node indices always run over the range [0, n_nodes - 1], as defined in the following line: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/utils/convert.py#L231. I also realise that was maybe a bad assumption to begin with, given that node labels can be strings. This means, as you say, that splitting the graphs before saving leads to indices in the adjacency matrices being inconsistent between g2m and m2g.
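
A tiny sketch of this behaviour (the integer labels 10 and 20 are arbitrary, chosen for illustration):

import networkx as nx
from torch_geometric.utils import from_networkx

g = nx.DiGraph()
g.add_edge(10, 20)  # node labels 10 and 20

data = from_networkx(g)
# Nodes are reindexed to 0..n-1 in insertion order; the labels are ignored
print(data.edge_index)  # tensor([[0], [1]]), not [[10], [20]]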

What I still don't quite understand is what neural-lam assumes about the grid-index values for different parts of the whole encode-process-decode graph. I think (based on the fact that we're having this discussion) that g2m and m2g should share indices so that the same grid-point in either g2m or m2g has the same index in both. But is that also the case for the m2m graph (or subgraphs when using the hierarchical graph)? Or does neural-lam expect that the mesh-nodes start with index value 0?

I have added a notebook with my experimentation here: https://github.com/leifdenby/weather-model-graphs/blob/decoding_mask/docs/decoding_mask.ipynb, and started on a test for the decoding mask too: https://github.com/leifdenby/weather-model-graphs/blob/decoding_mask/tests/test_graph_decode_gridpoints_mask.py#L17

@joeloskarsson (Contributor, Author) commented

I was under the impression that torch_geometric.utils.convert.from_networkx() would use the node labels as the indices in the adjacency matrix (which is why I set the node labels to a globally unique index in https://github.com/mllam/weather-model-graphs/blob/main/src/weather_model_graphs/create/base.py#L176). But I realise now that is not the case; instead the node indices always run over the range [0, n_nodes - 1], as defined in the following line:

Yes, this is exactly what I also realized. I wish I had had a bit more time to write out an explanation for this; I could have saved you the work 😓 Well, I hope you feel like you gained some insights along the way. In my view the arbitrariness of the indexing when converting from networkx to pyg feels really bad and not very well thought through. But I guess this is all because networkx doesn't really have a notion of an integer node index (closer to how we think of a graph in theory, with sets; bad for practical work imo 😛).

I think (based on the fact that we're having this discussion) that g2m and m2g should share indices so that the same grid-point in either g2m or m2g has the same index in both.

I am not sure that is actually necessary, but we could do that. My implementation using this in neural-lam makes the hard decision that the nodes you want to decode to get the first indices, followed by the nodes only used in g2m (see the sketch below). This means it does not matter whether you include the masked-out node indices in your m2g graph (you get the same edge_index tensor in the end anyway). On a more conceptual level, one could either argue that m2g should only index into the set of nodes we decode to, or that m2g should index into the set of all grid nodes but only contain edges to the unmasked ones. It is not obvious to me which makes more sense.
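
A sketch of that ordering convention on a hypothetical 5-node grid (this is not code from neural-lam, just an illustration of the reindexing):

import numpy as np

decode_mask = np.array([True, False, True, True, False])

# Decoded-to nodes first, boundary-only nodes after
order = np.concatenate(
    [np.flatnonzero(decode_mask), np.flatnonzero(~decode_mask)]
)
# new_index[old_index] gives each grid node its position after reordering
new_index = np.empty_like(order)
new_index[order] = np.arange(len(order))
print(new_index)  # [0 3 1 2 4]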

What I still don't quite understand is what neural-lam assumes about the grid-index values for different parts of the whole encode-process-decode graph.

Yeah, there is quite a lot to this. I have made very minimal assumptions wrt node indexing in neural-lam. Given that I know we have somewhat different perspectives on the graph (I have this perspective of separate graphs, rather than one joined graph), I feel it is something that should be explained better. Maybe I could try to write up an explanation of what neural-lam expects (that is, in mllam/neural-lam#93) when I find some time, or we could sit down and talk it through.

@leifdenby (Member) commented Jan 8, 2025

Thanks @joeloskarsson! I am glad we finally found out where the indexing issue is arising from 😅

@SimonKamuk had the idea that, if we want to retain the approach of building a single networkx.DiGraph object and then splitting it, we could add an option when splitting this "global" graph indicating that the user wishes to retain all nodes in every subgraph, but only the edges that have a specific value for an attribute. This would mean that all subgraphs created would share the same indexing when converting using pyg, but of course each subgraph would also contain a lot of nodes that are not connected by any edges. Would this be ok? I made the plot included below when I was trying to work out what is stored in the pickled tensor files, and at least wrt the edge index it appears to be a globally unique integer for each node. So maybe this would be an ok approach?

One thing I'm worried about is that this would cause the _node_features.pt file to have the same content for every subgraph, I think.

I started work on the node-retention idea in https://github.com/leifdenby/weather-model-graphs/blob/feat/node-retention-with-graph-splitting/src/weather_model_graphs/networkx_utils.py#L51, but then I realised that it would be better to check with you first.

[figure: pyg_edge_index plot]
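
For reference, a hypothetical sketch of what such a node-retaining split could look like (this is not the code in the linked branch, just an illustration of the idea):

import networkx as nx

def split_graph_retaining_all_nodes(graph, attr):
    """Split by edge attribute, but keep every node in every subgraph."""
    values = {graph.edges[e][attr] for e in graph.edges}
    subgraphs = {}
    for value in values:
        sg = nx.DiGraph()
        # All nodes (with attributes) go into every subgraph, so pyg
        # conversion assigns identical indices across the components
        sg.add_nodes_from(graph.nodes(data=True))
        sg.add_edges_from(
            (u, v, d)
            for u, v, d in graph.edges(data=True)
            if d.get(attr) == value
        )
        subgraphs[value] = sg
    return subgraphs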

@joeloskarsson (Contributor, Author) commented

I think what you describe would indeed work and solve the problem, but we would have to test with neural-lam to be sure. And yes, we would then have to rework what is saved in each _node_features.pt so that only features for the correct subset of nodes are in there (and not all nodes).

I think however that the easier fix is to rework

def split_graph_by_edge_attribute(graph, attr):

to somehow not drop the mesh nodes that are not connected to the grid. Or rather, to split on something other than edge attributes. One option could be to introduce some sort of node attribute to split on, telling which subgraphs each node is part of (see the sketch below). That gets slightly complicated though, as nodes can belong to multiple subgraphs (e.g. both g2m and m2g).
Then there is of course always the option to not merge all the graphs just to split them up again ;) (as I implemented here)
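
A hypothetical sketch of such a node-attribute-based split (the "components" membership sets stored per node are invented for illustration, not an existing attribute in the library):

import networkx as nx

def split_graph_by_node_attribute(graph, attr):
    """Split into subgraphs based on per-node membership sets, so that
    nodes without edges in a component are still kept."""
    values = set().union(*(graph.nodes[n][attr] for n in graph.nodes))
    subgraphs = {}
    for value in values:
        members = [n for n in graph.nodes if value in graph.nodes[n][attr]]
        sg = graph.subgraph(members).copy()
        # Drop edges belonging to other components
        sg.remove_edges_from(
            [e for e in sg.edges if sg.edges[e].get("component") != value]
        )
        subgraphs[value] = sg
    return subgraphs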

Regarding this PR though: Are you ok with merging this with the return_components option, and trying to fix the splitting issue separately? Or do you see that as necessary to fix as part of this PR?

@leifdenby (Member) commented

I think what you describe would indeed work and solve the problem, but we would have to test with neural-lam to be sure

Ok, great!

Regarding this PR though: Are you ok with merging this with the return_components option, and trying to fix the splitting issue separately? Or do you see that as necessary to fix as part of this PR?

Good point. I think we should merge this PR with the functionality you have proposed. The functionality added here makes sense and we should keep moving. I will give a final review of the code as-is so we can get this in.

(In general, wrt how the graph is stored to disk from weather-model-graphs and loaded in neural-lam, I am not fully sure how to split the functionality. It seems clear to me that we don't want to simply serialise the networkx.DiGraph object to disk directly, since that would require networkx in neural-lam; on the other hand, the format may evolve in neural-lam. It would be nice to formalise this "graph storage format" at some point, but I will leave that pending.)

@joeloskarsson (Contributor, Author) commented Jan 9, 2025

The only thing that I think is missing here is a documentation notebook showing how to use this. I have not got around to making that yet, but maybe you could just turn the notebook you added into that?

@leifdenby (Member) commented

The only thing that I think is missing here is a documentation notebook showing how to use this. I have not got around to making that yet, but maybe you could just turn the notebook you added into that?

I think the notebook you have added is already fine. Or by "this" do you mean something other than the decoding-mask functionality?

@leifdenby (Member) left a comment

Looks great! I've just added a few suggestions

projection: cartopy.crs.CRS or None
    Projection instance used to transform given lat-lon coords to in-projection
    Cartesian coordinates. If None the coords are assumed to already be Cartesian.
decode_mask: Iterable or None
@leifdenby (Member):

I think it would be good to make decode_mask an explicit argument rather than passing it via **kwargs (I think that's where it's currently supposed to be handled, right?)

Also, in general, maybe we could use Iterable[int] in type hints to make it clear that the mask is represented by a set of ints

@leifdenby (Member):

I think it would be good to keep the explicit arguments here. This would mean that this file is mostly unchanged, although there was a missing docstring for max_num_levels in create_oskarsson_hierarchical_graph() that you have added in this PR

@@ -39,6 +41,8 @@ def create_all_graph_components(
     g2m_connectivity_kwargs={},
     coords_crs: pyproj.crs.CRS | None = None,
     graph_crs: pyproj.crs.CRS | None = None,
+    decode_mask: Iterable | None = None,
@leifdenby (Member):

I would move this decode_mask argument into the m2g_connectivity arguments, I think, and then the masking could in fact be done within connect_nodes_across_graphs(), since the masking affects the way nodes are connected between graphs. What do you think?

@leifdenby (Member):

You could for example introduce arguments called source_nodes_mask and target_nodes_mask to connect_nodes_across_graphs (https://github.com/mllam/weather-model-graphs/blob/main/src/weather_model_graphs/create/base.py#L181)
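
A hypothetical sketch of that suggestion (the argument names source_nodes_mask/target_nodes_mask come from this comment; the connectivity logic itself is elided, not the library's actual implementation):

def connect_nodes_across_graphs(
    G_source,
    G_target,
    source_nodes_mask=None,
    target_nodes_mask=None,
):
    source_nodes = list(G_source.nodes)
    target_nodes = list(G_target.nodes)
    # Restrict which nodes may receive edges before computing connectivity
    if source_nodes_mask is not None:
        source_nodes = [n for n, keep in zip(source_nodes, source_nodes_mask) if keep]
    if target_nodes_mask is not None:
        target_nodes = [n for n, keep in zip(target_nodes, target_nodes_mask) if keep]
    # ... the existing nearest-neighbour/radius connectivity would then
    # operate on the masked node sets only ...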

@@ -162,6 +181,14 @@ def create_all_graph_components(
     for edge in graph.edges:
         graph.edges[edge]["component"] = name

+    if return_components:
@leifdenby (Member):

Could we add a comment here, something like:

Suggested change:
-    if return_components:
+    if return_components:
+        # Because merging to a single graph and then splitting again leads
+        # to changes in node indexing when converting to `pyg.Data` objects
+        # (in part due to `m2g` and `g2m` having different sets of grid
+        # nodes), the ability to return the graph components (`g2m`, `m2m`
+        # and `m2g`) has been added here. See
+        # https://github.com/mllam/weather-model-graphs/pull/34#issuecomment-2507980752
+        # for details

@@ -98,40 +98,29 @@ def split_graph_by_edge_attribute(graph, attr):
            f"No subgraphs were created. Check the edge attribute '{attr}'."
        )

-    # copy node attributes
@leifdenby (Member):

wow, what a mess 😮 I don't know what I did here... Good you are removing these extra lines

def sort_nodes_in_graph(graph):
    """
    Creates a new networkx.DiGraph that is a copy of input, but with nodes
    sorted according to their id
@leifdenby (Member):

Suggested change:
-    sorted according to their id
+    sorted according to their label value

I think strictly speaking nodes are identified by something called their "label" rather than "id" (e.g. https://networkx.org/documentation/stable/reference/generated/networkx.relabel.convert_node_labels_to_integers.html#networkx.relabel.convert_node_labels_to_integers). And in our case we use integers as labels


    return subgraphs
...
    return sorted_graph


def replace_node_labels_with_unique_ids(graph):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am realising now that this is a poor name, maybe I should have called it replace_node_labels_with_unique_integer_values

Labels: enhancement (New feature or request)

4 participants