jgf: refactor to use shared functions and fix containment #76

vsoch · 2024-05-28T06:35:28Z

Problem: the containment paths are currently not always set, and each resource type has a separate function.
Solution: create a "ResourceCounter" that manages the global and resource-specific counts, and pass it to a shared node generation function that every resource type uses!
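A minimal sketch of that idea follows. It is not the code in this PR: the `Node` struct, the `MakeNode` helper, and the `Next` method are hypothetical names chosen for illustration, and only "ResourceCounter" comes from the description above. The sketch assumes a containment path is built by appending `<type><index>` to the parent's path.

```go
package main

import "fmt"

// ResourceCounter (sketch) tracks one global count, used for unique ids,
// and one count per resource type, used for names and indices.
type ResourceCounter struct {
	global int
	byType map[string]int
}

// NewResourceCounter returns an empty counter.
func NewResourceCounter() *ResourceCounter {
	return &ResourceCounter{byType: map[string]int{}}
}

// Next returns the next global id and the next index for the given
// resource type, then advances both counts.
func (c *ResourceCounter) Next(resourceType string) (id, index int) {
	id, index = c.global, c.byType[resourceType]
	c.global++
	c.byType[resourceType]++
	return id, index
}

// Node is a simplified, hypothetical stand-in for a JGF node.
type Node struct {
	ID          string
	Type        string
	Name        string
	Size        int
	Unit        string
	Containment string
}

// MakeNode is the single shared entry point: every resource type goes
// through it, so the containment path is always set.
func MakeNode(c *ResourceCounter, parentPath, resourceType string, size int, unit string) Node {
	id, index := c.Next(resourceType)
	name := fmt.Sprintf("%s%d", resourceType, index)
	return Node{
		ID:          fmt.Sprint(id),
		Type:        resourceType,
		Name:        name,
		Size:        size,
		Unit:        unit,
		Containment: parentPath + "/" + name,
	}
}

func main() {
	counter := NewResourceCounter()
	cluster := MakeNode(counter, "", "cluster", 1, "")
	node := MakeNode(counter, cluster.Containment, "node", 1, "")
	core := MakeNode(counter, node.Containment, "core", 1, "")
	fmt.Println(core.ID, core.Containment)
}
```

Because the counter is passed in rather than kept in package-level state, a test can start from a fresh counter and get predictable ids.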

This scoped commit includes:

a refactor of the JGF functions to use a common base (taking the resource counter, size, and unit), which addresses a point in "Post refactor changes needed" (#71)
removing resources that are not currently present in the graph (rack and socket)
ensuring that we do not recreate the subnet since we loop through nodes and could hit the same one twice
(coming soon, not committed yet) a test to verify the JGF output
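The subnet point in the list above can be sketched as a lookup before creation. This is an illustration under assumptions, not the PR's implementation: the `Subnet` type, the `GroupBySubnet` function, and the (node name, subnet name) input pairs are all hypothetical.

```go
package main

import "fmt"

// Subnet is a simplified, hypothetical stand-in for a subnet vertex.
type Subnet struct {
	Name  string
	Nodes []string
}

// GroupBySubnet creates each subnet once, even when several nodes report
// the same subnet name. Each input pair is (node name, subnet name), in
// the order the nodes are visited.
func GroupBySubnet(nodeSubnets [][2]string) []*Subnet {
	seen := map[string]*Subnet{}
	ordered := []*Subnet{}
	for _, pair := range nodeSubnets {
		nodeName, subnetName := pair[0], pair[1]
		subnet, ok := seen[subnetName]
		if !ok {
			// First time this subnet is seen: create and remember it.
			subnet = &Subnet{Name: subnetName}
			seen[subnetName] = subnet
			ordered = append(ordered, subnet)
		}
		subnet.Nodes = append(subnet.Nodes, nodeName)
	}
	return ordered
}

func main() {
	subnets := GroupBySubnet([][2]string{
		{"node-a", "zone-1"},
		{"node-b", "zone-1"},
		{"node-c", "zone-2"},
	})
	for _, s := range subnets {
		fmt.Println(s.Name, s.Nodes)
	}
}
```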

I am also removing the NFD features (#75) because I think they have changed and might lead to errors if someone has them, and I think it will be good to assess which we want/need (there are a lot!) and then add them back strategically.

I tried really hard to scope commits, but in practice it was like "a few little tiny things" and then the bulk of work I didn't know how to break into smaller pieces, so it's one chunk.

I could add more to this PR to further work on the logic of the ResourceCounter (for example, I think we should be counting things using it and not outside of it), but I don't want to make it too big, so that it stays easy to review.

I'll mark this ready when it's ready for review, and add a few more notes about questions/discussion.

Problem: The "JGF" part of fluxjgf should be capitalized to refer to "JSON Graph Format" and for easier reading for the developer. Solution: Rename "Fluxjgf" to "FluxJGF" Signed-off-by: vsoch <[email protected]>

Problem: The convention for creating new structs is typically to use "New$Name" and return a reference or copy to it. Solution: Rename InitJGF to NewFluxJGF to mirror that practice, making it clear we get back a "new flux JGF" and the function is not initializing something that already exists. Signed-off-by: vsoch <[email protected]>
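As a small illustration of the naming convention described in this commit message, here is a sketch with a simplified struct. The fields of `FluxJGF` shown here are assumptions for the example, not the struct's real definition.

```go
package main

import "fmt"

// FluxJGF is a simplified stand-in; these fields are assumptions.
type FluxJGF struct {
	Nodes []string
	Edges [][2]string
}

// NewFluxJGF returns a pointer to a freshly allocated, empty graph.
// The "New" prefix signals that nothing pre-existing is being initialized.
func NewFluxJGF() *FluxJGF {
	return &FluxJGF{Nodes: []string{}, Edges: [][2]string{}}
}

func main() {
	g := NewFluxJGF()
	g.Nodes = append(g.Nodes, "cluster0")
	fmt.Println(len(g.Nodes), len(g.Edges))
}
```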

Problem: we want to have end to end (e2e) and regular go tests. Solution: rename the test.yaml to e2e-test.yaml, and prepare to add simple tests for next round of changes (testify package) Signed-off-by: vsoch <[email protected]>

Problem: the containment paths are currently not set, and each resource type has a separate function. Solution: create a shared node generation function that uses a common resource counter, where the counter manages the global and resource-specific counts. This scoped commit includes refactor of those JGF functions, which also means removing resources that are not present in the graph (e.g., rack and socket) and ensuring that we do not recreate the subnet since we loop through nodes (and might hit the same one twice). I am also removing the NFD features because I think they are changed and might lead to error if someone has them. We should assess which of them we want/need and add them back strategically. I have a local test but want to push this first to give a look over / run current tests to determine if other changes are needed. Signed-off-by: vsoch <[email protected]>

Problem: we will want to have scoped tests for fluence. Solution: to start, add a simple test file for JGF (the content of this current PR) and have an easy way to run it with "make test" from that directory. Currently this does not validate most of the structure - I would like to review the output, discuss a testing strategy, and then update the commit here to reflect that decision. Signed-off-by: vsoch <[email protected]>
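One structural check such a test could eventually make is that every node carries a containment path. The sketch below is only a possibility to discuss, not the test added in this PR. It assumes the output follows the JSON Graph Format layout (`graph.nodes`), and that each node stores its path under `metadata.paths.containment`; that field layout and the `MissingContainment` helper are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jgfDoc mirrors only the parts of the document needed for this check.
// The metadata field names are assumptions for the sketch.
type jgfDoc struct {
	Graph struct {
		Nodes []struct {
			ID       string `json:"id"`
			Metadata struct {
				Paths map[string]string `json:"paths"`
			} `json:"metadata"`
		} `json:"nodes"`
	} `json:"graph"`
}

// MissingContainment returns the ids of nodes without a containment path.
func MissingContainment(raw []byte) ([]string, error) {
	var doc jgfDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	missing := []string{}
	for _, n := range doc.Graph.Nodes {
		// Reading from a nil map yields the empty string, so nodes with
		// no "paths" object at all are reported too.
		if n.Metadata.Paths["containment"] == "" {
			missing = append(missing, n.ID)
		}
	}
	return missing, nil
}

func main() {
	raw := []byte(`{"graph": {"nodes": [
		{"id": "0", "metadata": {"paths": {"containment": "/cluster0"}}},
		{"id": "1", "metadata": {}}
	]}}`)
	missing, err := MissingContainment(raw)
	fmt.Println(missing, err)
}
```

A real test would call a helper like this on the generated graph and fail if the returned slice is not empty.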

vsoch · 2024-05-28T07:00:02Z

@cmisale and @milroy - the changes here will tweak the Flux JGF a tiny bit so each node has a containment path with (I think) correct indices. There could of course be bugs, so I'd like to look it over together (and myself again). I tried my best to keep the commits scoped, but in practice there is one large one and a few smaller ones. I will hopefully get better at this.

What I'd like to suggest we do next is review the output of the tests - specifically here (click "Run Tests" and you'll see JGF output as I update the graph) where I print a generated JGF (not from a cluster, but from the test) and talk about the changes and if the structure is OK. Then we can decide on a testing strategy, I can implement and finish, and move forward with review. After this PR I'd like to do separate PRs in the following order:

Rename / reorganize utils and have the resources come from the group (instead of a representative pod)
Add in additional resources we are interested in (and consider units)
Add back / think through NFD

I probably can't think ahead more than three things :) I'm also going to be scoping out a design for our turducken fluence / flux-core approach, but I need to finish some other work first.

I hope you had a good Memorial Day weekend (I'm still not tired of using these emoji 🍓 ☁️ 🫐 )

milroy

This generally looks really good. Thank you!

We should discuss the index generation to make sure I'm interpreting the behavior correctly and to make sure it's what we want.

src/fluence/fluxion/fluxion.go

src/fluence/jgf/jgf.go

src/fluence/utils/utils.go

vsoch · 2024-06-20T13:49:40Z

Thanks for the review! I'll wait to push (tiny changes so far) until we come to consensus on the above.

Problem: the current strategy to derive an index is scoped to a resource globally across the graph. Solution: instead, provide a direct index counter for each new resource to ensure it is scoped to the parent Signed-off-by: vsoch <[email protected]>
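The difference between a graph-wide index and a parent-scoped index can be sketched as follows. This is an illustration only: `Vertex`, `Builder`, and `Add` are hypothetical names, and the sketch assumes the caller passes the index directly while unique ids still come from one graph-wide count.

```go
package main

import "fmt"

// Vertex is a simplified, hypothetical stand-in for a JGF node.
type Vertex struct {
	UniqID int    // unique across the whole graph
	Index  int    // position among siblings under the same parent
	Path   string // containment path
}

// Builder hands out graph-wide unique ids; indices are supplied by the
// caller so they restart at zero under each parent.
type Builder struct{ nextID int }

// Add creates a vertex under parentPath with the given per-parent index.
func (b *Builder) Add(parentPath, resourceType string, index int) Vertex {
	v := Vertex{
		UniqID: b.nextID,
		Index:  index,
		Path:   fmt.Sprintf("%s/%s%d", parentPath, resourceType, index),
	}
	b.nextID++
	return v
}

func main() {
	b := &Builder{}
	cluster := b.Add("", "cluster", 0)
	for n := 0; n < 2; n++ {
		node := b.Add(cluster.Path, "node", n)
		for c := 0; c < 2; c++ {
			// The loop variable is the index, so cores on every node
			// are numbered from zero.
			core := b.Add(node.Path, "core", c)
			fmt.Println(core.UniqID, core.Path)
		}
	}
}
```

With this shape, the first core on the second node is `core0` under `node1`, while its unique id keeps increasing across the graph.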

milroy

This looks great; thanks!

vsoch added 5 commits May 27, 2024 07:13

chore: rename of Fluxjgf class.

1e83fee

test: rename test to be e2e

3ab00ad

vsoch force-pushed the jgf-updates branch from ca0607f to a3a4f03 Compare May 28, 2024 06:49

milroy reviewed Jun 20, 2024

vsoch requested a review from milroy June 21, 2024 21:09

milroy approved these changes Jun 27, 2024

vsoch merged commit ae68c19 into main Jun 27, 2024
8 checks passed

vsoch deleted the jgf-updates branch June 27, 2024 11:16
