Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jgf: refactor to use shared functions and fix containment #76

Merged
merged 6 commits into from
Jun 27, 2024
Merged

Conversation

vsoch
Copy link
Member

@vsoch vsoch commented May 28, 2024

Problem: the containment paths are currently not always set, and each resource type has a separate function.
Solution: create a shared node generation function, a "ResourceCounter" that uses a common resource counter, where the counter manages the global and resource-specific counts, and then can be passed to a common node generation function!

This scoped commit includes:

  • refactor of those JGF functions to use a common base (taking the resource counter, size and unit), will address a point in Post refactor changes needed #71
  • removing resources that are not currently present in the graph (rack and socket)
  • ensuring that we do not recreate the subnet since we loop through nodes and could hit the same one twice
  • (coming soon, haven't commit yet) a test to verify the JGF output

I am also removing the NFD features (#75) because I think they are changed and might lead to error if someone has them, and I think it will be good to assess which we want/need (there are a lot!) and then add them back strategically.

I tried really hard to scope commits, but in practice it was like "a few little tiny things" and then the bulk of work I didn't know how to break into smaller pieces, so it's one chunk.

I could add more to this PR to further work on the logic of the ResourceCounter (for example, I think we should be counting things using it and not outside of it) but I don't want to make it too big for easier review.

I'll mark this ready when it's ready for review, and add a few more notes about questions/discussion.

vsoch added 5 commits May 27, 2024 07:13
Problem: The "JGF" part of fluxjgf should be capitalized to refer
to "Json Graph Format" and for easier reading for the developer.
Solution: Rename "Fluxjgf to FluxJGF"

Signed-off-by: vsoch <[email protected]>
Problem: The convention for creating new structs is typically
to use "New$Name" and return a reference or copy to it.
Solution: Rename InitJGF to NewFluxJGF to mirror that
practice, making it clear we get back a "new flux JGF"
and the function is not initializing something that
already exists.

Signed-off-by: vsoch <[email protected]>
Problem: we want to have end to end (e2e) and regular go tests.
Solution: rename the test.yaml to e2e-test.yaml, and prepare to
add simple tests for next round of changes (testify package)

Signed-off-by: vsoch <[email protected]>
Problem: the containment paths are currently not set,
and each resource type has a separate function.
Solution: create a shared node generation function
that uses a common resource counter, where the counter
manages the global and resource-specific counts. This
scoped commit includes refactor of those JGF functions,
which also means removing resources that are not present
in the graph (e.g., rack and socket) and ensuring that
we do not recreate the subnet since we loop through
nodes (and might hit the same one twice). I am also
removing the NFD features because I think they are
changed and might lead to error if someone has them.
We should assess which of them we want/need and add
them back strategically. I have a local test but want
to push this first to give a look over / run current
tests to determine if other changes are needed.

Signed-off-by: vsoch <[email protected]>
Problem: we will want to have scoped tests for fluence.
Solution: to start, add a simple test file for JGF (the
content of this current PR) and have an easy way to run
it with "make test" from that directory. Currently this
does not validate most of the structure - I would like
to review the output, discuss a testing strategy, and then
update the commit here to reflect that decision.

Signed-off-by: vsoch <[email protected]>
@vsoch
Copy link
Member Author

vsoch commented May 28, 2024

@cmisale and @milroy - the changes here will tweak the Flux JGF a tiny bit so each has a containment path with (I think) correct indices. There of course could be bugs, so I'd like to look it over together (and myself again). I tried my best to keep the commits scoped, but in practice there is one large one and a few smaller ones. I will hopefully get better at this.

What I'd like to suggest we do next is review the output of the tests - specifically here (click "Run Tests" and you'll see JGF output as I update the graph) where I print a generated JGF (not from a cluster, but from the test) and talk about the changes and if the structure is OK. Then we can decide on a testing strategy, I can implement and finish, and move forward with review. After this PR I'd like to do separate PRs in the following order:

  1. Rename / reorganize utils and have the resources come from the group (instead of a representative pod)
  2. Add in additional resources we are interested in (and consider units)
  3. Add back / think through NFD

I probably can't think ahead more than three things :) I'm also going to be scoping out a design for our turducken fluence / flux-core approach, but I need to finish some other work first.

I hope you had a good memorial day weekend (I'm still not tired of using these emoji 🍓 ☁️ 🫐 )

Copy link
Member

@milroy milroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks really good. Thank you!

We should discuss the index generation to make sure I'm interpreting the behavior correctly and to make sure it's what we want.

src/fluence/fluxion/fluxion.go Show resolved Hide resolved
src/fluence/jgf/jgf.go Outdated Show resolved Hide resolved
src/fluence/jgf/jgf.go Outdated Show resolved Hide resolved
src/fluence/jgf/jgf.go Show resolved Hide resolved
src/fluence/utils/utils.go Outdated Show resolved Hide resolved
@vsoch
Copy link
Member Author

vsoch commented Jun 20, 2024

Thanks for the review! I'll wait to push (tiny changes so far) until we come to consensus on the above.

Problem: the current strategy to derive an index is
scoped to a resource globally across the graph.
Solution: instead, provide a direct index counter
for each new resource to ensure it is scoped to
the parent

Signed-off-by: vsoch <[email protected]>
@vsoch vsoch requested a review from milroy June 21, 2024 21:09
Copy link
Member

@milroy milroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great; thanks!

@vsoch vsoch merged commit ae68c19 into main Jun 27, 2024
8 checks passed
@vsoch vsoch deleted the jgf-updates branch June 27, 2024 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants