Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the unnecessary SDG class #64

Merged
merged 1 commit into from
Jul 29, 2024
Merged

Conversation

markmc
Copy link
Contributor

@markmc markmc commented Jul 2, 2024

Part of #61 and instructlab/dev-docs#113

A pipeline chains together a sequence of blocks, and an SDG chains together a sequence of pipelines. There is no need for this additional layer - we can construct a pipeline with the full sequence of blocks, and not chain pipelines together.

Check out how the API is used in generate_data.py to get a sense of the improvement

@russellb
Copy link
Member

russellb commented Jul 2, 2024

Assuming e2e passes, I am definitely supportive of this general direction. There's some discussion on #61 with @shivchander, so I'm not going to approve/merge just yet to give time for his input.

@russellb russellb requested a review from shivchander July 2, 2024 16:14
@mergify mergify bot added the needs-rebase label Jul 2, 2024
Copy link
Contributor

mergify bot commented Jul 2, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @markmc please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@markmc markmc changed the title Tweak the API design Remove the unnecessary SDG class Jul 5, 2024
@mergify mergify bot removed the needs-rebase label Jul 5, 2024
@markmc
Copy link
Contributor Author

markmc commented Jul 5, 2024

Assuming e2e passes, I am definitely supportive of this general direction. There's some discussion on #61 with @shivchander, so I'm not going to approve/merge just yet to give time for his input.

The discussion has stalled for now

I've reduced the scope of this PR to just the SDG thing, since the rest is now incorporated into #86

Copy link
Member

@russellb russellb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kicked off the CI job that runs the full pipeline.

Otherwise, this looks like a nice simplification to me.

Copy link

github-actions bot commented Jul 8, 2024

E2E (NVIDIA A10G x4 - full pipeline) workflow launched on this PR: View run

Copy link

github-actions bot commented Jul 8, 2024

e2e workflow failed on this PR: View run, please investigate.

Copy link
Member

@hickeyma hickeyma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shivchander you raised reasons for maintaining the SDG class in #61 (comment). What do you think of the changes in this PR?

@markmc
Copy link
Contributor Author

markmc commented Jul 9, 2024

See also instructlab/dev-docs#109

@hickeyma
Copy link
Member

hickeyma commented Jul 9, 2024

See also instructlab/dev-docs#109

@markmc do you mean instructlab/dev-docs#113 instead?

@markmc
Copy link
Contributor Author

markmc commented Jul 9, 2024

See also instructlab/dev-docs#109

@markmc do you mean instructlab/dev-docs#113 instead?

Doh, yes - thank you

Copy link
Contributor

mergify bot commented Jul 9, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @markmc please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 9, 2024
Copy link
Member

@hickeyma hickeyma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markmc Is this blocked waiting on the design proposal to progress?

@markmc
Copy link
Contributor Author

markmc commented Jul 18, 2024

@markmc Is this blocked waiting on the design proposal to progress?

Thanks for the reminder. I've rebased

I think the only blocker is that @shivchander has not yet acked my YAGNI argument

@markmc markmc added this to the 0.1.3 milestone Jul 19, 2024
@mergify mergify bot added the needs-rebase label Jul 19, 2024
Copy link
Contributor

mergify bot commented Jul 19, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @markmc please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@russellb russellb modified the milestones: 0.1.3, 0.2.0 Jul 22, 2024
@mergify mergify bot added the testing Relates to testing label Jul 22, 2024
@mergify mergify bot removed the needs-rebase label Jul 22, 2024
@markmc markmc modified the milestones: 0.2.0, 0.2.1 Jul 23, 2024
@shivchander
Copy link
Member

shivchander commented Jul 25, 2024

On the research side, we are actively working on developing complex flows that chain pipelines in a DAG fashion. Our goal with the SDG class wass to serve as an abstract wrapper capable of managing these complex flows.

Could this be renamed to something better? Yes. But I and the research team are not in favor of removing this, but I will not be a blocker on this PR, If there's a consensus on moving forward. Just wanted to give a heads-up that at the pace research moves, we might need something which can handle this sooner.

@markmc
Copy link
Contributor Author

markmc commented Jul 25, 2024

On the research side, we are actively working on developing complex flows that chain pipelines in a DAG fashion. Our goal with the SDG class wass to serve as an abstract wrapper capable of managing these complex flows.

Could this be renamed to something better? Yes. But I and the research team are not in favor of removing this, but I will not be a blocker on this PR, If there's a consensus on moving forward. Just wanted to give a heads-up that at the pace research moves, we might need something which can handle this sooner.

It's very difficult to remove an API once a library is starting to be used in the wild - and renaming an API is effectively removing an API. Library API maintainers need to be very careful about breaking downstream users. Removing an API generally involves a long deprecation period, to make sure users have time to migrate away from it.

I do not believe something called "SDG" will live on as the name of any abstraction in this library long-term, and I do not want to make it difficult to remove for no reason

Adding whatever the new abstraction should be is super easy, and made no more difficult by removing SDG

@markmc markmc modified the milestones: 0.2.1, 0.2.2 Jul 26, 2024
Copy link
Contributor

mergify bot commented Jul 26, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @markmc please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@markmc
Copy link
Contributor Author

markmc commented Jul 27, 2024

It's very difficult to remove an API once a library is starting to be used in the wild - and renaming an API is effectively removing an API. Library API maintainers need to be very careful about breaking downstream users. Removing an API generally involves a long deprecation period, to make sure users have time to migrate away from it.

We will release 0.2.2 or 0.2.3 any day now, which will be used by instructlab 0.18.0, and from that point on I expect we'll be much more cautious about incompatible API changes. For that reason, I'm going to go ahead with this change despite objections, on the basis that it is trivial to restore this abstraction when we need it.

@markmc markmc modified the milestones: 0.2.2, 0.2.3 Jul 28, 2024
Copy link
Contributor

mergify bot commented Jul 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @markmc please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 29, 2024
A pipeline chains together a sequence of blocks, and an SDG chains together
a sequence of pipelines. There is no need for this additional layer - we
can construct a pipeline with the full sequence of blocks, and not chain
pipelines together.

Signed-off-by: Mark McLoughlin <[email protected]>
@mergify mergify bot removed the needs-rebase label Jul 29, 2024
@markmc markmc merged commit 2dcbec7 into instructlab:main Jul 29, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants