Consider updating API design for clarity #61
Comments
While I understand the intent to simplify the system, I believe removing the SDG might not be the best approach for the following reasons:
Currently we only support linear pipeline chaining (and each pipeline only supports linear chaining of blocks). But in future we want to move towards supporting more complex compositions (arbitrary DAGs). |
ds = Dataset.from_list(samples)
block_params = BlockConfigParams(client, "mixtral", teacher_model)
flow_params = FlowParams(client, "mixtral", teacher_model)
block_configs = MMLUBenchFlow(flow_params).render()
block_configs.extend(SynthKnowledgeFlow(flow_params).render())
I like this proposal to enhance the clarity and convenience. |
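For readers following along, here is a minimal, self-contained sketch of the pattern quoted in this comment: one shared params object, flows that render to block configs, and the rendered lists concatenated into a single linear chain. The class bodies, the placeholder block names, and the dict layout of a block config are assumptions made for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FlowParams:
    """Hypothetical bundle of settings shared by every flow."""
    client: Any
    model_family: str
    model_id: str


class Flow:
    """Stand-in base class: a flow is a template for a block config sequence."""
    block_names: tuple = ()

    def __init__(self, params: FlowParams) -> None:
        self.params = params

    def render(self) -> list:
        # Fill the template with the shared parameters, one config per block.
        return [
            {"block_name": name, "model_id": self.params.model_id}
            for name in self.block_names
        ]


class MMLUBenchFlow(Flow):
    block_names = ("gen_mmlu_questions",)  # placeholder block name


class SynthKnowledgeFlow(Flow):
    block_names = ("gen_knowledge", "filter_faithfulness")  # placeholder names


# client=None and "teacher" are dummy values so the sketch runs offline.
flow_params = FlowParams(client=None, model_family="mixtral", model_id="teacher")
block_configs = MMLUBenchFlow(flow_params).render()
block_configs.extend(SynthKnowledgeFlow(flow_params).render())
print([cfg["block_name"] for cfg in block_configs])
```

Because each flow only renders a list, chaining two flows is plain list concatenation, which is why the linear case needs no extra noun.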
I can totally see that, but consider the YAGNI principle
It's very easy to add this concept later when we actually add support for more complex compositions. And at that point, we can consider an API design in the context of a more concrete proposal. Right now, this concept adds nothing but confusion - e.g. should you chain together Also, I don't think |
Thank you. Just a note - the |
In #86, flows are now YAML files with
In #86, this is now So it now looks like:
|
As #86 develops further, I think it's becoming increasingly obvious that the "Flow" noun adds little but confusion to this abstraction. Consider:
In #86, "flows" are now simply config files with a sequence of block configs that can be used to construct a pipeline. They are a pipeline description, and we can simply call them that. I think it would be best to bite the bullet now in the early stages of the project and pare the nouns back to |
See instructlab/sdg#61 for some of the discussion on this, but I've tried to summarize those discussions in the design doc. Signed-off-by: Mark McLoughlin <[email protected]>
Drafted a design doc for this - see instructlab/dev-docs#113 |
All done now. |
Consider the following:
or:
Consider the nouns:
- generate() method transforms an input dataset and returns an output dataset
- generate() method in which it instantiates and invokes blocks in turn, passing the input dataset and collecting the output
- generate() method calls pipelines in turn
Proposals:
- SDG - we don't need both SDG and Pipeline since Pipeline can already do everything SDG can do
- Flow as a block config template - it would be clearer if we reinforced the idea that a "flow" is a template of a block config sequence - a render() method makes sense to me, and an extensible params object for the common case of instantiating multiple flows
- Pipeline.from_flows() convenience class method on Pipeline that knows how to render block configs from a sequence of flows
So we could have e.g.
or:
or:
Resolving this issue would require an update to the design doc and a code change
It would definitely be better to do this before users of the API proliferate
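As a rough sketch of how the proposals above could fit together (not the project's actual code): a block whose generate() maps a dataset to a dataset, a pipeline whose generate() instantiates and invokes blocks in turn, and a Pipeline.from_flows() class method that renders block configs from a sequence of flows. The config keys block_type and block_kwargs, the list-of-dicts dataset, and the TagFlow and CountFlow classes are all hypothetical and chosen only to keep the example runnable.

```python
class Block:
    """Stand-in block: generate() maps an input dataset to an output dataset."""

    def __init__(self, fn):
        self.fn = fn

    def generate(self, dataset):
        # Apply the block's function to a copy of every sample.
        return [self.fn(dict(sample)) for sample in dataset]


class Flow:
    """Stand-in flow: a template that renders to a list of block configs."""

    def __init__(self, params):
        self.params = params

    def render(self):
        raise NotImplementedError


class Pipeline:
    def __init__(self, block_configs):
        self.block_configs = list(block_configs)

    @classmethod
    def from_flows(cls, flows):
        # Render each flow and concatenate the results into one linear chain.
        configs = []
        for flow in flows:
            configs.extend(flow.render())
        return cls(configs)

    def generate(self, dataset):
        # Instantiate and invoke blocks in turn, feeding each output forward.
        for config in self.block_configs:
            block = config["block_type"](**config["block_kwargs"])
            dataset = block.generate(dataset)
        return dataset


class TagFlow(Flow):  # hypothetical flow: adds a fixed tag to each sample
    def render(self):
        tag = self.params["tag"]
        return [{"block_type": Block,
                 "block_kwargs": {"fn": lambda s: {**s, "tag": tag}}}]


class CountFlow(Flow):  # hypothetical flow: records the text length
    def render(self):
        return [{"block_type": Block,
                 "block_kwargs": {"fn": lambda s: {**s, "n_chars": len(s["text"])}}}]


params = {"tag": "demo"}
pipeline = Pipeline.from_flows([TagFlow(params), CountFlow(params)])
result = pipeline.generate([{"text": "hello"}])
print(result)
```

In this shape a single Pipeline covers what chaining several pipelines would do in the linear case, since a longer chain is just a longer list of block configs.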