Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SynthSkillsFlow honor the num_iters parameter #82

Merged
merged 1 commit into from
Jul 6, 2024

Conversation

russellb
Copy link
Member

@russellb russellb commented Jul 4, 2024

The base Flow class takes a num_iters parameter. This is used by
the ilab CLI to adjust how much data is generated by a given run.

The old value for this parameter in the CLI was 100, so while testing
this in the short term, it should be specified explicitly like this:

ilab data generate --num-instructions 30

When we release a version of this library that includes this effective
rewrite, the default value will be 30 and the option and its
description will better reflect the new behavior.

ilab data generate --sdg-scale-factor 30

More detail about how this is exposed via the CLI can be found in this
PR:

instructlab/instructlab#1570

Honoring this parameter for the full pipeline will be used immediately
in CI integration, where we're testing that the code can run
successfully, but want to do so as quickly as is reasonable.
Currently, E2E always runs with this setting set to 1 for speed
purposes.

Signed-off-by: Russell Bryant [email protected]

The base `Flow` class takes a `num_iters` parameter. This is used by
the `ilab` CLI to adjust how much data is generated by a given run.

The old value for this parameter in the CLI was 100, so while testing
this in the short term, it should be specified explicitly like this:

    ilab data generate --num-instructions 30

When we release a version of this library that includes this effective
rewrite, the default value will be 30 and the option and its
description will better reflect the new behavior.

    ilab data generate --sdg-scale-factor 30

More detail about how this is exposed via the CLI can be found in this
PR:

instructlab/instructlab#1570

Honoring this parameter for the full pipeline will be used immediately
in CI integration, where we're testing that the code can run
successfully, but want to do so as quickly as is reasonable.
Currently, E2E always runs with this setting set to 1 for speed
purposes.

Signed-off-by: Russell Bryant <[email protected]>
@markmc
Copy link
Contributor

markmc commented Jul 4, 2024

lgtm 👍

@russellb
Copy link
Member Author

russellb commented Jul 6, 2024

This is a trivial fix and it's causing CI problems, so I'm going to merge it with @markmc 's review

@russellb russellb merged commit 6b82bad into instructlab:main Jul 6, 2024
11 checks passed
@russellb russellb added this to the 0.1.0 milestone Jul 8, 2024
markmc added a commit to markmc/sdg that referenced this pull request Jul 11, 2024
Two pipelines include an LLMBlock which use `{num_samples}` in their
instructions to the teacher model. There needs to be some way to
configure the LLMBlock so that `num_samples` will be included, but
as per instructlab#82 (commit a01b04e) the value of `num_samples` should be
based on the `num_instructions_to_generate` parameter.

Signed-off-by: Mark McLoughlin <[email protected]>
jwm4 pushed a commit to jwm4/sdg that referenced this pull request Dec 13, 2024
Proposal for new repository for UI work
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants