Make SynthSkillsFlow honor the num_iters parameter #82

russellb · 2024-07-04T01:50:21Z

The base Flow class takes a num_iters parameter. This is used by
the ilab CLI to adjust how much data is generated by a given run.

The old default for this parameter in the CLI was 100, so for
short-term testing it should be specified explicitly, like this:

ilab data generate --num-instructions 30

When we release a version of this library that includes what is
effectively a rewrite, the default value will be 30, and the option
name and its description will better reflect the new behavior:

ilab data generate --sdg-scale-factor 30

More detail about how this is exposed via the CLI can be found in this
PR:

instructlab/instructlab#1570

Honoring this parameter across the full pipeline is immediately
useful in CI, where we're testing that the code can run successfully
but want to do so as quickly as is reasonable. Currently, E2E always
runs with this setting set to 1 for speed.

Signed-off-by: Russell Bryant [email protected]
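
To illustrate the idea described above, here is a minimal sketch of a flow that passes `num_iters` through to its generation step instead of hard-coding a sample count. It is not the code from this PR: the `Flow` stand-in, the `SynthSkillsFlowSketch` class, the block names, and the `gen_kwargs`/`num_samples` keys are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical stand-in for the base Flow class (not the real one)."""

    client: object
    model_id: str
    num_iters: int = 1  # controls how much data a single run generates


class SynthSkillsFlowSketch(Flow):
    """Illustrative subclass; block names and config keys are assumptions."""

    def get_flow(self) -> list:
        return [
            {
                "block_name": "gen_questions",
                # Pass num_iters through rather than using a fixed value.
                "gen_kwargs": {"num_samples": self.num_iters},
            },
            {
                "block_name": "gen_responses",
                # One response per generated question, regardless of num_iters.
                "gen_kwargs": {"num_samples": 1},
            },
        ]


# A CI-style run would pass num_iters=1; a normal run might pass 30.
flow = SynthSkillsFlowSketch(client=None, model_id="teacher", num_iters=30)
first_block = flow.get_flow()[0]
print(first_block["gen_kwargs"]["num_samples"])
```

With this shape, a caller such as the CLI only needs to set `num_iters` once when constructing the flow, and every block that scales with it picks up the same value.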


markmc · 2024-07-04T07:39:36Z

lgtm 👍

russellb · 2024-07-06T23:19:38Z

This is a trivial fix and it's causing CI problems, so I'm going to merge it with @markmc's review.

Two pipelines each include an LLMBlock that uses `{num_samples}` in its instructions to the teacher model. There needs to be some way to configure the LLMBlock so that `num_samples` will be included, but as per instructlab#82 (commit a01b04e) the value of `num_samples` should be based on the `num_instructions_to_generate` parameter.

Signed-off-by: Mark McLoughlin <[email protected]>
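
As a rough sketch of what that commit message describes, the snippet below fills a `{num_samples}` placeholder in a prompt template from `num_instructions_to_generate`, but only when a block opts in. The `build_prompt` helper, the template text, and the `task_description` field are hypothetical; only the `add_num_samples` option name and the two parameter names come from the discussion here.

```python
def build_prompt(
    template: str,
    sample: dict,
    num_instructions_to_generate: int,
    add_num_samples: bool = False,
) -> str:
    """Format a prompt, optionally injecting num_samples (hypothetical helper)."""
    values = dict(sample)  # copy so the caller's sample is not mutated
    if add_num_samples:
        # The value is derived from the run-level parameter, not set per block.
        values["num_samples"] = num_instructions_to_generate
    return template.format(**values)


# Assumed template text, for illustration only.
template = "Generate {num_samples} questions for this task: {task_description}"
prompt = build_prompt(
    template,
    {"task_description": "summarize a paragraph"},
    num_instructions_to_generate=30,
    add_num_samples=True,
)
print(prompt)
```

Making the injection opt-in keeps blocks whose prompts do not mention `{num_samples}` unaffected, while a template that does use the placeholder fails with a `KeyError` if the option is left off.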


russellb merged commit 6b82bad into instructlab:main Jul 6, 2024
11 checks passed

russellb mentioned this pull request Jul 7, 2024

Add a YAML based file format for pipelines #86

Merged

russellb added this to the 0.1.0 milestone Jul 8, 2024

markmc mentioned this pull request Jul 11, 2024

Add add_num_samples to LLMBlock config #121

Closed

jwm4 pushed a commit to jwm4/sdg that referenced this pull request Dec 13, 2024

Merge pull request instructlab#82 from vishnoianil/project-ui

2429eda

Proposal for new repository for UI work
