Commit

Make SynthSkillsFlow honor the num_iters parameter
The base `Flow` class takes a `num_iters` parameter. This is used by
the `ilab` CLI to adjust how much data is generated by a given run.

The old value for this parameter in the CLI was 100, so while testing
this in the short term, it should be specified explicitly like this:

    ilab data generate --num-instructions 30

Once we release a version of this library that includes what is
effectively a rewrite, the default value will be 30, and the option and
its description will better reflect the new behavior:

    ilab data generate --sdg-scale-factor 30

More detail about how this is exposed via the CLI can be found in this
PR:

instructlab/instructlab#1570

Honoring this parameter across the full pipeline is immediately useful
in CI, where we test that the code runs successfully but want the run
to finish as quickly as is reasonable. Currently, E2E always runs with
this setting at 1 for speed.

Signed-off-by: Russell Bryant <[email protected]>
russellb committed Jul 4, 2024
1 parent 2788384 commit a01b04e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/instructlab/sdg/default_flows.py
@@ -286,7 +286,7 @@ def get_flow(self) -> list:
"output_cols": ["question"],
"batch_kwargs": {
"num_procs": 8,
-"num_samples": 30,
+"num_samples": self.num_iters,
"batched": self.batched,
},
},
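
The effect of the one-line change can be illustrated with a minimal, self-contained sketch. The class names `SketchFlow` and `SketchSkillsFlow` and the constructor signature are hypothetical stand-ins, not the library's real `Flow` or `SynthSkillsFlow`; only the `batch_kwargs` keys are taken from the diff.

```python
class SketchFlow:
    """Hypothetical stand-in for the base Flow class (signature is assumed)."""

    def __init__(self, num_iters: int, batched: bool = True) -> None:
        self.num_iters = num_iters
        self.batched = batched


class SketchSkillsFlow(SketchFlow):
    """Hypothetical stand-in showing only the block config touched by the diff."""

    def get_flow(self) -> list:
        return [
            {
                "output_cols": ["question"],
                "batch_kwargs": {
                    "num_procs": 8,
                    # Before the change this was hardcoded to 30, so the
                    # caller's num_iters had no effect on this block.
                    "num_samples": self.num_iters,
                    "batched": self.batched,
                },
            },
        ]


# A CI-style run with the smallest setting, and a run with the planned default.
quick = SketchSkillsFlow(num_iters=1).get_flow()
planned = SketchSkillsFlow(num_iters=30).get_flow()
print(quick[0]["batch_kwargs"]["num_samples"])
print(planned[0]["batch_kwargs"]["num_samples"])
```

With the hardcoded value, both calls would have produced the same block configuration regardless of what the caller requested.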
