Releases: instructlab/sdg
v0.6.3
SDG v0.6.3
Fixes
- The maximum version constraint for PyTorch in our requirements file was raised so that SDG users are no longer prevented from using PyTorch 2.5.
All Changes
- chore!: Update PyTorch to 2.5 (backport #465) by @mergify in #469
- Update CHANGELOG.md for v0.6.3 (backport #473) by @mergify in #474
Full Changelog: v0.6.2...v0.6.3
v0.6.2
SDG v0.6.2
Fixes
- Fixed a bug in our version specification of the docling and docling_parse dependencies that caused new installs of InstructLab to pull in incompatible versions of these packages. We also fixed a similar bug in the mypy dependency, but that one only affects developers of SDG, not users of InstructLab.
All Changes
- Move AWS_REGION from using secret to var (backport #422) by @mergify in #438
- fix: Restrict docling library versions to resolve dependency issues + update mypy linting packages (backport #434) by @mergify in #437
- Update CHANGELOG.md for release v0.6.2 (backport #440) by @mergify in #441
Full Changelog: v0.6.1...v0.6.2
v0.6.1
v0.6.0
SDG v0.6.0
What's Changed
- fix: formatting error by @RobotSail in #378
- Prefer tesserocr over easyocr, if available by @bbrowning in #369
- ci: add large-size E2E CI job by @nathan-weinberg in #380
- Add Release Strategy Document by @khaledsulayman in #381
- Docling models path by @aakankshaduggal in #362
- Check for tokenizer in downloaded models directory by @khaledsulayman in #364
- fix: upsample the phase10 knowledge dataset by @RobotSail in #377
- build(deps): bump DavidAnson/markdownlint-cli2-action from 17.0.0 to 18.0.0 by @dependabot in #386
- Delete .gitattributes by @khaledsulayman in #393
New Contributors
- @RobotSail made their first contribution in #378
Full Changelog: v0.5.0...v0.6.0
v0.3.3
What's Changed
- Prepare release-v0.3 branch for backports by @bbrowning in #371
- Run the simple pipeline on small runners by @bbrowning in #372
- Data mix fix (backport #366) by @mergify in #368
Full Changelog: v0.3.2...v0.3.3
v0.5.0
What's Changed
- build(deps): bump actions/cache from 4.1.0 to 4.1.1 by @dependabot in #300
- build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #299
- build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #298
- chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #306
- fix: change "group" to "tag" for mmlu_branch task config by @alimaredia in #305
- fix: remove stop token from mixtral by @cdoern in #310
- ci: update small E2E job to align with CLI and Training by @nathan-weinberg in #317
- ci: update medium job to run as PR check by @nathan-weinberg in #318
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #314
- fix: medium E2E CI job was missing HF_TOKEN by @nathan-weinberg in #319
- build(deps): bump actions/cache from 4.1.1 to 4.1.2 by @dependabot in #320
- ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #322
- ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #325
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #326
- build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #323
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #327
- build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #321
- build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #328
- build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #329
- build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #332
- build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #337
- build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #338
- build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #342
- Integrate Context-Aware Chunking and PDF Support by @khaledsulayman in #284
- feat: parametrize system prompt by @jaideepr97 in #339
- feat: support converting messages datasets into multiple pre-training formats by @jaideepr97 in #341
- Move to Docling v2 APIs by @bbrowning in #347
- feat: expose max_num_tokens as configurable by @cdoern in #340
- Remove unnecessary requirement for qna.yaml in ContextAwareChunker by @khaledsulayman in #351
- Upgrade docling, expand chunking testing by @bbrowning in #349
- Don't attempt batching with InstructLab's llama-cpp-python by @bbrowning in #358
- Consolidate test sample documents into one subdir by @bbrowning in #356
- Move a spurious print to a debug log message by @bbrowning in #359
- Only use CPU for the docling OCR models by @bbrowning in #361
- Data mix fix by @aakankshaduggal in #366
New Contributors
- @alimaredia made their first contribution in #305
Full Changelog: v0.4.2...v0.5.0
v0.5.0a2
What's Changed
- build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #321
- build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #328
- build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #329
- build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #332
- build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #337
- build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #338
- build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #342
- Integrate Context-Aware Chunking and PDF Support by @khaledsulayman in #284
- feat: parametrize system prompt by @jaideepr97 in #339
- feat: support converting messages datasets into multiple pre-training formats by @jaideepr97 in #341
- Move to Docling v2 APIs by @bbrowning in #347
- feat: expose max_num_tokens as configurable by @cdoern in #340
- Remove unnecessary requirement for qna.yaml in ContextAwareChunker by @khaledsulayman in #351
- Upgrade docling, expand chunking testing by @bbrowning in #349
Full Changelog: v0.5.0a1...v0.5.0a2
v0.5.0a1
What's Changed
- build(deps): bump actions/cache from 4.1.0 to 4.1.1 by @dependabot in #300
- build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #299
- build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #298
- chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #306
- fix: change "group" to "tag" for mmlu_branch task config by @alimaredia in #305
- fix: remove stop token from mixtral by @cdoern in #310
- ci: update small E2E job to align with CLI and Training by @nathan-weinberg in #317
- ci: update medium job to run as PR check by @nathan-weinberg in #318
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #314
- fix: medium E2E CI job was missing HF_TOKEN by @nathan-weinberg in #319
- build(deps): bump actions/cache from 4.1.1 to 4.1.2 by @dependabot in #320
- ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #322
- ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #325
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #326
- build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #323
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #327
New Contributors
- @alimaredia made their first contribution in #305
Full Changelog: v0.4.2...v0.5.0a1
v0.3.2
What's Changed
- map mistral model name to mixtral by @cdoern in #315
  Without this change, Mistral models would use Merlinite templates, which results in unusable output.
Full Changelog: v0.3.1...v0.3.2
v0.4.2
What's Changed
Full Changelog: v0.4.1...v0.4.2