10 Jan 15:02

bbrowning

2763286

v0.6.3 Latest

SDG v0.6.3

Fixes

The maximum PyTorch version allowed by our requirements file was raised so that we no longer prevent SDG users from using PyTorch 2.5.
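To illustrate why an upper bound can block a newer release, here is a minimal sketch of a range check on version numbers. The bounds shown are hypothetical and are not taken from SDG's requirements file; the helper names `parse_version` and `satisfies` are also made up for this example, and real installers follow the fuller PEP 440 rules.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.5.1' into (2, 5, 1), dropping a local suffix such as '+cu121'."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in core.split("."))


def satisfies(installed: str, minimum: str, below: str) -> bool:
    """Return True when minimum <= installed < below."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)


# Hypothetical bounds, for illustration only.
OLD_BOUNDS = ("2.3.0", "2.5.0")  # upper bound excludes 2.5.x
NEW_BOUNDS = ("2.3.0", "2.6.0")  # raised upper bound admits 2.5.x

old_ok = satisfies("2.5.0", *OLD_BOUNDS)
new_ok = satisfies("2.5.0", *NEW_BOUNDS)
```

With the hypothetical old bounds the check rejects 2.5.0, and with the raised bound it accepts it, which is the kind of change this release describes.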

All Changes

chore!: Update PyTorch to 2.5 (backport #465) by @mergify in #469
Update CHANGELOG.md for v0.6.3 (backport #473) by @mergify in #474

Full Changelog: v0.6.2...v0.6.3

Contributors

mergify


10 Dec 17:44

bbrowning

v0.6.2

9bcde30

v0.6.2

SDG v0.6.2

Fixes

Fixed a bug in our version specification for the docling and docling_parse dependencies that was causing new installs of InstructLab to pull in incompatible versions of them. We also fixed a similar bug in the mypy dependency, but that one affects only developers of SDG, not users of InstructLab.

All Changes

Move AWS_REGION from using secret to var (backport #422) by @mergify in #438
fix: Restrict docling library versions to resolve dependency issues + update mypy linting packages (backport #434) by @mergify in #437
Update CHANGELOG.md for release v0.6.2 (backport #440) by @mergify in #441

Full Changelog: v0.6.1...v0.6.2

Contributors

mergify


27 Nov 23:48

bbrowning

v0.6.1

c220b5f

v0.6.1

SDG v0.6.1

What's Changed

Add [End] to parser cleanup tags (backport #400) by @mergify in #403
Ensure knowledge docs are cloned into unique dirs (backport #416) by @mergify in #417
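The release note does not say how the unique directories are created. As a rough sketch of one common approach, the standard library's `tempfile.mkdtemp` can hand out a fresh directory per clone; the function name `make_clone_dir` and its parameters are hypothetical and do not come from the SDG code base.

```python
import tempfile
from pathlib import Path


def make_clone_dir(base_dir: Path, repo_name: str) -> Path:
    """Create and return a fresh directory under base_dir for one clone."""
    base_dir.mkdir(parents=True, exist_ok=True)
    # mkdtemp appends a random suffix, so repeated calls do not collide
    # even when two knowledge sources share the same repository name.
    return Path(tempfile.mkdtemp(prefix=f"{repo_name}_", dir=base_dir))
```

Calling it twice with the same `repo_name` yields two different directories, so documents from one clone cannot overwrite those from another.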

Full Changelog: v0.6.0...v0.6.1

Contributors

mergify


15 Nov 19:52

khaledsulayman

v0.6.0

4e90549

v0.6.0

SDG v0.6.0

What's Changed

fix: formatting error by @RobotSail in #378
Prefer tesserocr over easyocr, if available by @bbrowning in #369
ci: add large-size E2E CI job by @nathan-weinberg in #380
Add Release Strategy Document by @khaledsulayman in #381
Docling models path by @aakankshaduggal in #362
Check for tokenizer in downloaded models directory by @khaledsulayman in #364
fix: upsample the phase10 knowledge dataset by @RobotSail in #377
build(deps): bump DavidAnson/markdownlint-cli2-action from 17.0.0 to 18.0.0 by @dependabot in #386
Delete .gitattributes by @khaledsulayman in #393

New Contributors

@RobotSail made their first contribution in #378

Full Changelog: v0.5.0...v0.6.0

Contributors

bbrowning, aakankshaduggal, and 4 other contributors


13 Nov 17:15

nathan-weinberg

v0.3.3

fbfe7d4

v0.3.3

What's Changed

Prepare release-v0.3 branch for backports by @bbrowning in #371
Run the simple pipeline on small runners by @bbrowning in #372
Data mix fix (backport #366) by @mergify in #368

Full Changelog: v0.3.2...v0.3.3

Contributors

bbrowning and mergify


12 Nov 22:32

khaledsulayman

v0.5.0

b6f07a8

v0.5.0

What's Changed

build(deps): bump actions/cache from 4.1.0 to 4.1.1 by @dependabot in #300
build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #299
build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #298
chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #306
fix: change "group" to "tag" for mmlu_branch task config by @alimaredia in #305
fix: remove stop token from mixtral by @cdoern in #310
ci: update small E2E job to align with CLI and Training by @nathan-weinberg in #317
ci: update medium job to run as PR check by @nathan-weinberg in #318
build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #314
fix: medium E2E CI job was missing HF_TOKEN by @nathan-weinberg in #319
build(deps): bump actions/cache from 4.1.1 to 4.1.2 by @dependabot in #320
ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #322
ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #325
build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #326
build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #323
build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #327
build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #321
build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #328
build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #329
build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #332
build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #337
build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #338
build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #342
Integrate Context-Aware Chunking and PDF Support by @khaledsulayman in #284
feat: parametrize system prompt by @jaideepr97 in #339
feat: support converting messages datasets into multiple pre-training formats by @jaideepr97 in #341
Move to Docling v2 APIs by @bbrowning in #347
feat: expose max_num_tokens as configurable by @cdoern in #340
Remove unnecessary requirement for qna.yaml in ContextAwareChunker by @khaledsulayman in #351
Upgrade docling, expand chunking testing by @bbrowning in #349
Don't attempt batching with InstructLab's llama-cpp-python by @bbrowning in #358
Consolidate test sample documents into one subdir by @bbrowning in #356
Move a spurious print to a debug log message by @bbrowning in #359
Only use CPU for the docling OCR models by @bbrowning in #361
Data mix fix by @aakankshaduggal in #366

New Contributors

@alimaredia made their first contribution in #305

Full Changelog: v0.4.2...v0.5.0

Contributors

bbrowning, alimaredia, and 6 other contributors


08 Nov 21:22

khaledsulayman

v0.5.0a2

e0698d6

v0.5.0a2 Pre-release

What's Changed

build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #321
build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #328
build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #329
build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #332
build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #337
build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #338
build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #342
Integrate Context-Aware Chunking and PDF Support by @khaledsulayman in #284
feat: parametrize system prompt by @jaideepr97 in #339
feat: support converting messages datasets into multiple pre-training formats by @jaideepr97 in #341
Move to Docling v2 APIs by @bbrowning in #347
feat: expose max_num_tokens as configurable by @cdoern in #340
Remove unnecessary requirement for qna.yaml in ContextAwareChunker by @khaledsulayman in #351
Upgrade docling, expand chunking testing by @bbrowning in #349

Full Changelog: v0.5.0a1...v0.5.0a2

Contributors

bbrowning, jaideepr97, and 3 other contributors


01 Nov 17:13

nathan-weinberg

v0.5.0a1

5abc57f

v0.5.0a1 Pre-release

v0.5.0a1

What's Changed

build(deps): bump actions/cache from 4.1.0 to 4.1.1 by @dependabot in #300
build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #299
build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #298
chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #306
fix: change "group" to "tag" for mmlu_branch task config by @alimaredia in #305
fix: remove stop token from mixtral by @cdoern in #310
ci: update small E2E job to align with CLI and Training by @nathan-weinberg in #317
ci: update medium job to run as PR check by @nathan-weinberg in #318
build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #314
fix: medium E2E CI job was missing HF_TOKEN by @nathan-weinberg in #319
build(deps): bump actions/cache from 4.1.1 to 4.1.2 by @dependabot in #320
ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #322
ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #325
build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #326
build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #323
build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #327

New Contributors

@alimaredia made their first contribution in #305

Full Changelog: v0.4.2...v0.5.0a1

Contributors

alimaredia, cdoern, and 2 other contributors


18 Oct 21:14

aakankshaduggal

v0.3.2

481e3f6

v0.3.2

What's Changed

map mistral model name to mixtral by @cdoern in #315
Without these changes, the mistral models use the merlinite templates, which results in unusable output.
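A minimal sketch of the idea, assuming the template family is chosen from the model name and that merlinite is the fallback, as the note above implies. The function `template_family` and its matching rule are hypothetical; the actual SDG logic may differ.

```python
def template_family(model_name: str) -> str:
    """Pick a prompt-template family from a model name (illustrative only)."""
    name = model_name.lower()
    # Treat mistral names like mixtral so they do not fall through
    # to the merlinite default.
    if "mixtral" in name or "mistral" in name:
        return "mixtral"
    return "merlinite"
```

Under this sketch, a name containing "mistral" resolves to the mixtral templates instead of the merlinite default.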

Full Changelog: v0.3.1...v0.3.2

Contributors

cdoern


15 Oct 14:47

khaledsulayman

v0.4.2

36e7bbf

v0.4.2

What's Changed

fix: remove stop token from mixtral (backport #310) by @mergify in #311

Full Changelog: v0.4.1...v0.4.2

Contributors

mergify


Releases: instructlab/sdg