v0.5.0
What's Changed
- build(deps): bump actions/cache from 4.1.0 to 4.1.1 by @dependabot in #300
- build(deps): bump rojopolis/spellcheck-github-actions from 0.42.0 to 0.43.0 by @dependabot in #299
- build(deps): bump actions/checkout from 4.2.0 to 4.2.1 by @dependabot in #298
- chore: rename 'basic-workflow-tests' to 'e2e-custom' by @nathan-weinberg in #306
- fix: change "group" to "tag" for mmlu_branch task config by @alimaredia in #305
- fix: remove stop token from mixtral by @cdoern in #310
- ci: update small E2E job to align with CLI and Training by @nathan-weinberg in #317
- ci: update medium job to run as PR check by @nathan-weinberg in #318
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.0 to 0.43.1 by @dependabot in #314
- fix: medium E2E CI job was missing HF_TOKEN by @nathan-weinberg in #319
- build(deps): bump actions/cache from 4.1.1 to 4.1.2 by @dependabot in #320
- ci: use org variable for AWS EC2 AMI in E2E CI jobs by @nathan-weinberg in #322
- ci: convert med E2E CI job to L4 GPU by @nathan-weinberg in #325
- build(deps): bump rojopolis/spellcheck-github-actions from 0.43.1 to 0.44.0 by @dependabot in #326
- build(deps): bump actions/setup-python from 5.2.0 to 5.3.0 by @dependabot in #323
- build(deps): bump pypa/gh-action-pypi-publish from 1.10.3 to 1.11.0 by @dependabot in #327
- build(deps): bump actions/checkout from 4.2.1 to 4.2.2 by @dependabot in #321
- build(deps): bump machulav/ec2-github-runner from 2.3.6 to 2.3.7 by @dependabot in #328
- build(deps): bump hynek/build-and-inspect-python-package from 2.9.0 to 2.10.0 by @dependabot in #329
- build(deps): bump rhysd/actionlint from 1.7.3 to 1.7.4 in /.github/workflows by @dependabot in #332
- build(deps): bump pypa/gh-action-pypi-publish from 1.11.0 to 1.12.0 by @dependabot in #337
- build(deps): bump rojopolis/spellcheck-github-actions from 0.44.0 to 0.45.0 by @dependabot in #338
- build(deps): bump pypa/gh-action-pypi-publish from 1.12.0 to 1.12.2 by @dependabot in #342
- Integrate Context-Aware Chunking and PDF Support by @khaledsulayman in #284
- feat: parametrize system prompt by @jaideepr97 in #339
- feat: support converting messages datasets into multiple pre-training formats by @jaideepr97 in #341
- Move to Docling v2 APIs by @bbrowning in #347
- feat: expose max_num_tokens as configurable by @cdoern in #340
- Remove unnecessary requirement for qna.yaml in ContextAwareChunker by @khaledsulayman in #351
- Upgrade docling, expand chunking testing by @bbrowning in #349
- Don't attempt batching with InstructLab's llama-cpp-python by @bbrowning in #358
- Consolidate test sample documents into one subdir by @bbrowning in #356
- Move a spurious print to a debug log message by @bbrowning in #359
- Only use CPU for the docling OCR models by @bbrowning in #361
- Data mix fix by @aakankshaduggal in #366
New Contributors
- @alimaredia made their first contribution in #305
Full Changelog: v0.4.2...v0.5.0