Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docling models path #362

Merged
merged 13 commits into from
Nov 14, 2024
Merged

Conversation

aakankshaduggal
Copy link
Member

@mergify mergify bot added the ci-failure label Nov 11, 2024
src/instructlab/sdg/generate_data.py Outdated Show resolved Hide resolved
src/instructlab/sdg/generate_data.py Outdated Show resolved Hide resolved
src/instructlab/sdg/utils/chunkers.py Outdated Show resolved Hide resolved
Copy link
Contributor

mergify bot commented Nov 12, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @aakankshaduggal please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 12, 2024
Signed-off-by: Aakanksha Duggal <[email protected]>
@mergify mergify bot added the ci-failure label Nov 13, 2024
Signed-off-by: Aakanksha Duggal <[email protected]>
@mergify mergify bot added ci-failure and removed ci-failure labels Nov 13, 2024
src/instructlab/sdg/generate_data.py Outdated Show resolved Hide resolved
src/instructlab/sdg/utils/chunkers.py Outdated Show resolved Hide resolved
aakankshaduggal and others added 2 commits November 13, 2024 08:37
Co-authored-by: Jaideep Rao <[email protected]>
Signed-off-by: Aakanksha Duggal <[email protected]>
Co-authored-by: Jaideep Rao <[email protected]>
Signed-off-by: Aakanksha Duggal <[email protected]>
@mergify mergify bot added ci-failure and removed ci-failure labels Nov 13, 2024
@aakankshaduggal aakankshaduggal force-pushed the docling-model-path branch 2 times, most recently from 1fd4284 to dcbf7f3 Compare November 13, 2024 16:58
@mergify mergify bot added ci-failure and removed ci-failure labels Nov 13, 2024
Copy link
Member

@jaideepr97 jaideepr97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@mergify mergify bot added the one-approval label Nov 13, 2024
Copy link
Contributor

@bbrowning bbrowning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of notes where I don't think the path is actually getting looked up and used like we want. And, as a more general note, I wonder if it would be easier to just lookup the docling models using the xdg_data_dirs logic closer to where we use docling, inside ContextAwareChunker as that would avoid us having to wire this model path all the way through the calls here to get it from generate_data down to where it's used.

src/instructlab/sdg/generate_data.py Show resolved Hide resolved
src/instructlab/sdg/generate_data.py Outdated Show resolved Hide resolved
src/instructlab/sdg/generate_data.py Outdated Show resolved Hide resolved
Copy link
Contributor

mergify bot commented Nov 14, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @aakankshaduggal please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 14, 2024
@ktam3 ktam3 linked an issue Nov 14, 2024 that may be closed by this pull request
4 tasks
@mergify mergify bot added ci-failure and removed needs-rebase labels Nov 14, 2024
Signed-off-by: Aakanksha Duggal <[email protected]>
@mergify mergify bot removed the ci-failure label Nov 14, 2024
@mergify mergify bot added the testing Relates to testing label Nov 14, 2024
Copy link
Contributor

@bbrowning bbrowning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation is fixed, and I pushed a couple of unit tests to confirm we're properly parsing the model path out of the config.yaml when its found and that when it's not we're leaving the model path as None.

Thanks for the great collaboration on this!

@mergify mergify bot added ci-failure and removed one-approval labels Nov 14, 2024
These simple unit tests just test the cases where we found a
config.yaml to parse for the docling model path and where we didn't.

Signed-off-by: Ben Browning <[email protected]>
@mergify mergify bot removed the ci-failure label Nov 14, 2024
@bbrowning
Copy link
Contributor

Changed how my unit tests mock to avoid a Python 3.10 vs 3.11 incompatibility with patching certain modules. This should get the tests green on the Python 3.10 CI again.

@aakankshaduggal
Copy link
Member Author

Thanks @bbrowning for adding the quick unit tests 💯

Copy link
Member

@jaideepr97 jaideepr97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
thanks @aakankshaduggal and @bbrowning !
might want to squash your commits before merging

@mergify mergify bot merged commit 9d4ed74 into instructlab:main Nov 14, 2024
22 checks passed
@nathan-weinberg
Copy link
Member

@Mergifyio backport release-v0.5

Copy link
Contributor

mergify bot commented Nov 15, 2024

backport release-v0.5

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Relates to testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docling models path
5 participants