[FEAT] Add BaseRefinery and OverlapRefinery support #77

Merged
merged 23 commits into from
Dec 4, 2024

Conversation

bhavnicksm
Collaborator

This pull request introduces several improvements and new features across multiple files in the project. The most important changes include the addition of a new GitHub Actions workflow for automated Python testing, updates to the project dependencies, and enhancements to the chunking and embedding functionalities.

New Features:

  • .github/workflows/python-test-push.yml: Added a new GitHub Actions workflow for automated Python testing on push events. This workflow installs dependencies, sets up Python, and runs tests using pytest.

Dependency Updates:

  • pyproject.toml: Updated development dependencies, replaced black, isort, flake8, mypy, and pylint with ruff, and added pytest.ini_options and ruff configuration sections.

Code Enhancements:

Documentation:

  • README.md: Added a new section for citations and a special thanks note.

bhavnicksm and others added 23 commits November 25, 2024 16:11
I have added brief documentation on how to set up a local environment for
testing.

**Changes in pyproject.toml**
`pytest` was not picking up local changes and raised a
`ModuleNotFoundError` for `chonkie`. Adding `pythonpath` fixed
that.

**Automated Testing**
I have also added a GitHub Action to run the tests automatically on each
`git push`. I used `uv` because it installs dependencies very
quickly.

I followed this guide for the GitHub Actions setup:
https://docs.astral.sh/uv/guides/integration/github/
- Added ruff checks for import sorting and docstring formatting
- Fixed docstrings across chunker and embeddings modules to comply with standards
- Updated pyproject.toml to include ruff configuration
- Introduced a new Context class for managing contextual information during chunk refinement.
- Added a new Refinery module with BaseRefinery and OverlapRefinery classes to enhance chunk processing.
- Updated the __init__.py files to include new classes in the package exports.
- Modified the Chunk class to incorporate context attributes.
- Enhanced the pyproject.toml to include the new refinery package.
- Added tests for OverlapRefinery to ensure functionality and correctness.
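
To illustrate the refinery idea described in the list above, here is a minimal sketch of how an overlap-style refinement pass could work. It is not the code from this PR: every name in it (`SketchContext`, `SketchChunk`, `SketchBaseRefinery`, `SketchOverlapRefinery`, `context_size`) is hypothetical, and it assumes that "overlap" means attaching the trailing words of the previous chunk as context to the next one.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchContext:
    """Hypothetical stand-in for a context object: text borrowed from a neighbouring chunk."""

    text: str


@dataclass
class SketchChunk:
    """Hypothetical stand-in for a chunk that carries an optional context attribute."""

    text: str
    context: Optional[SketchContext] = None


class SketchBaseRefinery(ABC):
    """Hypothetical base class: a refinery post-processes a list of chunks."""

    @abstractmethod
    def refine(self, chunks: List[SketchChunk]) -> List[SketchChunk]:
        """Return a refined copy of the given chunks."""


class SketchOverlapRefinery(SketchBaseRefinery):
    """Attach the last `context_size` words of the previous chunk as context (assumed behaviour)."""

    def __init__(self, context_size: int = 3) -> None:
        if context_size <= 0:
            raise ValueError("context_size must be positive")
        self.context_size = context_size

    def refine(self, chunks: List[SketchChunk]) -> List[SketchChunk]:
        refined: List[SketchChunk] = []
        previous: Optional[SketchChunk] = None
        for chunk in chunks:
            context = None
            if previous is not None:
                # Whitespace splitting stands in for a real tokenizer here.
                words = previous.text.split()
                context = SketchContext(" ".join(words[-self.context_size:]))
            # Build new chunk objects so the input list is left unmodified.
            refined.append(SketchChunk(text=chunk.text, context=context))
            previous = chunk
        return refined
```

With this sketch, the first chunk gets no context and each later chunk carries the tail of its predecessor. A real implementation would presumably count tokens rather than whitespace-separated words.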
…updates

- Added error handling for missing embedding model, prompting installation of the `semantic` extra.
- Updated similarity threshold assignment to use the instance variable consistently.
- Introduced a new test for SDPMChunker to validate functionality with percentile-based similarity, ensuring proper chunking behavior and attributes.
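
The missing-dependency handling mentioned in the list above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the helper name `require_module` and the exact install hint are assumptions, and the check uses the standard library's `importlib.util.find_spec` to test whether a top-level module is installed.

```python
import importlib.util


def require_module(module_name: str, extra: str, package: str = "chonkie") -> None:
    """Raise ImportError with an install hint if a top-level module is not installed.

    `require_module` is a hypothetical helper; the hint format is an assumption.
    """
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required but not installed. "
            f'Install it with: pip install "{package}[{extra}]"'
        )
```

Calling such a helper at the start of a chunker's constructor would surface a clear installation message instead of a bare `ModuleNotFoundError` from deep inside the embedding code.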
@bhavnicksm bhavnicksm merged commit 71a9d5d into development Dec 4, 2024
1 check passed
@bhavnicksm bhavnicksm deleted the refinery branch December 24, 2024 12:25
2 participants