[RFC][wip] make room for batch eval #1002

Draft
wants to merge 2 commits into
base: main

jonathanlastmileai
Contributor

@jonathanlastmileai jonathanlastmileai commented Jan 23, 2024

[RFC][wip] make room for batch eval

This moves the existing eval library to "test_suite_eval" and starts an equivalent
library for batch runs. It also makes the interface a little clearer.

Essentially, the differences are:

  • each metric runs on a list of inputs, not just one
  • each input can be paired with a reference. Pairing is also possible in the
    "test suite" setup, but it is clunkier there.
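
To make the two differences concrete, here is a minimal sketch of what a batch-style metric could look like. It is not the interface from this PR: `BatchMetric`, `exact_match`, and the argument names are hypothetical, chosen only to illustrate a metric that takes a list of outputs plus an optional, index-aligned list of references.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class BatchMetric:
    """Hypothetical batch metric: scores a whole list of outputs at once."""

    name: str
    # fn receives every output and, optionally, one reference per output
    # (aligned by index), and returns one score per output.
    fn: Callable[[Sequence[str], Optional[Sequence[str]]], List[float]]

    def __call__(
        self,
        outputs: Sequence[str],
        references: Optional[Sequence[str]] = None,
    ) -> List[float]:
        # References are optional, but if given they must pair up 1:1.
        if references is not None and len(references) != len(outputs):
            raise ValueError("outputs and references must have the same length")
        return self.fn(outputs, references)


def _exact_match(
    outputs: Sequence[str], references: Optional[Sequence[str]]
) -> List[float]:
    # This particular metric cannot be computed without references.
    if references is None:
        raise ValueError("exact_match requires references")
    return [float(o.strip() == r.strip()) for o, r in zip(outputs, references)]


exact_match = BatchMetric(name="exact_match", fn=_exact_match)

scores = exact_match(["Paris", "4"], ["Paris", "5"])
print(scores)  # [1.0, 0.0]
```

Passing the references alongside the outputs keeps the pairing explicit at the call site, which is the part described above as clunkier in the "test suite" setup.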

Stack created with Sapling. Best reviewed with ReviewStack.

- Add decorators for metric creation and reimplement some existing ones
- Add a unit test
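
As a rough illustration of decorator-based metric creation, the sketch below lifts a per-item scoring function into one that runs over a batch. The names `batch_metric`, `METRICS`, `length`, and `contains_reference` are assumptions made for this example and are not taken from the PR's code.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Hypothetical registry mapping metric names to batch functions.
METRICS: Dict[str, Callable[..., List[Any]]] = {}


def batch_metric(name: str):
    """Turn a per-item function (output, reference) -> score into a batch metric."""

    def decorator(per_item_fn: Callable[[str, Optional[str]], Any]):
        def run(
            outputs: Sequence[str],
            references: Optional[Sequence[str]] = None,
        ) -> List[Any]:
            # With no references, pair every output with None.
            refs = references if references is not None else [None] * len(outputs)
            if len(refs) != len(outputs):
                raise ValueError("outputs and references must have the same length")
            return [per_item_fn(o, r) for o, r in zip(outputs, refs)]

        run.__name__ = name
        METRICS[name] = run
        return run

    return decorator


@batch_metric("length")
def length(output: str, reference: Optional[str]) -> float:
    # Reference-free metric: ignores the reference entirely.
    return float(len(output))


@batch_metric("contains_reference")
def contains_reference(output: str, reference: Optional[str]) -> bool:
    # Reference-based metric: False when no reference is supplied.
    return reference is not None and reference in output


print(length(["ab", "abcd"]))  # [2.0, 4.0]
print(contains_reference(["hello world", "foo"], ["world", "bar"]))  # [True, False]
```

A unit test for a metric built this way can simply call it on a small batch and compare the returned list against the expected scores.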


Test plan:

Existing and new unit tests.
@jonathanlastmileai jonathanlastmileai changed the title [wip] rename -> test_suite, batch eval [RFC] rename -> test_suite, batch eval Jan 23, 2024
@jonathanlastmileai jonathanlastmileai changed the title [RFC] rename -> test_suite, batch eval [RFC][wip] make room for batch eval Jan 23, 2024