Moving metrics to metrics and adding pass/fail LLM tests #186

SinclairHudson · 2024-07-04T14:24:34Z

A PR outlining the interface I have in mind for LLM tests. Right now, only the base classes exist, and there's only one test, but it's mainly there to show the code structure. More tests to come once the general structure is agreed upon! A lot of this PR is going through the existing code and renaming the internals so that metrics are siloed as metrics. I have the following working definitions:
A metric is a function that takes an LLM prediction, prompt (optional), and gold-standard output (optional) and computes a scalar value. These values can be aggregated meaningfully over a dataset of predictions to shed light on properties of a model.
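To make the metric definition concrete, here is a minimal sketch. The function names (`word_overlap_metric`, `aggregate`) and the argument names are hypothetical and chosen only for illustration; they are not taken from this PR's code. The sketch assumes a metric that needs a gold-standard output and that a plain mean is a meaningful aggregate.

```python
from statistics import mean
from typing import Iterable, Optional, Tuple


def word_overlap_metric(
    prediction: str,
    prompt: Optional[str] = None,
    ground_truth: Optional[str] = None,
) -> float:
    """Hypothetical metric: Jaccard overlap between prediction and gold-standard words."""
    if ground_truth is None:
        raise ValueError("this metric requires a gold-standard output")
    pred_words = set(prediction.lower().split())
    gold_words = set(ground_truth.lower().split())
    union = pred_words | gold_words
    if not union:
        # Both strings are empty, so treat them as a perfect match.
        return 1.0
    return len(pred_words & gold_words) / len(union)


def aggregate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the scalar metric over (prediction, ground_truth) pairs."""
    return mean(word_overlap_metric(pred, ground_truth=gold) for pred, gold in pairs)


if __name__ == "__main__":
    dataset = [("Paris", "Paris"), ("the cat sat", "the dog sat")]
    print(aggregate(dataset))
```

Because each call returns a single scalar, the per-example values can be averaged (or otherwise summarized) over a dataset, which is the property the definition relies on.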

A test is a function that takes an LLM prediction, prompt (optional), and gold-standard output (optional) and computes a boolean pass/fail output that denotes whether the model output meets certain criteria. Tests are less about understanding an LLM's properties and more about ensuring it can meet specific requirements with specific prompts.

There can be overlapping underlying functionality between metrics and tests.
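As a rough illustration of that overlap, the sketch below shares one helper between a metric and a pass/fail test. All names here (`PassFailTest`, `MaxWordCountTest`, `word_count_metric`, `run`) are hypothetical and are not the base classes introduced in this PR; the only assumption carried over is the shape described above, where a metric returns a scalar and a test returns a boolean.

```python
from abc import ABC, abstractmethod
from typing import Optional


def word_count(text: str) -> int:
    """Shared helper used by both the metric and the test below."""
    return len(text.split())


def word_count_metric(
    prediction: str,
    prompt: Optional[str] = None,
    ground_truth: Optional[str] = None,
) -> float:
    """Hypothetical metric: scalar length of the prediction, in words."""
    return float(word_count(prediction))


class PassFailTest(ABC):
    """Hypothetical base class: a test maps a prediction to pass/fail."""

    @abstractmethod
    def run(
        self,
        prediction: str,
        prompt: Optional[str] = None,
        ground_truth: Optional[str] = None,
    ) -> bool:
        ...


class MaxWordCountTest(PassFailTest):
    """Passes when the prediction stays within a word budget."""

    def __init__(self, max_words: int) -> None:
        self.max_words = max_words

    def run(
        self,
        prediction: str,
        prompt: Optional[str] = None,
        ground_truth: Optional[str] = None,
    ) -> bool:
        # Same underlying computation as the metric, compared to a threshold.
        return word_count(prediction) <= self.max_words


if __name__ == "__main__":
    check = MaxWordCountTest(max_words=3)
    print(word_count_metric("a short answer"), check.run("a short answer"))
```

The metric reports how long the output is; the test only answers whether that length meets a specific requirement.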

As of right now, there's no way for the user to specify their test suite, but that will come in the form of a spreadsheet interface (next PR). Advanced users can use the library in their own code.

benjaminye

I think tests/qa/test_qa_metrics.py and tests/qa/test_qa_tests.py need to be swapped.

Otherwise, LGTM! Good stuff!

tests/qa/test_qa_metrics.py

SinclairHudson added 2 commits July 1, 2024 00:01

renamed all internals to LLM metrics, WIP sketch of Test class

63a7b30

adding unit test for tests

31924fb

benjaminye suggested changes Jul 4, 2024


tests/qa/test_qa_metrics.py (resolved)

switching test_qa_metrics and test_qa_tests

d5b7843

benjaminye approved these changes Jul 4, 2024


benjaminye merged commit e440a9d into georgian-io:main Jul 4, 2024
2 checks passed
