
[Evals API][8/n] AnswerParsingScoringFn for MMLU #352

Closed
wants to merge 3 commits into from

Conversation

yanxi0830
Contributor

@yanxi0830 yanxi0830 commented Oct 31, 2024

Continuation of #333.

TL;DR

  1. Introduce a registerable AnswerParsingScoringFn, paired with an AnswerParsingScoringContext, so scoring functions can be registered together with their context.
  2. Remove the parameters field (the context alone is sufficient to describe anything related to scoring context).
  3. AnswerParsingScoringFn can run all benchmarks with multiple-choice tasks (see the apps PR).
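The core idea of answer-parsing scoring for multiple-choice benchmarks like MMLU can be sketched as follows. This is an illustrative example, not the actual llama-stack implementation; the regex and function names are assumptions.

```python
import re
from typing import Optional

# Illustrative pattern for extracting a multiple-choice answer letter from a
# model generation, e.g. "The answer is (B)." or "Answer: C".
ANSWER_PATTERN = re.compile(r"(?i)answer\s*(?:is|:)?\s*\(?([ABCD])\)?")

def parse_answer(generation: str) -> Optional[str]:
    """Return the extracted answer letter (uppercased), or None if no match."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1).upper() if match else None

def score_row(generation: str, expected: str) -> float:
    """Score 1.0 if the parsed answer matches the expected letter, else 0.0."""
    return 1.0 if parse_answer(generation) == expected.upper() else 0.0
```

Registering the parsing regex as part of a scoring context (rather than hard-coding it) is what lets one scoring function cover many multiple-choice benchmarks.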

Test

PROVIDER_ID=test-meta PROVIDER_CONFIG=llama_stack/providers/tests/scoring/provider_config_example.yaml pytest -s llama_stack/providers/tests/scoring/test_scoring.py --tb=short --disable-warnings


@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 31, 2024
@yanxi0830 yanxi0830 changed the title [WIP][Evals API][] [WIP][Evals API][8/n] AnswerParsingScoringFn for MMLU Oct 31, 2024
@yanxi0830 yanxi0830 changed the title [WIP][Evals API][8/n] AnswerParsingScoringFn for MMLU [Evals API][8/n] AnswerParsingScoringFn for MMLU Nov 1, 2024
@yanxi0830 yanxi0830 marked this pull request as ready for review November 1, 2024 01:14
@@ -40,14 +73,13 @@ class ScoringFnDef(BaseModel):
default_factory=dict,
description="Any additional metadata for this definition",
)
parameters: List[Parameter] = Field(
Removing the parameters field, since context can be used to define the parameters associated with a scoring function.
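The shape of that change can be sketched with simplified stand-ins (these are not the actual llama-stack models — plain dataclasses here instead of pydantic, and field names are illustrative): the definition carries an optional `context` object instead of a separate `parameters` list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AnswerParsingScoringContext:
    # regexes tried in order to extract the chosen answer from a generation
    parsing_regexes: List[str]

@dataclass
class ScoringFnDef:
    identifier: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # replaces the removed `parameters` field
    context: Optional[AnswerParsingScoringContext] = None
```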


FIXED_FNS = [EqualityScoringFn, SubsetOfScoringFn]

LLM_JUDGE_FNS = [LlmAsJudgeScoringFn]
# Scoring functions with context that can be registered
REGISTERABLE_SCORING_FNS = {
@yanxi0830 yanxi0830 Nov 1, 2024


Each registerable ScoringFn is mapped to a ScoringContextType, so that we can register a scoring function with a custom judge_prompt / answer-extraction regex.
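A minimal sketch of that registry idea (class names beyond those shown in the diff are illustrative placeholders, not the actual llama-stack code): each registerable scoring-function class maps to the context type it expects, so registration can validate the context it is given.

```python
# Illustrative stand-ins for the real scoring-function and context classes.
class AnswerParsingScoringContext: ...
class LlmAsJudgeScoringContext: ...
class AnswerParsingScoringFn: ...
class LlmAsJudgeScoringFn: ...

# Map each registerable scoring function to its expected context type.
REGISTERABLE_SCORING_FNS = {
    AnswerParsingScoringFn: AnswerParsingScoringContext,
    LlmAsJudgeScoringFn: LlmAsJudgeScoringContext,
}

def validate_context(fn_cls, context):
    """Raise TypeError if `context` is not the type registered for `fn_cls`."""
    expected = REGISTERABLE_SCORING_FNS[fn_cls]
    if not isinstance(context, expected):
        raise TypeError(f"{fn_cls.__name__} expects {expected.__name__}")
```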

@yanxi0830
Contributor Author

Rebasing in #392.

@yanxi0830 yanxi0830 deleted the eval_benchmarks branch November 11, 2024 16:33