
[Evals API][8/n] AnswerParsingScoringFn for MMLU #352

Closed
wants to merge 3 commits into from

Conversation

yanxi0830
Contributor

@yanxi0830 yanxi0830 commented Oct 31, 2024

Continuation of #333.

TL;DR

  1. Introduce a registerable AnswerParsingScoringFn, paired with an AnswerParsingScoringContext, so scoring functions can be registered together with their context.
  2. Remove the parameters field (the context alone is sufficient to describe anything related to scoring context).
  3. AnswerParsingScoringFn can run all benchmarks with multiple-choice tasks (see the apps PR).
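The core idea of answer-parsing scoring for multiple-choice benchmarks like MMLU can be sketched as follows. This is an illustrative example, not the actual llama-stack implementation; the regex and function names are assumptions.

```python
import re
from typing import Optional

# Illustrative pattern for extracting a multiple-choice answer letter from a
# model generation, e.g. "The answer is (B)." or "Answer: C".
ANSWER_PATTERN = re.compile(r"(?i)answer\s*(?:is|:)?\s*\(?([ABCD])\)?")

def parse_answer(generation: str) -> Optional[str]:
    """Return the extracted answer letter (uppercased), or None if no match."""
    match = ANSWER_PATTERN.search(generation)
    return match.group(1).upper() if match else None

def score_row(generation: str, expected: str) -> float:
    """Score 1.0 if the parsed answer matches the expected letter, else 0.0."""
    return 1.0 if parse_answer(generation) == expected.upper() else 0.0
```

Registering the parsing regex as part of a scoring context (rather than hard-coding it) is what lets one scoring function cover many multiple-choice benchmarks.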

Test

PROVIDER_ID=test-meta PROVIDER_CONFIG=llama_stack/providers/tests/scoring/provider_config_example.yaml pytest -s llama_stack/providers/tests/scoring/test_scoring.py --tb=short --disable-warnings


@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 31, 2024
@yanxi0830 yanxi0830 changed the title [WIP][Evals API][] [WIP][Evals API][8/n] AnswerParsingScoringFn for MMLU Oct 31, 2024
@yanxi0830 yanxi0830 changed the title [WIP][Evals API][8/n] AnswerParsingScoringFn for MMLU [Evals API][8/n] AnswerParsingScoringFn for MMLU Nov 1, 2024
@yanxi0830 yanxi0830 marked this pull request as ready for review November 1, 2024 01:14
@@ -40,14 +73,13 @@ class ScoringFnDef(BaseModel):
default_factory=dict,
description="Any additional metadata for this definition",
)
parameters: List[Parameter] = Field(
Removing the parameters field, since context can be used to define the parameters associated with a scoring function.
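The shape of that change can be sketched with simplified stand-ins (these are not the actual llama-stack models — plain dataclasses here instead of pydantic, and field names are illustrative): the definition carries an optional `context` object instead of a separate `parameters` list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AnswerParsingScoringContext:
    # regexes tried in order to extract the chosen answer from a generation
    parsing_regexes: List[str]

@dataclass
class ScoringFnDef:
    identifier: str
    metadata: Dict[str, str] = field(default_factory=dict)
    # replaces the removed `parameters` field
    context: Optional[AnswerParsingScoringContext] = None
```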


FIXED_FNS = [EqualityScoringFn, SubsetOfScoringFn]

LLM_JUDGE_FNS = [LlmAsJudgeScoringFn]
# Scoring functions with context that can be registered
REGISTERABLE_SCORING_FNS = {
@yanxi0830 yanxi0830 Nov 1, 2024


Each registerable ScoringFn is mapped to a ScoringContextType, so that we can register a scoring function with a custom judge_prompt / answer-extraction regex.
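A minimal sketch of that registry idea (class names beyond those shown in the diff are illustrative placeholders, not the actual llama-stack code): each registerable scoring-function class maps to the context type it expects, so registration can validate the context it is given.

```python
# Illustrative stand-ins for the real scoring-function and context classes.
class AnswerParsingScoringContext: ...
class LlmAsJudgeScoringContext: ...
class AnswerParsingScoringFn: ...
class LlmAsJudgeScoringFn: ...

# Map each registerable scoring function to its expected context type.
REGISTERABLE_SCORING_FNS = {
    AnswerParsingScoringFn: AnswerParsingScoringContext,
    LlmAsJudgeScoringFn: LlmAsJudgeScoringContext,
}

def validate_context(fn_cls, context):
    """Raise TypeError if `context` is not the type registered for `fn_cls`."""
    expected = REGISTERABLE_SCORING_FNS[fn_cls]
    if not isinstance(context, expected):
        raise TypeError(f"{fn_cls.__name__} expects {expected.__name__}")
```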

@yanxi0830
Contributor Author

Rebasing in #392.

@yanxi0830 yanxi0830 deleted the eval_benchmarks branch November 11, 2024 16:33