Evals API MVP #235

Closed
wants to merge 28 commits into from
28 commits
31c046d
evals new rebase
yanxi0830 Oct 10, 2024
c8de439
clean
yanxi0830 Oct 10, 2024
99ed142
add dataset datatypes
yanxi0830 Oct 11, 2024
9816c9a
wip add datatypes
yanxi0830 Oct 11, 2024
ad18dc9
add data structure to tasks
yanxi0830 Oct 11, 2024
fb565df
eleuther eval fix
yanxi0830 Oct 11, 2024
a25aff2
generator + scorer Api for MMLU
yanxi0830 Oct 14, 2024
8890de7
cleanup original BaseTask
yanxi0830 Oct 14, 2024
78cb88c
RunEvalTask / InferenceGenerator
yanxi0830 Oct 14, 2024
18fe966
registry refactor
yanxi0830 Oct 14, 2024
f046899
datasets api
yanxi0830 Oct 14, 2024
a9210cd
datasets api crud
yanxi0830 Oct 14, 2024
9c501d0
cleanup hardcoded dataset registry
yanxi0830 Oct 14, 2024
c50686b
scorer registry
yanxi0830 Oct 14, 2024
95fd53d
registry refactor
yanxi0830 Oct 14, 2024
a22c31b
processor registry
yanxi0830 Oct 14, 2024
fcb8dea
scorer only api
yanxi0830 Oct 15, 2024
c8f6849
full accuracy
yanxi0830 Oct 15, 2024
7b58950
braintrust scorer
yanxi0830 Oct 15, 2024
3c29108
input query optional input for braintrust scorer
yanxi0830 Oct 15, 2024
ec6c63b
dataset accept file uploads
yanxi0830 Oct 15, 2024
9cc0a54
rag correctness scorer w/ custom dataset
yanxi0830 Oct 15, 2024
d2b6215
openapi gen
yanxi0830 Oct 15, 2024
cccd5be
move eval_task_config to client
yanxi0830 Oct 15, 2024
be4f395
full evals / full scoring flow
yanxi0830 Oct 15, 2024
2c23a66
Merge branch 'main' into evals_new
yanxi0830 Oct 15, 2024
0c4ed66
regen openapi
yanxi0830 Oct 15, 2024
fa68809
llm judge llamastack scorer
yanxi0830 Oct 15, 2024
4 changes: 2 additions & 2 deletions docs/openapi_generator/generate.py
@@ -33,7 +33,7 @@
 
 from llama_models.llama3.api.datatypes import * # noqa: F403
 from llama_stack.apis.agents import * # noqa: F403
-from llama_stack.apis.dataset import * # noqa: F403
+from llama_stack.apis.datasets import * # noqa: F403
 from llama_stack.apis.evals import * # noqa: F403
 from llama_stack.apis.inference import * # noqa: F403
 from llama_stack.apis.batch_inference import * # noqa: F403
@@ -61,7 +61,7 @@ class LlamaStack(
     Telemetry,
     PostTraining,
     Memory,
-    Evaluations,
+    Evals,
     Models,
     Shields,
     Inspect,