v0.2.3
Models
Added BigCode StarCoder (#1506)
Added OPT 1.3B and 6.7B (#1468)
Added OpenAI gpt-3.5-turbo-0613 (#1667), gpt-3.5-turbo-16k-0613, gpt-4-0613, gpt-4-32k-0613 (#1468), and gpt-4-32k-0314 (#1457)
Added OpenAI text-embedding-ada-002 (#1711)
Added Writer Palmyra (#1669, #1491)
Added Anthropic Claude (#1484)
Added Databricks Koala on Together (#1701)
Added Stability AI StableLM and Together RedPajama on Together
Scenarios
Added legal summarization scenarios (#1454)
Fixed corner cases in window service truncation (#1449)
Pinned file order for the ICE and APPS (code) scenarios (#1352)
Fixed the random seed for the entity matching scenario (#1475)
Added Spider text-to-SQL (#1385)
Added Vicuna scenario (#1641), Koala scenario (#1642), open_assistant scenario (#1622), and Anthropic-HH-RLHF scenario (#1643) for instruction-following
Added verifiability judgement scenario (#1518)
Metrics
Fixed a bug in the multi-choice exact match calculation when scores are tied (#1494)
Framework
Added a script for estimating the cost of a run suite (#1480)
Added support for human critique evaluation using Surge AI (#1330), Scale AI (#1609), and Amazon Mechanical Turk (#1539)
Added support for LLM critique evaluation (#1627)
Reduced the running time of helm-summarize (#1716)
Added SlurmRunner for distributing helm-run jobs over Slurm (#1550)
Migrated to the setuptools.build_meta backend (#1535)
Stopped non-retriable errors (e.g. content filter errors) from being retried (#1533)
Added logging of the stack trace and exception message when retries occur (#1555)
Added file locking for ensure_file_downloaded() (#1692)
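The retry changes in #1533 and #1555 amount to a common policy: retry transient failures with backoff, fail immediately on errors that cannot succeed on a second attempt, and log the traceback each time a retry happens. The following is a minimal sketch of that policy under stated assumptions, not HELM's actual code; NonRetriableError and call_with_retries are hypothetical names introduced only for illustration.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class NonRetriableError(Exception):
    """Hypothetical marker for errors that fail the same way on every attempt,
    such as a content-filter rejection."""


def call_with_retries(fn: Callable[[], T], max_attempts: int = 5, base_delay: float = 0.5) -> T:
    """Call fn, retrying transient failures with exponential backoff (hypothetical helper).

    NonRetriableError is re-raised immediately instead of being retried."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except NonRetriableError:
            # Retrying cannot help, so surface the error right away.
            raise
        except Exception:
            if attempt == max_attempts:
                raise
            # exc_info=True records the exception message and full stack trace.
            logger.warning("Attempt %d of %d failed; retrying", attempt, max_attempts, exc_info=True)
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Separating the non-retriable case into its own exception type keeps the decision out of the retry loop: callers raise NonRetriableError for known-permanent failures, and every other exception is treated as transient.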
Evaluations
Added evaluation results for AI21 Jurassic-2 and Writer Palmyra