Release v0.2.3 · stanford-crfm/helm

Added BigCode StarCoder (#1506)
Added OPT 1.3B and 6.7B (#1468)
Added OpenAI gpt-3.5-turbo-0613 (#1667), gpt-3.5-turbo-16k-0613, gpt-4-0613, gpt-4-32k-0613 (#1468), gpt-4-32k-0314 (#1457)
Added OpenAI text-embedding-ada-002 (#1711)
Added Writer Palmyra (#1669, #1491)
Added Anthropic Claude (#1484)
Added Databricks Koala on Together (#1701)
Added Stability AI StableLM and Together RedPajama on Together

Added legal summarization scenarios (#1454)
Fixed corner cases in window service truncation (#1449)
Pinned file order for ICE and APPS (code) scenarios (#1352)
Fixed random seed for entity matching scenario (#1475)
Added Spider text-to-SQL (#1385)
Added Vicuna scenario (#1641), Koala scenario (#1642), open_assistant scenario (#1622), and Anthropic-HH-RLHF scenario (#1643) for instruction-following
Added verifiability judgement scenario (#1518)

Added script for estimating the cost of a run suite (#1480)
Added support for human critique evaluation using Surge AI (#1330), Scale AI (#1609), and Amazon Mechanical Turk (#1539)
Added support for LLM critique evaluation (#1627)
Decreased running time of helm-summarize (#1716)
Added SlurmRunner for distributing helm-run jobs over Slurm (#1550)
Migrated to the setuptools.build_meta backend (#1535)
Stopped non-retriable errors (e.g. content filter errors) from being retried (#1533)
Added logging for stack trace and exception message when retries occur (#1555)
Added file locking for ensure_file_downloaded() (#1692)
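
The two retry changes above (#1533, #1555) can be illustrated with a minimal sketch. This is not HELM's actual implementation: `NonRetriableError` and `call_with_retries` are hypothetical names chosen for illustration, and the sketch only assumes that some errors are marked as not worth retrying while transient ones are logged with their stack trace before the next attempt.

```python
import logging
import time

logger = logging.getLogger(__name__)


class NonRetriableError(Exception):
    """Hypothetical marker for failures that a retry cannot fix,
    such as a content filter rejection."""


def call_with_retries(fn, max_attempts=3, delay_seconds=0.0):
    """Call fn(), retrying transient failures up to max_attempts times.

    Hypothetical helper for illustration; not part of HELM.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except NonRetriableError:
            # A retry would fail the same way, so surface the error at once.
            raise
        except Exception:
            if attempt == max_attempts:
                raise
            # logger.exception records the message and the full stack trace.
            logger.exception(
                "Attempt %d of %d failed; retrying", attempt, max_attempts
            )
            time.sleep(delay_seconds)
```

With this shape, a request that raises `NonRetriableError` is attempted exactly once, while a transient failure is retried and leaves a stack trace in the log for each failed attempt.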

Evaluations