add hallucination metric to model evals on FMBench #170

madhurprash · 2024-08-14T16:02:42Z

FMBench can now evaluate candidate models using a panel of LLM judges and report an accuracy score for each, showing which candidate is the most accurate. This issue tracks two additions:

Calculate a hallucination metric that measures the number of times a response was a hallucination (an incorrect answer).
Calculate the number of correctly "incorrect" answers, i.e. the number of times a candidate model said "I don't know" to a question rather than hallucinating a response.
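
A minimal sketch of how the two counts could be computed, assuming each evaluated response comes with a boolean correctness verdict from the judge panel. The names `classify`, `summarize`, `ABSTENTION_PHRASES`, and the verdict labels are hypothetical and not part of FMBench; matching abstentions by phrase is a simplification, and a judge prompt could make that call instead.

```python
from collections import Counter

# Hypothetical verdict labels; FMBench's actual judge output may differ.
CORRECT = "correct"
HALLUCINATION = "hallucination"
ABSTAINED = "abstained"

# Hypothetical phrases treated as an explicit "I don't know" answer.
ABSTENTION_PHRASES = ("i don't know", "i do not know")


def classify(response: str, judged_correct: bool) -> str:
    """Label one response as correct, abstained, or hallucination."""
    text = response.strip().lower()
    # An explicit abstention is counted separately from a wrong answer.
    if any(phrase in text for phrase in ABSTENTION_PHRASES):
        return ABSTAINED
    return CORRECT if judged_correct else HALLUCINATION


def summarize(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Return per-category rates over (response, judged_correct) pairs."""
    counts = Counter(classify(resp, ok) for resp, ok in records)
    total = sum(counts.values())
    if total == 0:
        return {"accuracy": 0.0, "hallucination_rate": 0.0, "abstention_rate": 0.0}
    return {
        "accuracy": counts[CORRECT] / total,
        "hallucination_rate": counts[HALLUCINATION] / total,
        "abstention_rate": counts[ABSTAINED] / total,
    }


# Made-up sample data, for illustration only.
sample = [
    ("Paris", True),
    ("I don't know.", False),
    ("Berlin", False),
    ("Rome", True),
]
metrics = summarize(sample)
```

Reporting abstentions separately from hallucinations keeps a model that declines to answer from being penalized the same way as one that answers incorrectly.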
