FMBench can now evaluate candidate models using a panel of LLM judges and report accuracy scores that show which candidate model is the most accurate. This issue tracks two additional metrics:
Calculate a hallucination metric: the number of times a given response was a hallucination, i.e. the model gave an answer and that answer was incorrect.
Calculate the correctly "incorrect" answers, i.e. the number of times a candidate model said "I don't know" to a question rather than hallucinating a response.
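One possible way to compute the two metrics is sketched below in Python. This is not FMBench code. The `JudgedResponse` record, its `is_correct` field (assumed to hold the judge panel's verdict), the `ABSTENTION_PHRASES` list, and the `summarize` function are all hypothetical names chosen for illustration. Detecting "I don't know" by phrase matching is also an assumption; a judge prompt could classify abstentions instead.

```python
from dataclasses import dataclass

# Hypothetical phrases treated as an explicit "I don't know" (assumption).
ABSTENTION_PHRASES = ("i don't know", "i do not know", "i'm not sure")


@dataclass
class JudgedResponse:
    response: str     # text produced by the candidate model
    is_correct: bool  # verdict from the judge panel (assumed field)


def is_abstention(text: str) -> bool:
    """Return True if the response declines to answer."""
    normalized = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return any(phrase in normalized for phrase in ABSTENTION_PHRASES)


def summarize(records: list[JudgedResponse]) -> dict[str, float]:
    """Compute hallucination and abstention rates over judged responses."""
    total = len(records)
    if total == 0:
        return {"hallucination_rate": 0.0, "abstention_rate": 0.0}
    # An abstention is counted separately and never as a hallucination.
    abstained = sum(1 for r in records if is_abstention(r.response))
    # A hallucination is an answer that was given and judged incorrect.
    hallucinated = sum(
        1 for r in records
        if not r.is_correct and not is_abstention(r.response)
    )
    return {
        "hallucination_rate": hallucinated / total,
        "abstention_rate": abstained / total,
    }


records = [
    JudgedResponse("Paris", True),
    JudgedResponse("I don't know.", False),
    JudgedResponse("Lyon", False),
    JudgedResponse("Berlin", False),
]
metrics = summarize(records)
```

Reporting the two rates side by side would make it possible to tell a model that fails by guessing apart from one that fails by declining to answer, which the accuracy score alone does not show.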