
# Evaluation Working Group Charter

## Mission

The OPEA Evaluation Working Group is chartered to identify standardized methodologies and frameworks for evaluating the RAG pipeline, to aid in benchmarking both the individual components and the end-to-end solution.

The evaluation will comprise both quantitative and qualitative metrics in the domains of Performance, Safety, Trustworthiness, and Scalability.

## Problem Statement

- GenAI evaluation is a conundrum
- Most evaluations focus on LLM model performance, not on the end-to-end applications that deploy LLMs
- LLM performance benchmarks have plateaued
- Lack of standardization
- Multiple leaderboards

## Scope and Priority

- Methodology and evaluation frameworks
- Performance – focus on metrics/KPIs for each component and for the end-to-end pipeline
- Trustworthiness – the ability to guarantee quality, security, robustness, and relevance to government or other policies
- Scalability / enterprise readiness – the ability to be used in production in enterprise environments

## Goals and Objectives

- Establish standardized methodologies, metrics, and frameworks for evaluating RAG components
- Evaluate Performance, Safety, Trustworthiness, and Scalability
- Identify holistic (end-to-end) metrics as well as metrics for individual components

## Success Indicators and Metrics

- Identify an evaluation framework for defining the metrics
- Define at least three KPIs:
  - Quantitative performance (throughput/latency and accuracy)
  - Trustworthiness
  - Scalability
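
As an illustration of what the quantitative-performance KPIs above (throughput/latency and accuracy) might look like in practice, the following sketch measures them for an arbitrary pipeline callable. All names here (`measure_pipeline`, the exact-match accuracy check, the result keys) are hypothetical and not part of any OPEA framework; a real harness would use the evaluation framework the group selects.

```python
import time
from statistics import mean, quantiles

def measure_pipeline(pipeline, queries, expected):
    """Run each query through `pipeline`, collecting per-query latency
    and a simple exact-match accuracy signal.

    `pipeline` stands in for any RAG pipeline invocation (retrieval +
    generation); here it is just a callable taking a query string.
    """
    latencies = []
    correct = 0
    start = time.perf_counter()
    for query, answer in zip(queries, expected):
        t0 = time.perf_counter()
        result = pipeline(query)
        latencies.append(time.perf_counter() - t0)
        # Exact match is the crudest possible accuracy metric; a real
        # evaluation would use relevance or faithfulness scoring instead.
        correct += int(result == answer)
    elapsed = time.perf_counter() - start
    return {
        "throughput_qps": len(queries) / elapsed,       # queries per second
        "latency_p50_s": quantiles(latencies, n=100)[49],  # median latency
        "latency_mean_s": mean(latencies),
        "accuracy": correct / len(queries),
    }
```

End-to-end KPIs like these complement per-component metrics (e.g., retrieval recall, generation latency) measured in isolation.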

## Risks

- Lack of standardization across evaluation frameworks, KPIs, and benchmarking
- Evangelization of the group's recommendations