Add evaluation client #12
base: main
Conversation
The EvalClient app runs when isolated from the other samples, but when the .sln of the whole solution is opened, it gives this error:
Projects that use central package version management should not define the version on the PackageReference items but on the PackageVersion items: Azure.Identity;Microsoft.Extensions.Configuration.UserSecrets;Microsoft.Extensions.AI.OpenAI;Microsoft.Agents.Client;Microsoft.Extensions.Hosting;Microsoft.Identity.Client;CsvHelper;Microsoft.Agents.CopilotStudio.Client;Microsoft.Identity.Client.Extensions.Msal;Azure.AI.OpenAI;Microsoft.Agents.Authentication.
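For reference, the usual fix when central package management is enabled is to declare the versions once as PackageVersion items in Directory.Packages.props and drop the Version attribute from the PackageReference items in the sample's .csproj. A minimal sketch, with illustrative version numbers rather than the ones actually pinned in this repo:

```xml
<!-- Directory.Packages.props at the solution root -->
<Project>
  <PropertyGroup>
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
  </PropertyGroup>
  <ItemGroup>
    <PackageVersion Include="Azure.Identity" Version="1.13.1" />
    <PackageVersion Include="CsvHelper" Version="33.0.1" />
    <!-- ...one PackageVersion entry for each package listed in the error... -->
  </ItemGroup>
</Project>

<!-- EvalClient.csproj: reference the packages without a Version attribute -->
<ItemGroup>
  <PackageReference Include="Azure.Identity" />
  <PackageReference Include="CsvHelper" />
</ItemGroup>
```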
@TechPreacher This is great! We will take some action on this after the holidays. Long term, before the GA, we will handle samples differently, namely using released package versions instead of being part of the SDK solution and using project references. We aren't quite ready to make that change, so what you are doing is the correct thing for now.
This is fixed now. Thanks for noticing!
Thanks @tracyboehrer! You can always reach me internally at "saschac".
This PR adds an evaluation client that can evaluate answer correctness against a ground truth, as well as document retrieval, for RAG-based Copilot Studio agents.
Sample Evaluation
Evaluation Dataset.csv
Evaluation Dataset Results.csv
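For orientation, the input dataset is assumed to look roughly like the sketch below, using the field names described under Explanation; the values are placeholders only, and the exact column order and URL delimiter may differ from the attached files. The "Answer Score" and the x/y source result are written to the results CSV by the evaluator.

```csv
Test Utterance,Expected Response,Sources
"<question to ask the agent>","<ground-truth answer>","<source URL 1>;<source URL 2>"
```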
Explanation
This RAG agent was created by adding a SharePoint knowledge source to the agent. The evaluator client asks the agent to answer the question provided in the "Test Utterance" field and compares the agent's answer with the answer in the "Expected Response" field.
Based on how well the agent's answer semantically matches the expected response, a score between 0 and 100 is assigned, 0 being the worst and 100 a perfect answer. The value is stored in the "Answer Score" field. The "Sources" field provides two URLs the agent is expected to use to answer the question.
The evaluator client checks whether the agent used the provided URLs to answer the question by looking at the reference links it returns, and reports the result as x/y, where x is the number of expected links the agent actually returned and y is the number of links expected.
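To make the source check concrete, here is a minimal sketch of how such an x/y coverage value could be computed. The class, method, and parameter names are hypothetical and not taken from this PR; it only assumes you have the expected source URLs from the dataset and the reference links returned by the agent.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SourceCoverage
{
    // Counts how many of the expected source URLs appear among the
    // reference links the agent returned and formats the result as
    // "x/y" (x = matched links, y = expected links).
    public static string Compute(IReadOnlyCollection<string> expectedUrls,
                                 IReadOnlyCollection<string> returnedLinks)
    {
        static string Normalize(string url) =>
            url.Trim().TrimEnd('/').ToLowerInvariant();

        var returned = new HashSet<string>(returnedLinks.Select(Normalize));
        int matched = expectedUrls.Select(Normalize).Count(returned.Contains);

        return $"{matched}/{expectedUrls.Count}";
    }
}
```

For example, if the agent returns only one of the two expected SharePoint URLs as a reference link, the result is "1/2".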