Add evaluation client #12
base: main
Conversation
The EvalClient app runs when isolated from the other samples, but when the .sln of the whole solution is opened, it gives this error:
Projects that use central package version management should not define the version on the PackageReference items but on the PackageVersion items: Azure.Identity;Microsoft.Extensions.Configuration.UserSecrets;Microsoft.Extensions.AI.OpenAI;Microsoft.Agents.Client;Microsoft.Extensions.Hosting;Microsoft.Identity.Client;CsvHelper;Microsoft.Agents.CopilotStudio.Client;Microsoft.Identity.Client.Extensions.Msal;Azure.AI.OpenAI;Microsoft.Agents.Authentication.
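For reference, the usual fix when central package management is enabled is to declare the versions once as PackageVersion items in Directory.Packages.props and drop the Version attribute from the PackageReference items in the sample's .csproj. A minimal sketch, with illustrative version numbers rather than the ones actually pinned in this repo:

```xml
<!-- Directory.Packages.props at the solution root -->
<Project>
  <PropertyGroup>
    <ManagePackageVersionsCentrally>true</ManagePackageVersionsCentrally>
  </PropertyGroup>
  <ItemGroup>
    <PackageVersion Include="Azure.Identity" Version="1.13.1" />
    <PackageVersion Include="CsvHelper" Version="33.0.1" />
    <!-- ...one PackageVersion entry for each package listed in the error... -->
  </ItemGroup>
</Project>

<!-- EvalClient.csproj: reference the packages without a Version attribute -->
<ItemGroup>
  <PackageReference Include="Azure.Identity" />
  <PackageReference Include="CsvHelper" />
</ItemGroup>
```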
@TechPreacher This is great! We will take some action on this after the holidays. Long term, before the GA, we will handle samples differently, namely using released package versions instead of being part of the SDK solution and using project references. We aren't quite ready to make that change, so what you are doing is the correct thing for now.
This is fixed now. Thanks for noticing!
Thanks @tracyboehrer! You can always reach me internally at "saschac".
This PR adds an evaluation client that can evaluate answer correctness against a ground truth, as well as document retrieval, for RAG-based Copilot Studio agents.
Sample Evaluation
Evaluation Dataset.csv
Evaluation Dataset Results.csv
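For orientation, the input dataset is assumed to look roughly like the sketch below, using the field names described under Explanation; the values are placeholders only, and the exact column order and URL delimiter may differ from the attached files. The "Answer Score" and the x/y source result are written to the results CSV by the evaluator.

```csv
Test Utterance,Expected Response,Sources
"<question to ask the agent>","<ground-truth answer>","<source URL 1>;<source URL 2>"
```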
Explanation
This RAG agent was created by adding a SharePoint knowledge source to the agent. The evaluator client asks the agent to answer the question provided in the "Test Utterance" field and compares the agent's answer with the answer in the "Expected Response" field.
Based on how well the agent's answer semantically matches the expected response, a score between 0 and 100 is assigned, 0 being the worst and 100 a perfect answer. The value is stored in the "Answer Score" field. The "Sources" field provides two URLs the agent is expected to use to answer the question.
The evaluator client checks whether the agent used the provided URLs to answer the question by looking at the reference links it returns, and reports the result as x/y, where x is the number of expected links the agent actually returned and y is the number of links expected.
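To make the source check concrete, here is a minimal sketch of how such an x/y coverage value could be computed. The class, method, and parameter names are hypothetical and not taken from this PR; it only assumes you have the expected source URLs from the dataset and the reference links returned by the agent.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SourceCoverage
{
    // Counts how many of the expected source URLs appear among the
    // reference links the agent returned and formats the result as
    // "x/y" (x = matched links, y = expected links).
    public static string Compute(IReadOnlyCollection<string> expectedUrls,
                                 IReadOnlyCollection<string> returnedLinks)
    {
        static string Normalize(string url) =>
            url.Trim().TrimEnd('/').ToLowerInvariant();

        var returned = new HashSet<string>(returnedLinks.Select(Normalize));
        int matched = expectedUrls.Select(Normalize).Count(returned.Contains);

        return $"{matched}/{expectedUrls.Count}";
    }
}
```

For example, if the agent returns only one of the two expected SharePoint URLs as a reference link, the result is "1/2".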