Overview
This exercise demonstrates how to evaluate generative AI model performance using manual and automated evaluations in Azure AI Foundry. The goal is to assess model responses based on predefined criteria.
Steps & Configuration Details
1. Create an Azure AI Foundry Hub and Project
- Open Azure AI Foundry portal (https://ai.azure.com) and sign in.
- Navigate to Management Center → All Resources → Create → AI Hub Resource (a scripted alternative follows this list).
- Configuration Items:
- Subscription: Your Azure subscription.
- Resource Group: Select or create a resource group.
- Hub Name: A valid name.
- Location: Choose from:
- East US 2
- France Central
- UK South
- Sweden Central (quota limits may require a different region).
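For readers who prefer scripting this step, here is a minimal sketch using the azure-ai-ml Python package, assuming a recent version that exposes the Hub and Project workspace kinds; the subscription, resource group, names, and region are placeholders to adapt.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Hub, Project

# Client scoped to the subscription and resource group chosen above.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

# Create the hub first, then attach a project to it via hub_id.
hub = ml_client.workspaces.begin_create(
    Hub(name="my-ai-hub", location="eastus2")  # placeholder name; any region listed above works
).result()
project = ml_client.workspaces.begin_create(
    Project(name="my-ai-project", hub_id=hub.id)
).result()
```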
2. Deploy Models
Two models are required:
- Primary Model: gpt-4o
- Evaluation Model: gpt-4o-mini
Deployment Settings (see the sketch after this list):
- Deployment Name: A valid name.
- Deployment Type: Global Standard
- Automatic Version Update: Enabled
- Model Version: Latest available.
- Connected AI Resource: Link to your Azure AI Foundry project.
- Tokens Per Minute (TPM) Limit: 50K (or the maximum available).
- Content Filter: DefaultV2
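The same deployments can be created programmatically. The sketch below uses the azure-mgmt-cognitiveservices package; the account name and model version are placeholder assumptions, the rai_policy_name value is my assumption of the DefaultV2 filter's programmatic name, and for Global Standard deployments the sku capacity is expressed in thousands of tokens per minute (so 50 corresponds to the 50K TPM limit above).

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment, DeploymentModel, DeploymentProperties, Sku,
)

client = CognitiveServicesManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<ai-services-account>",  # placeholder: the AI Services resource behind the hub
    deployment_name="gpt-4o",
    deployment=Deployment(
        sku=Sku(name="GlobalStandard", capacity=50),  # 50 -> 50K TPM
        properties=DeploymentProperties(
            model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-08-06"),
            version_upgrade_option="OnceNewDefaultVersionAvailable",  # automatic version update
            rai_policy_name="Microsoft.DefaultV2",  # assumed name of the DefaultV2 content filter
        ),
    ),
).result()
# Repeat with name/deployment_name "gpt-4o-mini" (and its latest version) for the evaluation model.
```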
3. Perform Manual Evaluation
- Download the evaluation dataset:
https://raw.githubusercontent.com/MicrosoftLearning/mslearn-ai-studio/refs/heads/main/data/travel_evaluation_data.jsonl
- Save the file with a .jsonl extension (ensure it is not saved as .txt); a validation sketch follows this step.
- Navigate to Protect and Govern → Evaluation.
- Select + New Manual Evaluation.
- Configuration Items (reproduced in code after this step):
- Model: gpt-4o
- System Message: Assist users with travel-related inquiries, offering tips, advice, and recommendations as a knowledgeable travel agent.
- Dataset Mapping:
- Input: Question
- Expected Response: ExpectedResponse
- Click Run to generate model responses.
- Review outputs and score responses using thumbs-up/down icons.
- Save results for future comparison.
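Before uploading, it is worth verifying that the file really is line-delimited JSON. A minimal sketch using only the standard library, assuming the dataset's field names match the mapping above (Question / ExpectedResponse):

```python
import json
import urllib.request

URL = ("https://raw.githubusercontent.com/MicrosoftLearning/mslearn-ai-studio/"
       "refs/heads/main/data/travel_evaluation_data.jsonl")

# Download with an explicit .jsonl extension so it is not saved as .txt.
urllib.request.urlretrieve(URL, "travel_evaluation_data.jsonl")

# Every line must parse as a standalone JSON object carrying the mapped fields.
with open("travel_evaluation_data.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        missing = {"Question", "ExpectedResponse"} - record.keys()
        assert not missing, f"line {i} is missing {missing}"
print("Valid JSONL.")
```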
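The portal generates the responses for you, but the underlying call is easy to reproduce for offline review. A sketch using the openai Python package against the gpt-4o deployment from step 2; the endpoint, key, and API version are placeholders:

```python
import json
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    api_key="<api-key>",
    api_version="2024-06-01",
)

SYSTEM_MESSAGE = (
    "Assist users with travel-related inquiries, offering tips, advice, and "
    "recommendations as a knowledgeable travel agent."
)

with open("travel_evaluation_data.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        response = client.chat.completions.create(
            model="gpt-4o",  # the deployment name from step 2
            messages=[
                {"role": "system", "content": SYSTEM_MESSAGE},
                {"role": "user", "content": item["Question"]},
            ],
        )
        answer = response.choices[0].message.content
        # Compare against item["ExpectedResponse"] and record a thumbs-up/down score.
        print(item["Question"], "->", answer[:80])
```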
4. Perform Automated Evaluation
- Navigate to Evaluation → Automated Evaluations → Create a New Evaluation.
- Select Evaluate a Model → Use Your Dataset → Choose the uploaded JSONL dataset.
- Configuration Items:
- Model: gpt-4o-mini
- System Message: Same as the manual evaluation.
- Query Field: {{item.question}}
- Evaluators Configuration (see the sketch after this step):
- Model Scorer: Criteria Name: Semantic_similarity; Grade With: gpt-4o; Output: {{sample.output_text}}; Ground Truth: {{item.ExpectedResponse}}
- Likert-Scale Evaluator: Criteria Name: Relevance; Grade With: gpt-4o; Query: {{item.question}}
- Text Similarity: Criteria Name: F1_Score; Ground Truth: {{item.ExpectedResponse}}
- Hateful and Unfair Content: Criteria Name: Hate_and_unfairness; Query: {{item.question}}
- Submit the evaluation and wait for completion.
- Review results and access raw evaluation data.
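The four evaluators configured above have close counterparts in the azure-ai-evaluation Python package, so the portal run can be approximated locally. The sketch below is an approximation under that assumption, not the portal's exact pipeline: it generates gpt-4o-mini responses, writes them to a JSONL file whose columns match the evaluators' expected input names (evaluate() maps data columns to evaluator parameters by name), and grades with gpt-4o. All endpoints, keys, and project identifiers are placeholders.

```python
import json

from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import (
    evaluate,
    SimilarityEvaluator,
    RelevanceEvaluator,
    F1ScoreEvaluator,
    HateUnfairnessEvaluator,
)
from openai import AzureOpenAI

# Judge-model configuration: the portal run grades with gpt-4o.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com/",  # placeholder
    "api_key": "<api-key>",                                         # placeholder
    "azure_deployment": "gpt-4o",
}

similarity = SimilarityEvaluator(model_config)   # "Model Scorer" counterpart
relevance = RelevanceEvaluator(model_config)     # Likert-scale (1-5) relevance
f1 = F1ScoreEvaluator()                          # lexical text similarity

# The hate/unfairness evaluator calls the project's safety service.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}
hate_unfairness = HateUnfairnessEvaluator(
    credential=DefaultAzureCredential(), azure_ai_project=azure_ai_project
)

# Generate responses from the evaluated model (gpt-4o-mini) and write rows whose
# columns match the evaluators' expected inputs (query/response/ground_truth).
aoai = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<api-key>",
    api_version="2024-06-01",
)
SYSTEM_MESSAGE = (
    "Assist users with travel-related inquiries, offering tips, advice, and "
    "recommendations as a knowledgeable travel agent."
)
with open("travel_evaluation_data.jsonl", encoding="utf-8") as src, \
     open("eval_rows.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        item = json.loads(line)
        reply = aoai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM_MESSAGE},
                {"role": "user", "content": item["Question"]},
            ],
        ).choices[0].message.content
        dst.write(json.dumps({
            "query": item["Question"],
            "response": reply,
            "ground_truth": item["ExpectedResponse"],
        }) + "\n")

# Run all four evaluators over every row; results include per-row scores and means.
results = evaluate(
    data="eval_rows.jsonl",
    evaluators={
        "semantic_similarity": similarity,
        "relevance": relevance,
        "f1_score": f1,
        "hate_and_unfairness": hate_unfairness,
    },
    output_path="evaluation_results.json",
)
print(results["metrics"])
```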
5. Clean Up
- Delete Azure resources to avoid unnecessary costs (or script the deletion, as sketched after this list):
- Open Azure Portal (https://portal.azure.com).
- Navigate to Resource Groups.
- Select the resource group and click Delete.
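Deletion can also be scripted with the azure-mgmt-resource package; begin_delete removes the group and everything in it, so double-check the name first (subscription ID and group name below are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")
# Irreversible: deletes the resource group and all resources it contains.
client.resource_groups.begin_delete("<resource-group>").result()
```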
This summary captures the essential steps while highlighting all configuration items and code references required for evaluating generative AI models in Azure AI Foundry.