Overview
This exercise demonstrates how to evaluate generative AI model performance using manual and automated evaluations in Azure AI Foundry. The goal is to assess model responses based on predefined criteria.
Steps & Configuration Details
1. Create an Azure AI Foundry Hub and Project
- Open Azure AI Foundry portal (https://ai.azure.com) and sign in.
- Navigate to Management Center → All Resources → Create → AI Hub Resource (a scripted alternative follows this list).
- Configuration Items:
- Subscription: Your Azure subscription.
- Resource Group: Select or create a resource group.
- Hub Name: A valid name.
- Location: Choose from:
- East US 2
- France Central
- UK South
- Sweden Central (quota limits may require a different region).
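For readers who prefer scripting this step, here is a minimal sketch using the azure-ai-ml Python package, assuming a recent version that exposes the Hub and Project workspace kinds; the subscription, resource group, names, and region are placeholders to adapt.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Hub, Project

# Client scoped to the subscription and resource group chosen above.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

# Create the hub first, then attach a project to it via hub_id.
hub = ml_client.workspaces.begin_create(
    Hub(name="my-ai-hub", location="eastus2")  # placeholder name; any region listed above works
).result()
project = ml_client.workspaces.begin_create(
    Project(name="my-ai-project", hub_id=hub.id)
).result()
```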
2. Deploy Models
Two models are required:
- Primary Model: gpt-4o
- Evaluation Model: gpt-4o-mini
Deployment Settings (see the sketch after this list):
- Deployment Name: A valid name.
- Deployment Type: Global Standard
- Automatic Version Update: Enabled
- Model Version: Latest available.
- Connected AI Resource: Link to your Azure AI Foundry project.
- Tokens Per Minute (TPM) Limit: 50K (or the maximum available).
- Content Filter: DefaultV2
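The same deployments can be created programmatically. The sketch below uses the azure-mgmt-cognitiveservices package; the account name and model version are placeholder assumptions, the rai_policy_name value is my assumption of the DefaultV2 filter's programmatic name, and for Global Standard deployments the sku capacity is expressed in thousands of tokens per minute (so 50 corresponds to the 50K TPM limit above).

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment, DeploymentModel, DeploymentProperties, Sku,
)

client = CognitiveServicesManagementClient(DefaultAzureCredential(), "<subscription-id>")

client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<ai-services-account>",  # placeholder: the AI Services resource behind the hub
    deployment_name="gpt-4o",
    deployment=Deployment(
        sku=Sku(name="GlobalStandard", capacity=50),  # 50 -> 50K TPM
        properties=DeploymentProperties(
            model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-08-06"),
            version_upgrade_option="OnceNewDefaultVersionAvailable",  # automatic version update
            rai_policy_name="Microsoft.DefaultV2",  # assumed name of the DefaultV2 content filter
        ),
    ),
).result()
# Repeat with name/deployment_name "gpt-4o-mini" (and its latest version) for the evaluation model.
```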
3. Perform Manual Evaluation
- Download the evaluation dataset:
https://raw.githubusercontent.com/MicrosoftLearning/mslearn-ai-studio/refs/heads/main/data/travel_evaluation_data.jsonl
- Save the file with a .jsonl extension (ensure it is not saved as .txt); a validation sketch follows this step.
- Navigate to Protect and Govern → Evaluation.
- Select + New Manual Evaluation.
- Configuration Items (reproduced in code after this step):
- Model: gpt-4o
- System Message: Assist users with travel-related inquiries, offering tips, advice, and recommendations as a knowledgeable travel agent.
- Dataset Mapping:
- Input: Question
- Expected Response: ExpectedResponse
- Click Run to generate model responses.
- Review outputs and score responses using thumbs-up/down icons.
- Save results for future comparison.
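Before uploading, it is worth verifying that the file really is line-delimited JSON. A minimal sketch using only the standard library, assuming the dataset's field names match the mapping above (Question / ExpectedResponse):

```python
import json
import urllib.request

URL = ("https://raw.githubusercontent.com/MicrosoftLearning/mslearn-ai-studio/"
       "refs/heads/main/data/travel_evaluation_data.jsonl")

# Download with an explicit .jsonl extension so it is not saved as .txt.
urllib.request.urlretrieve(URL, "travel_evaluation_data.jsonl")

# Every line must parse as a standalone JSON object carrying the mapped fields.
with open("travel_evaluation_data.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        missing = {"Question", "ExpectedResponse"} - record.keys()
        assert not missing, f"line {i} is missing {missing}"
print("Valid JSONL.")
```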
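The portal generates the responses for you, but the underlying call is easy to reproduce for offline review. A sketch using the openai Python package against the gpt-4o deployment from step 2; the endpoint, key, and API version are placeholders:

```python
import json
from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder endpoint
    api_key="<api-key>",
    api_version="2024-06-01",
)

SYSTEM_MESSAGE = (
    "Assist users with travel-related inquiries, offering tips, advice, and "
    "recommendations as a knowledgeable travel agent."
)

with open("travel_evaluation_data.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)
        response = client.chat.completions.create(
            model="gpt-4o",  # the deployment name from step 2
            messages=[
                {"role": "system", "content": SYSTEM_MESSAGE},
                {"role": "user", "content": item["Question"]},
            ],
        )
        answer = response.choices[0].message.content
        # Compare against item["ExpectedResponse"] and record a thumbs-up/down score.
        print(item["Question"], "->", answer[:80])
```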
4. Perform Automated Evaluation
- Navigate to Evaluation → Automated Evaluations → Create a New Evaluation.
- Select Evaluate a Model → Use Your Dataset → Choose the uploaded JSONL dataset.
- Configuration Items:
- Model: gpt-4o-mini
- System Message: Same as the manual evaluation.
- Query Field: {{item.question}}
- Evaluators Configuration (see the sketch after this step):
- Model Scorer: Criteria Name: Semantic_similarity; Grade With: gpt-4o; Output: {{sample.output_text}}; Ground Truth: {{item.ExpectedResponse}}
- Likert-Scale Evaluator: Criteria Name: Relevance; Grade With: gpt-4o; Query: {{item.question}}
- Text Similarity: Criteria Name: F1_Score; Ground Truth: {{item.ExpectedResponse}}
- Hateful and Unfair Content: Criteria Name: Hate_and_unfairness; Query: {{item.question}}
- Submit the evaluation and wait for completion.
- Review results and access raw evaluation data.
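The four evaluators configured above have close counterparts in the azure-ai-evaluation Python package, so the portal run can be approximated locally. The sketch below is an approximation under that assumption, not the portal's exact pipeline: it generates gpt-4o-mini responses, writes them to a JSONL file whose columns match the evaluators' expected input names (evaluate() maps data columns to evaluator parameters by name), and grades with gpt-4o. All endpoints, keys, and project identifiers are placeholders.

```python
import json

from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import (
    evaluate,
    SimilarityEvaluator,
    RelevanceEvaluator,
    F1ScoreEvaluator,
    HateUnfairnessEvaluator,
)
from openai import AzureOpenAI

# Judge-model configuration: the portal run grades with gpt-4o.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com/",  # placeholder
    "api_key": "<api-key>",                                         # placeholder
    "azure_deployment": "gpt-4o",
}

similarity = SimilarityEvaluator(model_config)   # "Model Scorer" counterpart
relevance = RelevanceEvaluator(model_config)     # Likert-scale (1-5) relevance
f1 = F1ScoreEvaluator()                          # lexical text similarity

# The hate/unfairness evaluator calls the project's safety service.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}
hate_unfairness = HateUnfairnessEvaluator(
    credential=DefaultAzureCredential(), azure_ai_project=azure_ai_project
)

# Generate responses from the evaluated model (gpt-4o-mini) and write rows whose
# columns match the evaluators' expected inputs (query/response/ground_truth).
aoai = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<api-key>",
    api_version="2024-06-01",
)
SYSTEM_MESSAGE = (
    "Assist users with travel-related inquiries, offering tips, advice, and "
    "recommendations as a knowledgeable travel agent."
)
with open("travel_evaluation_data.jsonl", encoding="utf-8") as src, \
     open("eval_rows.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        item = json.loads(line)
        reply = aoai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": SYSTEM_MESSAGE},
                {"role": "user", "content": item["Question"]},
            ],
        ).choices[0].message.content
        dst.write(json.dumps({
            "query": item["Question"],
            "response": reply,
            "ground_truth": item["ExpectedResponse"],
        }) + "\n")

# Run all four evaluators over every row; results include per-row scores and means.
results = evaluate(
    data="eval_rows.jsonl",
    evaluators={
        "semantic_similarity": similarity,
        "relevance": relevance,
        "f1_score": f1,
        "hate_and_unfairness": hate_unfairness,
    },
    output_path="evaluation_results.json",
)
print(results["metrics"])
```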
5. Clean Up
- Delete Azure resources to avoid unnecessary costs (or script the deletion, as sketched after this list):
- Open Azure Portal (https://portal.azure.com).
- Navigate to Resource Groups.
- Select the resource group and click Delete.
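Deletion can also be scripted with the azure-mgmt-resource package; begin_delete removes the group and everything in it, so double-check the name first (subscription ID and group name below are placeholders):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")
# Irreversible: deletes the resource group and all resources it contains.
client.resource_groups.begin_delete("<resource-group>").result()
```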
This summary captures the essential steps while highlighting all configuration items and code references required for evaluating generative AI models in Azure AI Foundry.