Overview

This exercise demonstrates how to evaluate generative AI model performance using manual and automated evaluations in Azure AI Foundry. The goal is to assess model responses based on predefined criteria.


Steps & Configuration Details

1. Create an Azure AI Foundry Hub and Project

  • Open Azure AI Foundry portal (https://ai.azure.com) and sign in.
  • Navigate to Management Center → All Resources → Create → AI Hub Resource.
  • Configuration Items:
    • Subscription: Your Azure subscription.
    • Resource Group: Select or create a resource group.
    • Hub Name: A valid name.
    • Location: Choose from:
      • East US 2
      • France Central
      • UK South
      • Sweden Central (quota limits may require a different region).

2. Deploy Models

Two models are required:

  1. Primary Model: gpt-4o
  2. Evaluation Model: gpt-4o-mini

Deployment Settings (a quick verification sketch follows this list):

  • Deployment Name: A valid name.
  • Deployment Type: Global Standard
  • Automatic Version Update: Enabled
  • Model Version: Latest available.
  • Connected AI Resource: Link to Azure AI Foundry project.
  • Tokens Per Minute (TPM) Limit: 50K (or max available).
  • Content Filter: DefaultV2
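
Before moving on, you can optionally verify a deployment from code. Below is a minimal sketch, assuming the gpt-4o deployment name above and the Azure OpenAI Python client; the endpoint, API key, and API version shown are placeholders to replace with your resource's values.

```python
# Minimal sketch: confirm a deployment responds (assumes the "gpt-4o" deployment name above).
# The endpoint, API key, and API version below are placeholders, not values from the exercise.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # the deployment name, not the underlying model name
    messages=[
        {"role": "system", "content": "You are a knowledgeable travel agent."},
        {"role": "user", "content": "Suggest three things to do in Lisbon."},
    ],
)
print(response.choices[0].message.content)
```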

3. Perform Manual Evaluation

  • Download the evaluation dataset:
    https://raw.githubusercontent.com/MicrosoftLearning/mslearn-ai-studio/refs/heads/main/data/travel_evaluation_data.jsonl
    
  • Save the file as JSONL (ensure it’s not saved as .txt); a quick validation sketch follows this list.
  • Navigate to Protect and Govern → Evaluation.
  • Select + New Manual Evaluation.
  • Configuration Items:
    • Model: gpt-4o
    • System Message:
      Assist users with travel-related inquiries, offering tips, advice, and recommendations as a knowledgeable travel agent.
      
    • Dataset Mapping:
      • Input: Question
      • Expected Response: ExpectedResponse
  • Click Run to generate model responses.
  • Review outputs and score responses using thumbs-up/down icons.
  • Save results for future comparison.
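
Because the dataset mapping above reads fields directly from the JSONL file, it is worth confirming that the download parses cleanly and seeing which field names it actually contains. A minimal sketch follows; the local filename is an assumption.

```python
# Minimal sketch: download the evaluation dataset, parse it as JSONL, and list the
# field names so they can be matched against the dataset mapping in the portal.
import json
import urllib.request

URL = ("https://raw.githubusercontent.com/MicrosoftLearning/mslearn-ai-studio/"
       "refs/heads/main/data/travel_evaluation_data.jsonl")
LOCAL_PATH = "travel_evaluation_data.jsonl"  # assumption: any local path works

urllib.request.urlretrieve(URL, LOCAL_PATH)

with open(LOCAL_PATH, encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"Parsed {len(records)} records")
print("Fields in the first record:", sorted(records[0].keys()))
```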

4. Perform Automated Evaluation

  • Navigate to Evaluation → Automated Evaluations → Create a New Evaluation.
  • Select Evaluate a Model → Use Your Dataset → choose the uploaded JSONL dataset.
  • Configuration Items:
    • Model: gpt-4o-mini
    • System Message: Same as manual evaluation.
    • Query Field: {{item.question}}
  • Evaluators Configuration:
    • Model Scorer:
      Criteria Name: Semantic_similarity
      Grade With: gpt-4o
      Output: {{sample.output_text}}
      Ground Truth: {{item.ExpectedResponse}}
      
    • Likert-Scale Evaluator:
      Criteria Name: Relevance
      Grade With: gpt-4o
      Query: {{item.question}}
      
    • Text Similarity (a token-level F1 sketch follows this list):
      Criteria Name: F1_Score
      Ground Truth: {{item.ExpectedResponse}}
      
    • Hateful and Unfair Content:
      Criteria Name: Hate_and_unfairness
      Query: {{item.question}}
      
  • Submit the evaluation and wait for completion.
  • Review results and access raw evaluation data.
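
The Text Similarity evaluator's F1_Score is, at its core, a token-overlap measure between the model output and the ground-truth answer. The sketch below illustrates that idea with a simple whitespace-token F1; it is not the exact scorer Azure AI Foundry uses, but it can help when sanity-checking the raw evaluation data.

```python
# Minimal sketch: token-overlap F1 between a model output and a ground-truth answer.
# Illustrative only -- not the exact implementation behind the F1_Score evaluator.
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not pred_tokens or not truth_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Visit the Eiffel Tower in the evening",
               "The Eiffel Tower is best visited in the evening"))
```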

5. Clean Up

  • Delete the Azure resources created for this exercise (or the entire resource group) in the Azure portal to avoid unnecessary costs.

This summary captures the essential steps and highlights the configuration items required for evaluating generative AI models in Azure AI Foundry.
