Overview

This exercise demonstrates how to translate speech using Azure AI Speech, enabling users to convert spoken language into translated text and synthesized speech.


Steps & Configuration Details

1. Provision an Azure AI Speech Resource

  • Open Azure Portal (https://portal.azure.com) and sign in.
  • Search for Azure AI services → Select Create under Speech service.
  • Configuration Items:
    • Subscription: Your Azure subscription.
    • Resource Group: Select or create a resource group.
    • Region: Choose any available region.
    • Name: Enter a unique name.
    • Pricing Tier: F0 (Free) or S (Standard).
    • Responsible AI Notice: Agree.

After provisioning, navigate to Keys and Endpoint in the Resource Management section.


2. Clone the Repository

  • Open Azure Cloud Shell in the Azure Portal.
  • Select PowerShell as the environment.
  • Run the following commands:
    rm -r mslearn-ai-language -f
    git clone https://github.com/MicrosoftLearning/mslearn-ai-language mslearn-ai-language
    cd mslearn-ai-language/Labfiles/08-speech-translation
    

3. Configure Your Application

  • Navigate to the correct folder:
    cd CSharp/translator  # For C#
    cd Python/translator   # For Python
    
  • Install dependencies:
    • C#:
      dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0
      
    • Python:
      pip install azure-cognitiveservices-speech==1.30.0
      
  • Open the configuration file:
    • C#: appsettings.json
    • Python: .env
  • Update Configuration Values:
    • Azure AI Speech Region
    • API Key
  • Save the configuration file.
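The starter code reads these values when the app launches. As a rough, dependency-free sketch of what that loading looks like on the Python path (the variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions here; check the starter `.env` file for the actual names):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: copy KEY=value lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

if os.path.exists(".env"):
    load_env()
    ai_key = os.environ.get("SPEECH_KEY")       # assumed variable name
    ai_region = os.environ.get("SPEECH_REGION") # assumed variable name
```

The lab's actual code uses the `python-dotenv` package for this; the sketch above just shows the mechanics.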

4. Add Code to Use Azure AI Speech SDK

  • Open the code file:
    • C#: Program.cs
    • Python: translator.py
  • Add references:
    • C#:
      using Microsoft.CognitiveServices.Speech;
      using Microsoft.CognitiveServices.Speech.Audio;
      using Microsoft.CognitiveServices.Speech.Translation;
      
    • Python:
      import azure.cognitiveservices.speech as speech_sdk
      
  • Configure the AI Speech client:
    • C#:
      translationConfig = SpeechTranslationConfig.FromSubscription(aiSvcKey, aiSvcRegion);
      translationConfig.SpeechRecognitionLanguage = "en-US";
      translationConfig.AddTargetLanguage("fr");
      translationConfig.AddTargetLanguage("es");
      translationConfig.AddTargetLanguage("hi");
      Console.WriteLine("Ready to translate from " + translationConfig.SpeechRecognitionLanguage);
      
    • Python:
      translation_config = speech_sdk.translation.SpeechTranslationConfig(ai_key, ai_region)
      translation_config.speech_recognition_language = 'en-US'
      translation_config.add_target_language('fr')
      translation_config.add_target_language('es')
      translation_config.add_target_language('hi')
      print('Ready to translate from', translation_config.speech_recognition_language)
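
The app then prompts the user for a target language before translating. A minimal, SDK-free sketch of one way to validate that input (the `fr`/`es`/`hi` codes mirror the targets configured above):

```python
# Target language codes added to the translation config above
SUPPORTED_TARGETS = {"fr": "French", "es": "Spanish", "hi": "Hindi"}

def choose_target(choice):
    """Normalize user input and return the code if supported, else None."""
    code = choice.strip().lower()
    return code if code in SUPPORTED_TARGETS else None
```

Looping until `choose_target` returns a code (or the user quits) keeps the rest of the program from ever seeing an unconfigured language.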
      

5. Implement Speech Translation

  • C#:
    using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using TranslationRecognizer translator = new TranslationRecognizer(translationConfig, audioConfig);
    Console.WriteLine("Speak now...");
    TranslationRecognitionResult result = await translator.RecognizeOnceAsync();
    Console.WriteLine($"Translating '{result.Text}'");
    translation = result.Translations[targetLanguage];
    Console.OutputEncoding = Encoding.UTF8;
    Console.WriteLine(translation);
    
  • Python:
    audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
    translator = speech_sdk.translation.TranslationRecognizer(translation_config, audio_config=audio_config)
    print("Speak now...")
    result = translator.recognize_once_async().get()
    print('Translating "{}"'.format(result.text))
    translation = result.translations[targetLanguage]
    print(translation)
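
`result.translations` behaves like a dictionary keyed by language code, and it is empty when recognition fails, so indexing it directly can raise a `KeyError`. A small SDK-free sketch of a defensive lookup (the fallback message is illustrative):

```python
def safe_translation(translations, target):
    """Return the translation for `target`, or a readable fallback."""
    try:
        return translations[target]
    except KeyError:
        return "(no translation available for '{}')".format(target)
```

In production code you would also check `result.reason` against `speech_sdk.ResultReason.TranslatedSpeech` before using the result at all.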
    

6. Synthesize Translated Speech

  • C#:
    // speechConfig is a SpeechConfig created from the same key and region as translationConfig
    var voices = new Dictionary<string, string> {
        ["fr"] = "fr-FR-HenriNeural",
        ["es"] = "es-ES-ElviraNeural",
        ["hi"] = "hi-IN-MadhurNeural"
    };
    speechConfig.SpeechSynthesisVoiceName = voices[targetLanguage];
    using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
    SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(translation);
    if (speak.Reason != ResultReason.SynthesizingAudioCompleted) {
        Console.WriteLine(speak.Reason);
    }
    
  • Python:
    # speech_config is a SpeechConfig created from the same key and region as translation_config
    voices = {
        "fr": "fr-FR-HenriNeural",
        "es": "es-ES-ElviraNeural",
        "hi": "hi-IN-MadhurNeural"
    }
    speech_config.speech_synthesis_voice_name = voices.get(targetLanguage)
    speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
    speak = speech_synthesizer.speak_text_async(translation).get()
    if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
        print(speak.reason)
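
Note that `voices.get(targetLanguage)` returns `None` when a language has no mapped voice, which would then fail at synthesis time. A small sketch that falls back to a default voice instead (the default name `en-US-JennyNeural` is an assumption; any valid neural voice name works):

```python
VOICES = {
    "fr": "fr-FR-HenriNeural",
    "es": "es-ES-ElviraNeural",
    "hi": "hi-IN-MadhurNeural",
}

def pick_voice(target, default="en-US-JennyNeural"):
    """Map a target language code to a synthesis voice, with a fallback."""
    return VOICES.get(target, default)
```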
    

7. Run Your Application

  • C#:
    dotnet run
    
  • Python:
    python translator.py
    
  • Example prompt:
    Where is the station?
    
  • The application displays the transcribed speech and its translation, and plays the synthesized speech aloud.

8. Clean Up

  • When you have finished the exercise, delete the Azure AI Speech resource (or its resource group) in the Azure portal to avoid unnecessary costs.
