Overview

This exercise demonstrates how to translate speech using Azure AI Speech, enabling users to convert spoken language into translated text and synthesized speech.


Steps & Configuration Details

1. Provision an Azure AI Speech Resource

  • Open Azure Portal (https://portal.azure.com) and sign in.
  • Search for Azure AI services → Select Create under Speech service.
  • Configuration Items:
    • Subscription: Your Azure subscription.
    • Resource Group: Select or create a resource group.
    • Region: Choose any available region.
    • Name: Enter a unique name.
    • Pricing Tier: F0 (Free) or S (Standard).
    • Responsible AI Notice: Agree.

After provisioning, navigate to Keys and Endpoint in the Resource Management section.


2. Clone the Repository

  • Open Azure Cloud Shell in the Azure Portal.
  • Select PowerShell as the environment.
  • Run the following commands:
    rm -r mslearn-ai-language -f
    git clone https://github.com/MicrosoftLearning/mslearn-ai-language mslearn-ai-language
    cd mslearn-ai-language/Labfiles/08-speech-translation
    

3. Configure Your Application

  • Navigate to the correct folder:
    cd CSharp/translator  # For C#
    cd Python/translator   # For Python
    
  • Install dependencies:
    • C#:
      dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0
      
    • Python:
      pip install azure-cognitiveservices-speech==1.30.0
      
  • Open the configuration file:
    • C#: appsettings.json
    • Python: .env
  • Update Configuration Values:
    • Azure AI Speech Region
    • API Key
  • Save the configuration file.
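The starter code reads these values when the app launches. As a rough, dependency-free sketch of what that loading looks like on the Python path (the variable names `SPEECH_KEY` and `SPEECH_REGION` are assumptions here; check the starter `.env` file for the actual names):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: copy KEY=value lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip()

if os.path.exists(".env"):
    load_env()
    ai_key = os.environ.get("SPEECH_KEY")       # assumed variable name
    ai_region = os.environ.get("SPEECH_REGION") # assumed variable name
```

The lab's actual code uses the `python-dotenv` package for this; the sketch above just shows the mechanics.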

4. Add Code to Use Azure AI Speech SDK

  • Open the code file:
    • C#: Program.cs
    • Python: translator.py
  • Add references:
    • C#:
      using Microsoft.CognitiveServices.Speech;
      using Microsoft.CognitiveServices.Speech.Audio;
      using Microsoft.CognitiveServices.Speech.Translation;
      
    • Python:
      import azure.cognitiveservices.speech as speech_sdk
      
  • Configure the AI Speech client:
    • C#:
      translationConfig = SpeechTranslationConfig.FromSubscription(aiSvcKey, aiSvcRegion);
      translationConfig.SpeechRecognitionLanguage = "en-US";
      translationConfig.AddTargetLanguage("fr");
      translationConfig.AddTargetLanguage("es");
      translationConfig.AddTargetLanguage("hi");
      Console.WriteLine("Ready to translate from " + translationConfig.SpeechRecognitionLanguage);
      
    • Python:
      translation_config = speech_sdk.translation.SpeechTranslationConfig(ai_key, ai_region)
      translation_config.speech_recognition_language = 'en-US'
      translation_config.add_target_language('fr')
      translation_config.add_target_language('es')
      translation_config.add_target_language('hi')
      print('Ready to translate from', translation_config.speech_recognition_language)
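
The app then prompts the user for a target language before translating. A minimal, SDK-free sketch of one way to validate that input (the `fr`/`es`/`hi` codes mirror the targets configured above):

```python
# Target language codes added to the translation config above
SUPPORTED_TARGETS = {"fr": "French", "es": "Spanish", "hi": "Hindi"}

def choose_target(choice):
    """Normalize user input and return the code if supported, else None."""
    code = choice.strip().lower()
    return code if code in SUPPORTED_TARGETS else None
```

Looping until `choose_target` returns a code (or the user quits) keeps the rest of the program from ever seeing an unconfigured language.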
      

5. Implement Speech Translation

  • C#:
    using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using TranslationRecognizer translator = new TranslationRecognizer(translationConfig, audioConfig);
    Console.WriteLine("Speak now...");
    TranslationRecognitionResult result = await translator.RecognizeOnceAsync();
    Console.WriteLine($"Translating '{result.Text}'");
    translation = result.Translations[targetLanguage];
    Console.OutputEncoding = Encoding.UTF8;
    Console.WriteLine(translation);
    
  • Python:
    audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
    translator = speech_sdk.translation.TranslationRecognizer(translation_config, audio_config=audio_config)
    print("Speak now...")
    result = translator.recognize_once_async().get()
    print('Translating "{}"'.format(result.text))
    translation = result.translations[targetLanguage]
    print(translation)
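
`result.translations` behaves like a dictionary keyed by language code, and it is empty when recognition fails, so indexing it directly can raise a `KeyError`. A small SDK-free sketch of a defensive lookup (the fallback message is illustrative):

```python
def safe_translation(translations, target):
    """Return the translation for `target`, or a readable fallback."""
    try:
        return translations[target]
    except KeyError:
        return "(no translation available for '{}')".format(target)
```

In production code you would also check `result.reason` against `speech_sdk.ResultReason.TranslatedSpeech` before using the result at all.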
    

6. Synthesize Translated Speech

  • C#:
    // speechConfig is a SpeechConfig created from the same key and region as translationConfig
    var voices = new Dictionary<string, string> {
        ["fr"] = "fr-FR-HenriNeural",
        ["es"] = "es-ES-ElviraNeural",
        ["hi"] = "hi-IN-MadhurNeural"
    };
    speechConfig.SpeechSynthesisVoiceName = voices[targetLanguage];
    using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
    SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(translation);
    if (speak.Reason != ResultReason.SynthesizingAudioCompleted) {
        Console.WriteLine(speak.Reason);
    }
    
  • Python:
    # speech_config is a SpeechConfig created from the same key and region as translation_config
    voices = {
        "fr": "fr-FR-HenriNeural",
        "es": "es-ES-ElviraNeural",
        "hi": "hi-IN-MadhurNeural"
    }
    speech_config.speech_synthesis_voice_name = voices.get(targetLanguage)
    speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
    speak = speech_synthesizer.speak_text_async(translation).get()
    if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
        print(speak.reason)
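
Note that `voices.get(targetLanguage)` returns `None` when a language has no mapped voice, which would then fail at synthesis time. A small sketch that falls back to a default voice instead (the default name `en-US-JennyNeural` is an assumption; any valid neural voice name works):

```python
VOICES = {
    "fr": "fr-FR-HenriNeural",
    "es": "es-ES-ElviraNeural",
    "hi": "hi-IN-MadhurNeural",
}

def pick_voice(target, default="en-US-JennyNeural"):
    """Map a target language code to a synthesis voice, with a fallback."""
    return VOICES.get(target, default)
```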
    

7. Run Your Application

  • C#:
    dotnet run
    
  • Python:
    python translator.py
    
  • Example prompt:
    Where is the station?
    
  • The application displays the transcribed speech and its translation, and plays the synthesized speech aloud.

8. Clean Up

  • When you have finished the exercise, delete the Azure AI Speech resource (or its resource group) in the Azure portal to avoid unnecessary costs.
