Overview
This exercise demonstrates how to translate speech using Azure AI Speech, enabling users to convert spoken language into translated text and synthesized speech.
Steps & Configuration Details
1. Provision an Azure AI Speech Resource
- Open Azure Portal (https://portal.azure.com) and sign in.
- Search for Azure AI services → Select Create under Speech service.
- Configuration Items:
- Subscription: Your Azure subscription.
- Resource Group: Select or create a resource group.
- Region: Choose any available region.
- Name: Enter a unique name.
- Pricing Tier: F0 (Free) or S (Standard).
- Responsible AI Notice: Agree.
After provisioning, navigate to Keys and Endpoint in the Resource Management section.
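If you prefer the command line, you can also retrieve the keys with the Azure CLI. A minimal sketch, where my-speech and my-rg are placeholder names for your resource and resource group:
```sh
az cognitiveservices account keys list --name my-speech --resource-group my-rg
```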
2. Clone the Repository
- Open Azure Cloud Shell in the Azure Portal.
- Select PowerShell as the environment.
- Run the following commands:
```sh
rm -r mslearn-ai-language -f
git clone https://github.com/MicrosoftLearning/mslearn-ai-language mslearn-ai-language
cd mslearn-ai-language/Labfiles/08-speech-translation
```
3. Configure Your Application
- Navigate to the correct folder:
```sh
cd CSharp/translator    # For C#
cd Python/translator    # For Python
```
- Install dependencies:
- C#:
dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0
- Python:
pip install azure-cognitiveservices-speech==1.30.0
- Open the configuration file:
- C#:
appsettings.json
- Python:
.env
- Update Configuration Values:
- Azure AI Speech Region
- API Key
- Save the configuration file.
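For reference, a minimal sketch of how the Python version might load these values, assuming the .env file defines SPEECH_KEY and SPEECH_REGION (check the starter code for the exact variable names used in your lab files):
```python
# Load the key and region from the .env configuration file.
import os
from dotenv import load_dotenv

load_dotenv()
ai_key = os.getenv('SPEECH_KEY')        # assumed variable name
ai_region = os.getenv('SPEECH_REGION')  # assumed variable name
```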
4. Add Code to Use Azure AI Speech SDK
- Open the code file:
- C#:
Program.cs
- Python:
translator.py
- Add references:
- C#:
```csharp
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Translation;
```
- Python:
import azure.cognitiveservices.speech as speech_sdk
- Configure the AI Speech client:
- C#:
```csharp
translationConfig = SpeechTranslationConfig.FromSubscription(aiSvcKey, aiSvcRegion);
translationConfig.SpeechRecognitionLanguage = "en-US";
translationConfig.AddTargetLanguage("fr");
translationConfig.AddTargetLanguage("es");
translationConfig.AddTargetLanguage("hi");
Console.WriteLine("Ready to translate from " + translationConfig.SpeechRecognitionLanguage);
```
- Python:
```python
translation_config = speech_sdk.translation.SpeechTranslationConfig(ai_key, ai_region)
translation_config.speech_recognition_language = 'en-US'
translation_config.add_target_language('fr')
translation_config.add_target_language('es')
translation_config.add_target_language('hi')
print('Ready to translate from', translation_config.speech_recognition_language)
```
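Note: the snippets below reference a targetLanguage variable. A minimal sketch of one way to obtain it in Python, validating against the languages configured above (the starter code in the lab files provides its own version of this):
```python
# Hypothetical prompt for a target language code.
targetLanguage = input('Enter a target language (fr, es, or hi): ').lower()
if targetLanguage not in translation_config.target_languages:
    print('Unsupported target language:', targetLanguage)
```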
5. Implement Speech Translation
- C#:
```csharp
using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using TranslationRecognizer translator = new TranslationRecognizer(translationConfig, audioConfig);
Console.WriteLine("Speak now...");
TranslationRecognitionResult result = await translator.RecognizeOnceAsync();
Console.WriteLine($"Translating '{result.Text}'");
translation = result.Translations[targetLanguage];
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine(translation);
```
- Python:
```python
audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
translator = speech_sdk.translation.TranslationRecognizer(translation_config, audio_config=audio_config)
print("Speak now...")
result = translator.recognize_once_async().get()
print('Translating "{}"'.format(result.text))
translation = result.translations[targetLanguage]
print(translation)
```
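Recognition can end without a usable translation (for example, if no speech is detected). A sketch of checking the outcome before using result.translations, using the SDK's ResultReason values (Python):
```python
# Check why recognition ended before trusting result.translations.
if result.reason == speech_sdk.ResultReason.TranslatedSpeech:
    print('Translating "{}"'.format(result.text))
elif result.reason == speech_sdk.ResultReason.NoMatch:
    print('No speech could be recognized.')
elif result.reason == speech_sdk.ResultReason.Canceled:
    details = result.cancellation_details
    print('Recognition canceled:', details.reason)
```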
6. Synthesize Translated Speech
- C#:
```csharp
// speechConfig is created earlier from the same key and region
// (SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion)).
var voices = new Dictionary<string, string>
{
    ["fr"] = "fr-FR-HenriNeural",
    ["es"] = "es-ES-ElviraNeural",
    ["hi"] = "hi-IN-MadhurNeural"
};
speechConfig.SpeechSynthesisVoiceName = voices[targetLanguage];
using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(translation);
if (speak.Reason != ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine(speak.Reason);
}
```
- Python:
```python
# speech_config is created earlier from the same key and region
# (speech_sdk.SpeechConfig(ai_key, ai_region)).
voices = {
    "fr": "fr-FR-HenriNeural",
    "es": "es-ES-ElviraNeural",
    "hi": "hi-IN-MadhurNeural"
}
speech_config.speech_synthesis_voice_name = voices.get(targetLanguage)
speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
speak = speech_synthesizer.speak_text_async(translation).get()
if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
    print(speak.reason)
```
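By default the synthesizer plays audio through the default speaker. To write the audio to a file instead, a sketch (output.wav is an arbitrary file name):
```python
# Write synthesized speech to a WAV file instead of the default speaker.
file_config = speech_sdk.audio.AudioOutputConfig(filename='output.wav')
speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config, audio_config=file_config)
speak = speech_synthesizer.speak_text_async(translation).get()
```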
7. Run Your Application
- C#:
dotnet run
- Python:
python translator.py
- Example prompt:
Where is the station?
- The response should display the transcribed speech, translated text, and synthesized speech output.
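Based on the print statements above, a run with fr as the target might look roughly like this (the exact translation text may vary):
```
Speak now...
Translating "Where is the station?"
Où est la gare ?
```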
8. Clean Up
- Delete Azure resources to avoid unnecessary costs:
- Open Azure Portal (https://portal.azure.com).
- Navigate to Resource Groups.
- Select the resource group and click Delete.
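Equivalently, the resource group can be deleted from the Azure CLI (assuming it is named my-rg):
```sh
az group delete --name my-rg --yes --no-wait
```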