Overview
This exercise demonstrates how to recognize and synthesize speech using Azure AI Speech. The solution enables users to convert spoken words into text (speech-to-text) and generate audible speech from text (text-to-speech).
Steps & Configuration Details
1. Provision an Azure AI Speech Resource
- Open Azure Portal (https://portal.azure.com) and sign in.
- Search for Azure AI services → Select Create under Speech service.
- Configuration Items:
- Subscription: Your Azure subscription.
- Resource Group: Select or create a resource group.
- Region: Choose any available region.
- Name: Enter a unique name.
- Pricing Tier: F0 (Free) or S (Standard).
- Responsible AI Notice: Agree.
After provisioning, navigate to Keys and Endpoint in the Resource Management section.
2. Clone the Repository
- Open Azure Cloud Shell in the Azure Portal.
- Select PowerShell as the environment.
- Run the following commands:
rm -r mslearn-ai-language -f
git clone https://github.com/MicrosoftLearning/mslearn-ai-language mslearn-ai-language
cd mslearn-ai-language/Labfiles/07-speech
3. Configure Your Application
- Navigate to the correct folder:
cd CSharp/speaking-clock   # For C#
cd Python/speaking-clock   # For Python
- Install dependencies:
- C#:
dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0
- Python:
pip install azure-cognitiveservices-speech==1.30.0
- Open the configuration file:
- C#:
appsettings.json
- Python:
.env
- Update Configuration Values:
- Azure AI Speech Region
- API Key
- Save the configuration file.
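As an illustration, the configuration files might look like the following. The key and region values are placeholders, and the exact setting names may differ from the lab files; use the values from the Keys and Endpoint page of your Speech resource.

`.env` (Python):

```
AI_SERVICE_KEY=your-key-here
AI_SERVICE_REGION=eastus
```

`appsettings.json` (C#):

```json
{
  "AIServicesKey": "your-key-here",
  "AIServicesRegion": "eastus"
}
```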
4. Add Code to Use Azure AI Speech SDK
- Open the code file:
- C#:
Program.cs
- Python:
speaking-clock.py
- Add references:
- C#:
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
- Python:
import azure.cognitiveservices.speech as speech_sdk
- Configure the AI Speech client:
- C#:
speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
Console.WriteLine($"Ready to use speech service in {speechConfig.Region}");
speechConfig.SpeechSynthesisVoiceName = "en-US-AriaNeural";
- Python:
speech_config = speech_sdk.SpeechConfig(ai_key, ai_region)
print(f'Ready to use speech service in: {speech_config.region}')
speech_config.speech_synthesis_voice_name = "en-US-AriaNeural"
5. Add Code to Recognize Speech
- C#:
using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
Console.WriteLine("Speak now...");
SpeechRecognitionResult speech = await speechRecognizer.RecognizeOnceAsync();
if (speech.Reason == ResultReason.RecognizedSpeech)
{
    command = speech.Text;
    Console.WriteLine(command);
}
- Python:
audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
speech_recognizer = speech_sdk.SpeechRecognizer(speech_config, audio_config)
print('Speak now...')
speech = speech_recognizer.recognize_once_async().get()
if speech.reason == speech_sdk.ResultReason.RecognizedSpeech:
    command = speech.text
    print(command)
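The recognized command is what drives the clock's reply in the next step. A minimal, self-contained sketch of how the response text might be built from the current time (the `build_response` helper and its wording are illustrative assumptions, not the lab's exact code):

```python
from datetime import datetime

def build_response(command, now=None):
    """Return the spoken reply for a recognized command (illustrative helper)."""
    now = now or datetime.now()
    if "time" in command.lower():
        # e.g. "What time is it?" -> "The time is 14:05."
        return f"The time is {now.hour}:{now.minute:02d}."
    return "Sorry, I didn't catch that."

# A fixed timestamp keeps the output predictable for testing.
print(build_response("What time is it?", datetime(2024, 1, 1, 14, 5)))
```

The resulting string is what the synthesis code in step 6 passes to the speech synthesizer as `response_text` / `responseText`.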
6. Add Code to Synthesize Speech
- C#:
speechConfig.SpeechSynthesisVoiceName = "en-GB-RyanNeural";
using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(responseText);
if (speak.Reason != ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine(speak.Reason);
}
- Python:
speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural"
speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
speak = speech_synthesizer.speak_text_async(response_text).get()
if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
    print(speak.reason)
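For finer control over pronunciation, pacing, and voice than plain text allows, the service also accepts Speech Synthesis Markup Language (SSML), passed via `SpeakSsmlAsync` (C#) or `speak_ssml_async` (Python) instead of the plain-text methods above. A small illustrative fragment (the voice, break, and wording here are examples, not the lab's exact markup):

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <voice name="en-GB-RyanNeural">
        The time is 14:05.
        <break strength="weak"/>
        Time to end this lab!
    </voice>
</speak>
```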
7. Run Your Application
- C#:
dotnet run
- Python:
python speaking-clock.py
- Example prompt:
What time is it?
- The response should display the transcribed speech and generate spoken output.
8. Clean Up
- Delete Azure resources to avoid unnecessary costs:
- Open Azure Portal (https://portal.azure.com).
- Navigate to Resource Groups.
- Select the resource group and click Delete.
This summary captures the essential steps while highlighting all configuration items and code references required for recognizing and synthesizing speech using Azure AI Speech.