Overview

This exercise demonstrates how to recognize and synthesize speech using Azure AI Speech. The solution enables users to convert spoken words into text (speech-to-text) and generate audible speech from text (text-to-speech).


Steps & Configuration Details

1. Provision an Azure AI Speech Resource

  • Open Azure Portal (https://portal.azure.com) and sign in.
  • Search for Azure AI services → Select Create under Speech service.
  • Configuration Items:
    • Subscription: Your Azure subscription.
    • Resource Group: Select or create a resource group.
    • Region: Choose any available region.
    • Name: Enter a unique name.
    • Pricing Tier: F0 (Free) or S (Standard).
    • Responsible AI Notice: Agree.

After provisioning, navigate to Keys and Endpoint under Resource Management and note one of the keys and the region — you'll need both when configuring the application.


2. Clone the Repository

  • Open Azure Cloud Shell in the Azure Portal.
  • Select PowerShell as the environment.
  • Run the following commands:
    rm -r mslearn-ai-language -f
    git clone https://github.com/MicrosoftLearning/mslearn-ai-language mslearn-ai-language
    cd mslearn-ai-language/Labfiles/07-speech
    

3. Configure Your Application

  • Navigate to the correct folder:
    cd CSharp/speaking-clock  # For C#
    cd Python/speaking-clock   # For Python
    
  • Install dependencies:
    • C#:
      dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0
      
    • Python:
      pip install azure-cognitiveservices-speech==1.30.0
      
  • Open the configuration file:
    • C#: appsettings.json
    • Python: .env
  • Update Configuration Values:
    • Azure AI Speech Region
    • API Key
  • Save the configuration file.
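At startup the application reads the key and region from this file. As a hedged illustration of how that lookup might work in the Python version (the key names SPEECH_KEY and SPEECH_REGION are assumptions — match them to whatever your lab's .env file actually uses), a minimal .env-style loader:

```python
import os

def load_speech_config(path=".env"):
    """Read KEY=VALUE pairs from a .env-style file into a dict,
    falling back to environment variables when the file is absent.
    Key names (SPEECH_KEY, SPEECH_REGION) are illustrative only."""
    values = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blank lines and comments; split on the first '='
                if line and not line.startswith("#") and "=" in line:
                    name, _, value = line.partition("=")
                    values[name.strip()] = value.strip().strip('"')
    ai_key = values.get("SPEECH_KEY", os.environ.get("SPEECH_KEY"))
    ai_region = values.get("SPEECH_REGION", os.environ.get("SPEECH_REGION"))
    return ai_key, ai_region
```

The returned values are what you would pass to the SpeechConfig constructor in step 4.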

4. Add Code to Use Azure AI Speech SDK

  • Open the code file:
    • C#: Program.cs
    • Python: speaking-clock.py
  • Add references:
    • C#:
      using Microsoft.CognitiveServices.Speech;
      using Microsoft.CognitiveServices.Speech.Audio;
      
    • Python:
      import azure.cognitiveservices.speech as speech_sdk
      
  • Configure the AI Speech client:
    • C#:
      speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
      Console.WriteLine($"Ready to use speech service in {speechConfig.Region}");
      speechConfig.SpeechSynthesisVoiceName = "en-US-AriaNeural";
      
    • Python:
      speech_config = speech_sdk.SpeechConfig(ai_key, ai_region)
      print(f'Ready to use speech service in: {speech_config.region}')
      speech_config.speech_synthesis_voice_name = 'en-US-AriaNeural'
      

5. Add Code to Recognize Speech

  • C#:
    using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    Console.WriteLine("Speak now...");
    SpeechRecognitionResult speech = await speechRecognizer.RecognizeOnceAsync();
    if (speech.Reason == ResultReason.RecognizedSpeech) {
        command = speech.Text;
        Console.WriteLine(command);
    }
    
  • Python:
    audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
    speech_recognizer = speech_sdk.SpeechRecognizer(speech_config, audio_config)
    print('Speak now...')
    speech = speech_recognizer.recognize_once_async().get()
    if speech.reason == speech_sdk.ResultReason.RecognizedSpeech:
        command = speech.text
        print(command)
    

6. Add Code to Synthesize Speech

  • C#:
    speechConfig.SpeechSynthesisVoiceName = "en-GB-RyanNeural";
    using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
    SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(responseText);
    if (speak.Reason != ResultReason.SynthesizingAudioCompleted) {
        Console.WriteLine(speak.Reason);
    }
    
  • Python:
    speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural"
    speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
    speak = speech_synthesizer.speak_text_async(response_text).get()
    if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
        print(speak.reason)
    
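Beyond plain text, the Speech SDK also accepts SSML markup (SpeakSsmlAsync in C#, speak_ssml_async in Python), which gives finer control over the voice and delivery. Building the SSML payload is plain string work; a minimal sketch (the voice name here is only an example):

```python
def build_ssml(text, voice="en-GB-LibbyNeural"):
    """Wrap response text in a minimal SSML envelope.

    The resulting string would be passed to speak_ssml_async (Python)
    or SpeakSsmlAsync (C#) instead of the plain-text variants above."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<voice name="{voice}">{text}</voice>'
        '</speak>'
    )
```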

7. Run Your Application

  • C#:
    dotnet run
    
  • Python:
    python speaking-clock.py
    
  • Example prompt:
    What time is it?
    
  • The application should display the transcribed text and play the spoken response.
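The starter code builds responseText (C#) / response_text (Python) from the current time before handing it to the synthesizer in step 6. The exact wording in the lab's starter code may differ, but the logic is simple datetime formatting; a hedged Python sketch (function name and phrasing are hypothetical):

```python
from datetime import datetime

def get_response(command, now=None):
    """Build the spoken response for the speaking clock.

    Only the 'what time is it?' command is handled; `now` is
    injectable for testing and defaults to the system clock."""
    now = now or datetime.now()
    if command.lower().startswith("what time is it"):
        return f"The time is {now.hour}:{now.minute:02d}"
    return "I didn't understand that."
```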

8. Clean Up

  • In the Azure portal, delete the Azure AI Speech resource (or its resource group) when you finish the exercise to avoid unnecessary costs.

This summary captures the essential steps, configuration items, and code references required to recognize and synthesize speech with Azure AI Speech.
