Overview

This exercise demonstrates how to recognize and synthesize speech using Azure AI Speech. The solution enables users to convert spoken words into text (speech-to-text) and generate audible speech from text (text-to-speech).


Steps & Configuration Details

1. Provision an Azure AI Speech Resource

  • Open Azure Portal (https://portal.azure.com) and sign in.
  • Search for Azure AI services → Select Create under Speech service.
  • Configuration Items:
    • Subscription: Your Azure subscription.
    • Resource Group: Select or create a resource group.
    • Region: Choose any available region.
    • Name: Enter a unique name.
    • Pricing Tier: F0 (Free) or S (Standard).
    • Responsible AI Notice: Agree.

After provisioning, navigate to Keys and Endpoint under Resource Management and note one of the keys and the region — you'll need both when configuring the application.


2. Clone the Repository

  • Open Azure Cloud Shell in the Azure Portal.
  • Select PowerShell as the environment.
  • Run the following commands:
    rm -r mslearn-ai-language -f
    git clone https://github.com/MicrosoftLearning/mslearn-ai-language mslearn-ai-language
    cd mslearn-ai-language/Labfiles/07-speech
    

3. Configure Your Application

  • Navigate to the correct folder:
    cd CSharp/speaking-clock  # For C#
    cd Python/speaking-clock   # For Python
    
  • Install dependencies:
    • C#:
      dotnet add package Microsoft.CognitiveServices.Speech --version 1.30.0
      
    • Python:
      pip install azure-cognitiveservices-speech==1.30.0
      
  • Open the configuration file:
    • C#: appsettings.json
    • Python: .env
  • Update Configuration Values:
    • Azure AI Speech Region
    • API Key
  • Save the configuration file.
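At startup the application reads the key and region from this file. As a hedged illustration of how that lookup might work in the Python version (the key names SPEECH_KEY and SPEECH_REGION are assumptions — match them to whatever your lab's .env file actually uses), a minimal .env-style loader:

```python
import os

def load_speech_config(path=".env"):
    """Read KEY=VALUE pairs from a .env-style file into a dict,
    falling back to environment variables when the file is absent.
    Key names (SPEECH_KEY, SPEECH_REGION) are illustrative only."""
    values = {}
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Skip blank lines and comments; split on the first '='
                if line and not line.startswith("#") and "=" in line:
                    name, _, value = line.partition("=")
                    values[name.strip()] = value.strip().strip('"')
    ai_key = values.get("SPEECH_KEY", os.environ.get("SPEECH_KEY"))
    ai_region = values.get("SPEECH_REGION", os.environ.get("SPEECH_REGION"))
    return ai_key, ai_region
```

The returned values are what you would pass to the SpeechConfig constructor in step 4.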

4. Add Code to Use Azure AI Speech SDK

  • Open the code file:
    • C#: Program.cs
    • Python: speaking-clock.py
  • Add references:
    • C#:
      using Microsoft.CognitiveServices.Speech;
      using Microsoft.CognitiveServices.Speech.Audio;
      
    • Python:
      import azure.cognitiveservices.speech as speech_sdk
      
  • Configure the AI Speech client:
    • C#:
      speechConfig = SpeechConfig.FromSubscription(aiSvcKey, aiSvcRegion);
      Console.WriteLine($"Ready to use speech service in {speechConfig.Region}");
      speechConfig.SpeechSynthesisVoiceName = "en-US-AriaNeural";
      
    • Python:
      speech_config = speech_sdk.SpeechConfig(ai_key, ai_region)
      print(f'Ready to use speech service in: {speech_config.region}')
      speech_config.speech_synthesis_voice_name = 'en-US-AriaNeural'
      

5. Add Code to Recognize Speech

  • C#:
    using AudioConfig audioConfig = AudioConfig.FromDefaultMicrophoneInput();
    using SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
    Console.WriteLine("Speak now...");
    SpeechRecognitionResult speech = await speechRecognizer.RecognizeOnceAsync();
    if (speech.Reason == ResultReason.RecognizedSpeech) {
        command = speech.Text;
        Console.WriteLine(command);
    }
    
  • Python:
    audio_config = speech_sdk.AudioConfig(use_default_microphone=True)
    speech_recognizer = speech_sdk.SpeechRecognizer(speech_config, audio_config)
    print('Speak now...')
    speech = speech_recognizer.recognize_once_async().get()
    if speech.reason == speech_sdk.ResultReason.RecognizedSpeech:
        command = speech.text
        print(command)
    

6. Add Code to Synthesize Speech

  • C#:
    speechConfig.SpeechSynthesisVoiceName = "en-GB-RyanNeural";
    using SpeechSynthesizer speechSynthesizer = new SpeechSynthesizer(speechConfig);
    SpeechSynthesisResult speak = await speechSynthesizer.SpeakTextAsync(responseText);
    if (speak.Reason != ResultReason.SynthesizingAudioCompleted) {
        Console.WriteLine(speak.Reason);
    }
    
  • Python:
    speech_config.speech_synthesis_voice_name = "en-GB-RyanNeural"
    speech_synthesizer = speech_sdk.SpeechSynthesizer(speech_config)
    speak = speech_synthesizer.speak_text_async(response_text).get()
    if speak.reason != speech_sdk.ResultReason.SynthesizingAudioCompleted:
        print(speak.reason)
    
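Beyond plain text, the Speech SDK also accepts SSML markup (SpeakSsmlAsync in C#, speak_ssml_async in Python), which gives finer control over the voice and delivery. Building the SSML payload is plain string work; a minimal sketch (the voice name here is only an example):

```python
def build_ssml(text, voice="en-GB-LibbyNeural"):
    """Wrap response text in a minimal SSML envelope.

    The resulting string would be passed to speak_ssml_async (Python)
    or SpeakSsmlAsync (C#) instead of the plain-text variants above."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<voice name="{voice}">{text}</voice>'
        '</speak>'
    )
```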

7. Run Your Application

  • C#:
    dotnet run
    
  • Python:
    python speaking-clock.py
    
  • Example prompt:
    What time is it?
    
  • The application should display the transcribed text and play the spoken response.
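The starter code builds responseText (C#) / response_text (Python) from the current time before handing it to the synthesizer in step 6. The exact wording in the lab's starter code may differ, but the logic is simple datetime formatting; a hedged Python sketch (function name and phrasing are hypothetical):

```python
from datetime import datetime

def get_response(command, now=None):
    """Build the spoken response for the speaking clock.

    Only the 'what time is it?' command is handled; `now` is
    injectable for testing and defaults to the system clock."""
    now = now or datetime.now()
    if command.lower().startswith("what time is it"):
        return f"The time is {now.hour}:{now.minute:02d}"
    return "I didn't understand that."
```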

8. Clean Up

  • In the Azure portal, delete the Azure AI Speech resource (or its resource group) when you finish the exercise to avoid unnecessary costs.

This summary captures the essential steps, configuration items, and code references required to recognize and synthesize speech with Azure AI Speech.
