Recognize and synthesize speech Flashcards

Question 1

Q

Microsoft Azure offers both speech recognition and speech synthesis capabilities through the Speech cognitive service, which includes the following application programming interfaces (APIs):

Answer

A

The Speech-to-Text API
The Text-to-Speech API

Question 2

Q

Azure resources for the Speech service

Answer

A

A Speech resource - choose this resource type if you only plan to use the Speech service, or if you want to manage access and billing for the resource separately from other services.
A Cognitive Services resource - choose this resource type if you plan to use the Speech service in combination with other cognitive services, and you want to manage access and billing for these services together.

Question 3

Q

The speech-to-text API

Answer

A

You can use the speech-to-text API to perform real-time or batch transcription of audio into a text format. The audio source for transcription can be a real-time audio stream from a microphone or an audio file.

The model used by the speech-to-text API is based on the Universal Language Model, which was trained by Microsoft.

The model is optimized for two scenarios:
1. Conversational
2. Dictation

Question 4

Q

Real-time transcription

Answer

A

Real-time speech-to-text allows you to transcribe the speech in audio streams into text as it is spoken. You can use real-time transcription for presentations, demos, or any other scenario where a person is speaking.

Question 5

Q

Batch transcription

Answer

A

Not all speech-to-text scenarios are real time. You may have audio recordings stored on a file share, a remote server, or even on Azure storage.

Batch transcription should be run asynchronously because batch jobs are scheduled on a best-effort basis.

Question 6

Q

The text-to-speech API

Answer

A

The text-to-speech API enables you to convert text input to audible speech, which can either be played directly through a computer speaker or written to an audio file.

Question 7

Q

Speech synthesis voices

Answer

A

When you use the text-to-speech API, you can specify the voice to be used to vocalize the text. This capability offers you the flexibility to personalize your speech synthesis solution and give it a specific character.
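One common way to name a voice is to wrap the text in SSML (Speech Synthesis Markup Language), which text-to-speech services including Azure's accept. The sketch below only builds the SSML string and does not call any service. `build_ssml` is a hypothetical helper, and `en-US-JennyNeural` is used purely as an example voice name; check which voices are available to your resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice_name, language="en-US"):
    """Return an SSML document asking for `text` to be spoken with `voice_name`."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(language)}>"
        # escape() protects characters such as & and < inside the spoken text.
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


# Example voice name only; it is an assumption, not a recommendation.
ssml = build_ssml("Turn left at Main & 3rd.", "en-US-JennyNeural")
print(ssml)
```

Changing only the `voice_name` argument is enough to give the same text a different character.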

Question 8

Q

For artificial intelligence (AI) solutions to accept vocal commands and provide spoken responses, the AI system must support two capabilities:

Answer

A

Speech recognition - the ability to detect and interpret spoken input.
Speech synthesis - the ability to generate spoken output.

Question 9

Q

Speech recognition

Answer

A

Speech recognition is concerned with taking the spoken word and converting it into data that can be processed - often by transcribing it into a text representation.

The spoken words can be in the form of a recorded voice in an audio file, or live audio from a microphone.

Speech patterns are analyzed in the audio to determine recognizable patterns that are mapped to words. To accomplish this feat, the software typically uses multiple types of models, including:

An acoustic model that converts the audio signal into phonemes (representations of specific sounds).
A language model that maps phonemes to words, usually using a statistical algorithm that predicts the most probable sequence of words based on the phonemes.
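The language-model step can be illustrated with toy data. This is a minimal sketch, not how the Speech service works internally: it assumes the acoustic model has already produced phonemes grouped per word, and `LEXICON`, `BIGRAM`, `FLOOR`, and `decode` are hypothetical names with made-up probabilities.

```python
from itertools import product

# Hypothetical lexicon: phoneme tuple -> candidate words (homophones share phonemes).
LEXICON = {
    ("T", "UW"): ["two", "to", "too"],
    ("W", "AH", "N"): ["one", "won"],
    ("P", "L", "AH", "S"): ["plus"],
}

# Made-up bigram probabilities P(word | previous word); "<s>" marks the sentence start.
BIGRAM = {
    ("<s>", "two"): 0.5, ("<s>", "to"): 0.3, ("<s>", "too"): 0.2,
    ("two", "plus"): 0.6, ("to", "plus"): 0.1, ("too", "plus"): 0.1,
    ("plus", "one"): 0.7, ("plus", "won"): 0.05,
}
FLOOR = 0.01  # probability assigned to word pairs missing from BIGRAM


def decode(phoneme_groups):
    """Return the most probable word sequence for per-word phoneme groups."""
    candidates = [LEXICON[tuple(group)] for group in phoneme_groups]
    best_words, best_score = None, 0.0
    # Brute force over every combination; fine for a toy, too slow for real speech.
    for words in product(*candidates):
        score, previous = 1.0, "<s>"
        for word in words:
            score *= BIGRAM.get((previous, word), FLOOR)
            previous = word
        if score > best_score:
            best_words, best_score = list(words), score
    return best_words


print(decode([["T", "UW"], ["P", "L", "AH", "S"], ["W", "AH", "N"]]))
```

The phonemes for "two", "to", and "too" are identical, so only the word-sequence probabilities let the decoder prefer "two plus one".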

Question 10

Q

In speech recognition, the recognized words are typically converted to text, which you can use for various purposes, such as:

Answer

A

Providing closed captions for recorded or live videos
Creating a transcript of a phone call or meeting
Automated note dictation
Determining intended user input for further processing

Question 11

Q

Speech synthesis

Answer

A

Speech synthesis is in many respects the reverse of speech recognition.

It is concerned with vocalizing data, usually by converting text to speech. A speech synthesis solution typically requires the following information:

The text to be spoken.
The voice to be used to vocalize the speech.

To synthesize speech, the system typically tokenizes the text to break it down into individual words, and assigns phonetic sounds to each word.
These phonemes are then synthesized as audio by applying a voice, which determines parameters such as pitch and timbre, and by generating an audio waveform that can be output to a speaker or written to a file.
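The tokenize, assign-phonemes, and generate-waveform steps can be mimicked with a toy example. This sketch is an assumption-heavy illustration: it plays one sine tone per phoneme rather than producing real speech, and `PHONEMES`, `BASE_TONE_HZ`, and `synthesize` are hypothetical names with invented values.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000

# Hypothetical word -> phoneme table and phoneme -> tone mapping (not real phonetics).
PHONEMES = {"hi": ["HH", "AY"], "there": ["DH", "EH", "R"]}
BASE_TONE_HZ = {"HH": 200.0, "AY": 260.0, "DH": 180.0, "EH": 240.0, "R": 220.0}


def synthesize(text, pitch=1.0, seconds_per_phoneme=0.1):
    """Tokenize text, look up phonemes, and return WAV bytes with one tone per phoneme."""
    words = text.lower().split()  # tokenize into individual words
    phonemes = [p for word in words for p in PHONEMES[word]]  # assign sounds per word
    samples_per_phoneme = round(SAMPLE_RATE * seconds_per_phoneme)
    frames = bytearray()
    for phoneme in phonemes:
        frequency = BASE_TONE_HZ[phoneme] * pitch  # the "voice" scales the pitch
        for n in range(samples_per_phoneme):
            value = math.sin(2 * math.pi * frequency * n / SAMPLE_RATE)
            frames += struct.pack("<h", int(value * 16000))  # 16-bit signed sample
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


audio = synthesize("hi there", pitch=1.2)
print(len(audio), "bytes of WAV audio")
```

The returned bytes could be written to a file or sent to an audio device, which corresponds to the two output options described above.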

Question 12

Q

Purposes for using the output of speech synthesis

Answer

A

Generating spoken responses to user input.
Creating voice menus for telephone systems.
Reading email or text messages aloud in hands-free scenarios.
Broadcasting announcements in public locations, such as railway stations or airports.

Question 16

Q

Real-time transcription

Answer

A

Real-time speech-to-text allows you to transcribe the speech in audio streams into text as it is spoken. You can use real-time transcription for presentations, demos, or any other scenario where a person is speaking.

For real-time transcription to work, your application needs to listen for incoming audio from a microphone or another audio input source, such as an audio file. Your application code streams the audio to the service, which returns the transcribed text.
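The streaming side of this can be sketched without any service at all. The example below reads a WAV source in short chunks, the way an application might feed audio to a recognizer; `audio_chunks`, `make_silence`, and `send_to_service` are hypothetical names, and `send_to_service` is a placeholder that only counts bytes instead of contacting the Speech service.

```python
import io
import wave


def audio_chunks(wav_file, chunk_ms=100):
    """Yield raw PCM chunks of about chunk_ms milliseconds from a WAV file object."""
    with wave.open(wav_file, "rb") as wav:
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


def make_silence(seconds=1.0, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence to stand in for captured audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


def send_to_service(chunk):
    """Hypothetical placeholder: a real client would push the chunk to the service here."""
    return len(chunk)


sent = [send_to_service(chunk) for chunk in audio_chunks(make_silence())]
print(len(sent), "chunks,", sum(sent), "bytes")
```

Small chunks keep latency low, because the service can start returning text before the speaker has finished.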

Question 17

Q

Batch transcription

Answer

A

Not all speech-to-text scenarios are real time. You may have audio recordings stored on a file share, a remote server, or even on Azure storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results.

Batch transcription should be run asynchronously because batch jobs are scheduled on a best-effort basis. Normally a job starts executing within minutes of the request, but there is no estimate for when a job changes into the running state.
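Because the start time is not predictable, client code usually submits the job and then polls for its status. The sketch below builds a request without sending it and polls a caller-supplied status function. It assumes the v3.1 REST endpoint shape and the `contentUrls`, `locale`, and `displayName` body fields; `build_transcription_request` and `wait_for_job` are hypothetical helpers, so verify the details against the current REST reference before relying on them.

```python
import json
import time
import urllib.request


def build_transcription_request(region, key, sas_urls, locale="en-US"):
    """Build (but do not send) a POST request that would create a batch transcription job."""
    body = {
        "contentUrls": list(sas_urls),  # SAS URIs that point at the audio files
        "locale": locale,
        "displayName": "Example batch transcription",
    }
    return urllib.request.Request(
        # Assumed v3.1 endpoint shape; newer API versions may differ.
        url=f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def wait_for_job(get_status, poll_seconds=30.0, max_polls=120):
    """Poll get_status() until the job reports a final state or polling runs out."""
    for _ in range(max_polls):
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("transcription job did not finish in time")


# Usage with a fake status source, so the sketch runs without any Azure account.
fake_statuses = iter(["NotStarted", "NotStarted", "Running", "Succeeded"])
print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0))
```

Passing the status lookup in as a function keeps the polling logic separate from the HTTP call, which makes it easy to test.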
