Lecture 9 - Multimodality in Speech Flashcards

1
Q

What is the SNR range in speech intelligibility?

A

6 dB to -5 dB
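
As a reminder of what these values mean, SNR in decibels is 10 * log10(speech power / noise power), so 6 dB is speech at roughly four times the noise power and -5 dB is speech at roughly a third of it. The sketch below is not from the lecture: the helper names `power`, `snr_db` and `scale_noise_to_snr` are hypothetical, and sine waves stand in for real speech and noise recordings.

```python
import math

def power(samples):
    """Mean power of a signal: the average of its squared samples."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(speech, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))

def scale_noise_to_snr(speech, noise, target_snr_db):
    """Rescale the noise so that mixing it with the speech gives the target SNR."""
    wanted_noise_power = power(speech) / (10 ** (target_snr_db / 10.0))
    gain = math.sqrt(wanted_noise_power / power(noise))
    return [gain * n for n in noise]

# Assumption: deterministic sine waves used as placeholders for audio.
speech = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
noise = [math.sin(2 * math.pi * 13 * t / 100 + 0.3) for t in range(100)]

# Walk from the easy end to the hard end of the range on the card.
for target in (6, 0, -5):
    scaled = scale_noise_to_snr(speech, noise, target)
    print(target, "dB requested, measured:", snr_db(speech, scaled))
```

Lower (more negative) SNR means the noise dominates, which is where visual cues such as lip movements would be expected to help most.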

2
Q

On the TCD-TIMIT set for lip reading, which speaker features fared best and worst?

A
  1. Best; thinner lips, same colour around mouth
  2. Worst; facial hair, thicker lips
3
Q

The ‘LRS2’ dataset is from the BBC; why is it more challenging?

A
  1. Head pose
  2. Illumination changes
  3. Low image resolution
4
Q

Explain why an approach of waiting for gaps in a conversation to indicate turns is not valid.

A
  1. Gaps in real life are short, and overlap occurs in turn-taking.
  2. The modal response gap is about 200 ms, yet producing a response has been shown to take about 600 ms, so we must plan our turn before the current speaker has finished.
5
Q

Outline 4 cues which are used by humans to regulate turns in a conversation, and indicate whether they are used differently for a hold or yield.

A
  1. Verbal;
    - yield; finishes sentence.
    - hold; filled pause
  2. Prosody/Pitch;
    - yield; rising or falling pitch
    - hold; flat pitch
  3. Breathing
    - yield; breathe out
    - hold; breathe in
  4. Gaze
    - yield; look at addressee
    - hold; look away
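
The cue table above can be written down as a lookup, which makes it easy to self-test on the hold/yield pairs. This is only an illustrative sketch: the names `CUES` and `predict_turn` are hypothetical, and combining cues by simple majority vote is an assumption made here, not a method from the lecture.

```python
# Each cue channel maps an observed value to the turn signal listed on the card.
CUES = {
    "verbal": {"finished_sentence": "yield", "filled_pause": "hold"},
    "pitch": {"rising": "yield", "falling": "yield", "flat": "hold"},
    "breathing": {"out": "yield", "in": "hold"},
    "gaze": {"at_addressee": "yield", "away": "hold"},
}

def predict_turn(observed):
    """Majority vote over the observed cues (assumed combination rule).

    `observed` maps a channel name to the value seen on that channel,
    e.g. {"pitch": "flat", "gaze": "away"}.
    """
    votes = [CUES[channel][value] for channel, value in observed.items()]
    yields = votes.count("yield")
    holds = votes.count("hold")
    if yields > holds:
        return "yield"
    if holds > yields:
        return "hold"
    return "uncertain"  # tie, or no cues observed

# A speaker who trails off with "um", keeps a flat pitch and looks away
# is signalling that they want to keep the turn.
print(predict_turn({"verbal": "filled_pause", "pitch": "flat", "gaze": "away"}))
```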
6
Q

What is a back-channel?

A

Short listener responses, such as ‘okay’ and ‘okay you go’, that show attention without taking over the turn.
