Lecture 9 - Multimodality in Speech Flashcards
1
Q
What is the SNR range in speech intelligibility?
A
6 dB to -5 dB
2
Q
When observing the TCD-TIMIT set on lip reading, which features fared best and worst?
A
- Best; thinner lips, same colour around mouth
- Worst; facial hair, thicker lips
3
Q
The ‘LRS2’ dataset is from the BBC; why is it more challenging?
A
- Head pose
- Illumination changes
- Low image resolution
4
Q
Explain why an approach of waiting for gaps in a conversation to indicate turns is not valid.
A
- Gaps in real life are short, and overlap occurs in turn-taking.
- The modal response gap is 200 ms, yet finding a term is shown to take 600 ms, so we plan our turn in advance.
5
Q
Outline 4 cues which are used by humans to regulate turns in a conversation, and indicate whether they are used differently for a hold or yield.
A
- Verbal;
  - yield; finishes sentence
  - hold; filled pause
- Prosody/Pitch;
  - yield; rising or falling pitch
  - hold; flat pitch
- Breathing;
  - yield; breathe out
  - hold; breathe in
- Gaze;
  - yield; look at addressee
  - hold; look away
6
Q
What is back-channel?
A
Saying phrases such as ‘okay’ and ‘okay you go’
7
Q
A