speech perception Flashcards
(22 cards)
Phoneme
The smallest unit of speech; changing a phoneme can change the meaning of a word
English has 37 phonemes
13 vowels and 24 consonants
The number of phonemes varies between languages
The acoustic signal for speech
Speech sounds are produced by air pushed up from the lungs, through the vocal cords, and into the vocal tract
Some parts are fixed (e.g., nasal cavity, hard palate)
Other parts can move, such as the vocal cords and the articulators (the tongue, lips, jaw, and soft palate)
The acoustic signal for vowels
formants
Vowel sounds are produced by vibration of the vocal cords, changing the shape of the vocal tract.
Formants: the frequencies corresponding to the peaks in a vowel sound's pressure wave
The first formant is the lowest frequency, the second is the next highest, and so on…
Simple explanation: Formants are the special peak frequencies that make each vowel sound unique, like a vocal ID card. When you say “ah” vs. “ee,” your mouth shapes these peaks differently.
1st Formant (F1): Lowest frequency peak → Tied to mouth openness.
High F1 = “ah” (open mouth, like “father”).
Low F1 = “oo” (small mouth, like “food”).
2nd Formant (F2): Next peak → Tied to tongue position.
High F2 = “ee” (tongue forward, like “see”).
Low F2 = “aw” (tongue back, like “dog”).
Higher Formants (F3, F4…)
Fine-tune sounds further (e.g., distinguishing “er” vs. “uh”).
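The F1/F2 pattern above can be sketched as a toy vowel identifier. The formant values here are rough illustrative averages for adult speakers (real formants vary widely from person to person), and the nearest-neighbor rule is an assumption for the sketch, not a model of actual perception.

```python
# Illustrative sketch: guessing a vowel from its first two formants.
# The (F1, F2) pairs are approximate textbook averages, not measurements.
VOWEL_FORMANTS = {
    "ee (as in 'see')":    (270, 2290),  # tongue forward -> high F2
    "oo (as in 'food')":   (300, 870),   # small mouth opening -> low F1
    "ah (as in 'father')": (730, 1090),  # open mouth -> high F1
}

def nearest_vowel(f1, f2):
    """Return the vowel whose (F1, F2) signature is closest (Euclidean)."""
    return min(VOWEL_FORMANTS,
               key=lambda v: (VOWEL_FORMANTS[v][0] - f1) ** 2
                           + (VOWEL_FORMANTS[v][1] - f2) ** 2)

print(nearest_vowel(280, 2200))  # a high-F2 sound lands nearest "ee"
```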
resonant frequency in vowels
Changes in the shape of the vocal tract produce different resonant frequencies; each vowel has a different resonant frequency signature
Sound spectrograms
show the changes in frequency and intensity over time for speech
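A spectrogram is just short-time frequency analysis. As a minimal sketch, the snippet below builds a synthetic "vowel-like" signal from two sinusoids at hypothetical formant frequencies (500 Hz and 1500 Hz are stand-ins, not real formant data) and computes frame-by-frame power spectra with NumPy; real spectrograms are computed the same way from recorded speech.

```python
import numpy as np

fs = 8000                          # sampling rate (Hz), chosen for the sketch
t = np.arange(0, 1.0, 1 / fs)      # 1 second of signal
# Two sinusoids standing in for formant peaks at 500 Hz and 1500 Hz:
x = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)

frame = 256                        # samples per analysis frame
frames = x[: len(x) // frame * frame].reshape(-1, frame)
spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # power per frame, per frequency
freqs = np.fft.rfftfreq(frame, d=1 / fs)

# `spec` is the spectrogram: rows are time frames, columns are frequencies.
# The strongest band sits at the louder 500 Hz component:
peak = freqs[np.argmax(spec.mean(axis=0))]
print(round(peak))  # 500
```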
The acoustic signal for consonants
Consonants are produced by constrictions of the vocal tract
making consonant sounds requires substantial stereotypical movements of the articulators, with each consonant having a signature movement
The vocal cords are largely silent for consonant sounds; they are used primarily for vowels
Formant transitions
rapid changes in frequency preceding or following consonant sounds
The segmentation problem
There are no physical breaks in the continuous acoustic speech signal.
The variability problem
there is no simple correspondence between the acoustic signal and the individual phonemes
variability from a phoneme’s context
The acoustic signal associated with a phoneme changes depending on its context
coarticulation
The blending of neighboring sounds during speech
Variability from different speakers
Different people speak differently
fast/slow
high-pitched/low-pitched
accent/no accent
clear/sloppy
Categorical speech perception
The mapping of a range of acoustic signals onto the perception of a limited number of sound categories
Your brain lumps similar speech sounds into distinct categories
Listeners do not hear the incremental changes between sounds; instead, they hear sudden changes at phonetic boundaries
We experience perceptual constancy for phonemes within a given range of voice onset times
voice onset time
The delay between when a sound begins and when the vocal cords start to vibrate (aka Voicing)
Eimas and Corbit (voice onset time)
Used a computer to create a range of “da” and “ta” sounds with VOTs varying from short to long
At VOTs < 35 ms, subjects perceive “da”; at VOTs > 40 ms, subjects perceive “ta”
Phonetic boundary: the VOT at which perception transitions from “da” to “ta”
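The all-or-nothing character of the boundary can be sketched as a threshold rule. The boundary value of 37.5 ms is an assumption for illustration (chosen between the <35 ms and >40 ms regions above); real listeners show a narrow transition zone rather than a hard cutoff.

```python
BOUNDARY_MS = 37.5  # hypothetical phonetic boundary for this sketch

def perceived_syllable(vot_ms):
    """Listeners report a category, not the graded VOT value itself."""
    return "da" if vot_ms < BOUNDARY_MS else "ta"

# A smooth continuum of stimuli (VOT 0..80 ms in 5 ms steps)
# collapses into just two percepts:
percepts = sorted({perceived_syllable(vot) for vot in range(0, 81, 5)})
print(percepts)  # ['da', 'ta']
```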
Speech perception is multimodal
Perception of speech is influenced by information from more than one sense
The McGurk effect
Auditory stimulus has a speaker saying “ba-ba”
Visual stimulus has a speaker saying “ga-ga”
observer watching and listening hears “da-da”, which is midway between “ga” and “ba”
observer with eyes closed will hear “ba”
The effect of meaning on speech perception
Top-down processing- including knowledge a listener has about a language, affects the perception of the incoming stimulus
Phoneme interpretation is affected by context and meaning
Turvey and Van Gelder- speech perception
presented short words (sin, bat, and leg) and short non words (jum, baf, and teg) to listeners
The task was to press a button as quickly as possible when they heard the target phoneme
on average, listeners were faster with words than non words
actual words were recognized, enabling top down factors to speed phoneme detection
Warren speech perception
presented listeners with a sentence that had a phoneme covered by a cough
The task was to state where in the sentence the cough occurred
Listeners could not correctly identify the position of the cough and did not even notice that a phoneme was missing
Phoneme restoration effect
A phoneme that is missing from the auditory signal gets restored by top-down knowledge
Experience-dependent plasticity Kuhl
Before age 1, human infants can tell the difference between all the sounds that create all languages
But the brain becomes tuned to respond best to speech sounds that are in the environment, and we lose the ability to differentiate sounds that we don’t hear during development
As adults, we are impaired in our ability to perceive and recognize phonemes not used in our native language
An example of experience-dependent plasticity: as we become experts at recognizing our native language, we lose the ability to recognize phonemes from other languages