Chapter 11 Perceiving Speech and Music Flashcards
(43 cards)
Phonemes
smallest units of sound that, if changed, would change the meaning of a word; basic sounds of a language
- > Changing a single phoneme in “cat” yields a different word: “bat”, “cut”, or “cap”
- The same letter in written words often corresponds to two or more different sounds (in “icicle”, the two letters “i” sound quite different and represent different phonemes), and some letters correspond to no sound at all (“h” and “e” in “rhyme”)
International Phonetic Alphabet (IPA)
an alphabet in which each symbol stands for a different speech sound; provides a distinctive way to write each phoneme in all the human languages currently in use.
-> Combinations of symbols and various types of accent marks
Producing Sounds of Speech
Most speech sounds begin with an exhalation of air from the lungs -> trachea (windpipe) -> larynx (voice box) -> pharynx -> oral and nasal cavities -> exits the body via the mouth and nose
Larynx (Voice box)
part of the vocal tract that contains the vocal folds
Vocal folds (vocal cords)
pair of membranes within the larynx
- > can be relaxed and open, allowing air to pass silently, or can be tensed, which causes them to vibrate when air passes.
- The fundamental frequency of vocal fold vibration depends on the size and thickness of the vocal folds and the size and shape of the larynx, as well as on the current degree of contraction or relaxation of muscles in the throat.
- Vibrating vocal folds produce sounds containing harmonic frequencies in addition to the fundamental frequency.
- > Adult male: fundamental frequency of 125 Hz contains higher harmonics of 250 Hz, 375 Hz, 500 Hz, and so on, with amplitudes that tend to decrease as their frequency increases.
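The harmonic series in the adult male example can be checked with a short Python sketch. The function name `harmonics` is made up for illustration; the only assumption is that each harmonic is an integer multiple of the fundamental frequency.

```python
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics above the fundamental.

    Each harmonic is an integer multiple (2x, 3x, 4x, ...) of the fundamental.
    """
    return [fundamental_hz * n for n in range(2, count + 2)]


# Adult male example from the card: fundamental frequency of 125 Hz
print(harmonics(125, 3))  # [250, 375, 500]
```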
Pharynx
Uppermost part of the throat
Uvula
flap of tissue that hangs off the posterior edge of the soft palate; it can bend upward to close off the nasal cavity, directing all exhaled air into the oral cavity and out the mouth (important in the production of many speech sounds)
Vibrations with fundamental frequencies
- Males: 85-180 Hz
- Females: 165-255 Hz
- Children: over 300 Hz
- Most vocalization involves changing the fundamental frequency of vocal fold vibrations.
- > Contracting or relaxing muscles in throat-> changes tension of vocal folds -> rate of vibration
- > The greater the tension, the faster the vibration, the higher the pitch
Vowels
speech sounds produced with a relatively unrestricted flow of air through the pharynx and oral cavity.
-> Different vowels produced by varying the size and shape of the oral cavity.
Consonants
speech sounds produced by restricting the flow of air at one place or another along the path of the airflow from the vocal folds.
Producing Vowels
To produce different vowel sounds, a speaker modifies the basic sound produced by the vocal folds by changing the shape of the oral cavity, which attenuates certain harmonics more than others, with a different pattern of attenuation for each vowel.
*Oral cavities with different shapes have different resonances, which determine which frequencies are attenuated and by how much
Formants
frequency bands with relatively high amplitude in the harmonic spectrum of a vowel sound.
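A rough Python sketch of how formants could arise when a resonance pattern shapes the harmonics of the vocal folds. Everything numeric here is an assumption for illustration: the 1/n source rolloff, the Gaussian resonance shape, the 100 Hz bandwidth, and the resonance centers of 500 Hz and 1500 Hz are hypothetical values, not measurements of any real vowel.

```python
import math


def source_amplitude(n):
    # Assumption: amplitude of the n-th harmonic falls off as 1/n.
    return 1.0 / n


def resonance_gain(freq_hz, centers_hz, bandwidth_hz=100.0):
    # Assumption: each resonance passes nearby frequencies (gain near 1.0)
    # and attenuates the rest down to a small floor of 0.05.
    bumps = sum(math.exp(-((freq_hz - c) / bandwidth_hz) ** 2) for c in centers_hz)
    return 0.05 + 0.95 * bumps


def vowel_spectrum(f0_hz, centers_hz, n_harmonics=20):
    # List of (frequency, amplitude) pairs after the oral cavity shapes the source.
    return [
        (f0_hz * n, source_amplitude(n) * resonance_gain(f0_hz * n, centers_hz))
        for n in range(1, n_harmonics + 1)
    ]


def formant_peaks(spectrum):
    # Harmonics whose amplitude exceeds that of both neighboring harmonics.
    peaks = []
    for i in range(1, len(spectrum) - 1):
        if spectrum[i][1] > spectrum[i - 1][1] and spectrum[i][1] > spectrum[i + 1][1]:
            peaks.append(spectrum[i][0])
    return peaks


# Hypothetical oral cavity shape with resonances at 500 Hz and 1500 Hz
print(formant_peaks(vowel_spectrum(125, [500, 1500])))  # [500, 1500]
```

Changing the list of resonance centers stands in for changing the shape of the oral cavity: the same source harmonics then show high-amplitude bands at different frequencies.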
Sound spectrogram
a graph that includes the dimensions of frequency, amplitude, and time, showing how the frequencies corresponding to each vowel sound in an utterance change over time.
Producing Consonants
Place of articulation, Manner of articulation, Voicing
Place of articulation
point in the vocal tract at which airflow is restricted, described in terms of the anatomical structures involved in creating the restriction
Manner of articulation
the nature of the restriction of airflow in the vocal tract
Voicing
whether the vocal folds are vibrating or not (whether the consonant is voiced or voiceless)
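The three dimensions above can be sketched as a small Python data structure. The class and function names are made up for illustration, and the six consonants are standard textbook examples rather than entries from these cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    symbol: str
    place: str    # place of articulation
    manner: str   # manner of articulation
    voiced: bool  # True if the vocal folds vibrate


# Standard examples (not taken from the cards)
CONSONANTS = [
    Consonant("p", "bilabial", "stop", False),
    Consonant("b", "bilabial", "stop", True),
    Consonant("t", "alveolar", "stop", False),
    Consonant("d", "alveolar", "stop", True),
    Consonant("s", "alveolar", "fricative", False),
    Consonant("z", "alveolar", "fricative", True),
]


def voicing_pairs(consonants):
    """Return (voiceless, voiced) pairs that share place and manner."""
    pairs = []
    for a in consonants:
        for b in consonants:
            same_articulation = (a.place, a.manner) == (b.place, b.manner)
            if same_articulation and not a.voiced and b.voiced:
                pairs.append((a.symbol, b.symbol))
    return pairs


print(voicing_pairs(CONSONANTS))  # [('p', 'b'), ('t', 'd'), ('s', 'z')]
```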
Different acoustic events may all represent the same phoneme
- Different talkers produce sounds with different fundamental frequencies
- Dialects
- The same phoneme produced at different times by the same speaker can differ significantly
- Our auditory system cannot identify phonemes by simply mapping specific frequencies to specific phonemes.
- > Instead, it identifies the phonemes in speech from acoustic features: the relative positions of frequencies in the context of the entire speech stream
- > It also draws on the listener’s knowledge of the language and understanding of the context
Coarticulation
the influence of one phoneme on the acoustic properties of another, due to the articulatory movements required to produce them in sequence
*Coarticulation effects flow not only forward, across the transition from one phoneme to the next, but also “backward”, from an upcoming phoneme to the phoneme currently being produced.
Perceptual constancy
- Different sensory stimuli regularly resulting in identical perceptions
- Hear two different sounds as the same consonant
Categorical perception
perception of different sensory stimuli as identical, up to a point at which further variation in the stimulus leads to a sharp change in the perception.
- > Opposed to continuous perception, in which there are no sharp changes in perception as the stimulus varies.
- Research suggests that our perception of certain speech sounds is categorical rather than continuous, where the categories are different phonemes.
Voice onset time (VOT)
in the production of stop consonants, the interval between the initial burst of frequencies and the onset of voicing.
*Pairs of voiceless and voiced stop consonants are always differentiated by this pattern of a relatively long VOT for the voiceless stop and a relatively short VOT for the voiced stop.
Phonemic boundary
the voice onset time at which a stop consonant transitions from being mainly perceived as voiced to being mainly perceived as voiceless.
- > The transition from mostly perceiving /ba/ to mostly perceiving /pa/ occurs at a VOT of about 30 msec
- > There are “detectors” in the auditory system tuned to respond to certain ranges of VOTs.
- > This would explain why VOTs in the range of 25-35 msec lead to uncertainty about the perception of /b/ or /p/ (/ba/ or /pa/)- both types of detectors are responding (with a similar level of response to VOTs of about 30 msec).
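The detector idea can be sketched in Python as two opposing response curves that cross at the phonemic boundary. The logistic shape and the `SLOPE_MS` value are assumptions chosen for illustration; only the 30 msec boundary comes from the card.

```python
import math

BOUNDARY_MS = 30.0  # phonemic boundary from the card
SLOPE_MS = 2.0      # assumed steepness of the detector response (hypothetical)


def voiceless_response(vot_ms):
    # Hypothetical detector tuned to long VOTs (/p/-like), modeled as a logistic curve.
    return 1.0 / (1.0 + math.exp(-(vot_ms - BOUNDARY_MS) / SLOPE_MS))


def voiced_response(vot_ms):
    # Hypothetical detector tuned to short VOTs (/b/-like).
    return 1.0 - voiceless_response(vot_ms)


def perceived(vot_ms):
    # The percept follows whichever detector responds more strongly.
    return "/pa/" if voiceless_response(vot_ms) > voiced_response(vot_ms) else "/ba/"


for vot in (10, 25, 30, 35, 50):
    print(vot, round(voiced_response(vot), 3), round(voiceless_response(vot), 3))
```

Far from the boundary one detector dominates, so the percept is stable. Near 30 msec both respond at similar levels, which matches the uncertainty described for VOTs of 25-35 msec.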
McGurk Effect
in the perception of speech sounds, when auditory and visual stimuli conflict, the auditory system tends to compromise on a perception that shares features with both the seen and the heard stimuli; if no good compromise perception is available, either the conflict is resolved in favor of the visual stimulus or there is a conflicting perceptual experience.