Week 11 Flashcards
(9 cards)
How does the ear perceive sound?
Sound is a series of variations in air pressure (compressions and rarefactions). These pressure variations enter through the outer ear and set the eardrum vibrating. The vibrations are transmitted through a chain of small bones (the ossicles) in the middle ear to the inner ear, where they move hair cells in the cochlea. The hair cells send electrical impulses to the brain via the auditory nerve.
What is the biological Fourier Transform?
The ear converts sound waves into a spectral representation: different regions of the basilar membrane respond to different frequencies. The result resembles the frequency-domain representation you get by applying a Fourier transform to a sound.
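To make the analogy concrete, here is a minimal sketch of a discrete Fourier transform picking out the component frequencies of a mixed signal, much as different places on the basilar membrane respond to different frequencies. The sample rate, tone frequencies, and the helper name dft_magnitudes are illustrative assumptions, not part of the course material.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the magnitude of each DFT bin from 0 Hz up to the Nyquist frequency."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid at bin k.
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags

SAMPLE_RATE = 800  # samples per second (assumed for illustration)
N = 400            # number of samples, i.e. 0.5 s of signal

# Time-domain signal: a 100 Hz tone plus a quieter 220 Hz tone.
signal = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
          + 0.5 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
          for t in range(N)]

mags = dft_magnitudes(signal)
bin_width = SAMPLE_RATE / N  # each bin covers 2 Hz

# The two strongest bins correspond to the two tones in the mixture.
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = [k * bin_width for k in sorted(strongest)]
print(peak_freqs)
```

The time-domain signal is a single wiggly pressure curve, but the frequency-domain view separates it into energy at 100 Hz and 220 Hz.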
What are the scales for measuring frequency?
Hertz (Hz) measures the number of cycles per second, i.e. physical frequency. The scale is linear: twice as many cycles per second means twice the Hz. Semitones (st) are a relative measure, still of physical frequency. The scale is logarithmic: twice as many cycles per second means up one octave. A semitone is 1/12 of an octave, so doubling the frequency is a rise of 12 st.
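The relationship between the two scales can be sketched in a few lines. Since an octave is a doubling and holds 12 semitones, the interval in semitones between two frequencies is 12 times the base-2 logarithm of their ratio. The function names below are hypothetical, chosen for illustration.

```python
import math

def semitones_between(f_ref_hz, f_hz):
    """Interval from f_ref_hz up to f_hz, in semitones (12 per octave)."""
    return 12 * math.log2(f_hz / f_ref_hz)

def shift_by_semitones(f_hz, st):
    """Frequency reached by moving st semitones away from f_hz."""
    return f_hz * 2 ** (st / 12)

# Same interval (one octave), different Hz differences:
low_octave = semitones_between(100, 200)   # a 100 Hz step
high_octave = semitones_between(200, 400)  # a 200 Hz step
print(low_octave, high_octave)

# Going up 12 semitones doubles the frequency.
print(shift_by_semitones(220, 12))
```

Both steps come out as 12 st even though one spans 100 Hz and the other 200 Hz, which is what makes the semitone scale relative rather than absolute.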
Is our perception of frequency logarithmic or non-logarithmic?
Our perception is roughly logarithmic, since we perceive relative differences rather than absolute ones. But a logarithmic scale doesn’t completely capture our perception.
What is our perception of amplitude?
Amplitude does not map onto perceived loudness as neatly as frequency maps onto pitch, because frequency (f0) also affects how loud a sound seems. Our hearing is most sensitive in the mid-range (500 Hz to 5000 Hz). The perceptual scales for loudness are phons and sones.
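As a rough illustration of how the two perceptual scales relate, the sketch below uses the standard textbook approximation (not stated in these notes) that 1 sone corresponds to 40 phons and each additional 10 phons doubles the loudness in sones. The function name is hypothetical, and the approximation is only applied at 40 phons and above.

```python
def phons_to_sones(phons):
    """Approximate loudness in sones for a loudness level in phons (>= 40)."""
    if phons < 40:
        raise ValueError("this approximation only applies at 40 phons and above")
    # 40 phons is defined as 1 sone; every +10 phons doubles the sone value.
    return 2 ** ((phons - 40) / 10)

for level in (40, 50, 60):
    print(level, "phons ->", phons_to_sones(level), "sones")
```

On this approximation a sound at 60 phons is judged about four times as loud as one at 40 phons, which is why sones are the more intuitive scale for "how much louder".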
What is our perception of duration?
Complex: it depends on context (speaking rate, neighbouring segments) and segment type (plosive, vowel, fricative). There is no standard perceptual scale of time. Many languages share two length categories: short and long.
What are cues for perceiving speech?
Research on speech perception aims to discover which aspects of the speech sound pattern are the essential ones. The essential stimulus patterns in a perceived event are called cues. Speech cues are the acoustic patterns that are necessary and sufficient for a listener to correctly perceive a given sentence, phrase, word, or phoneme.
What are the two effects of context?
Coarticulation - the positions and actions of the articulators in neighbouring segments can affect how a segment is produced. Vowel undershoot - coarticulation leads to the tongue missing the ideal position for a given vowel.
What is speaker normalization?
There are theories about how listeners identify vowels spoken by different speakers. They fall into two categories. Intrinsic models - assume there is sufficient information within the acoustic pattern of the vowel itself to allow identification. Extrinsic models - assume identification is based on a “frame of reference” established from preceding (and succeeding) speech patterns, which suggests listeners use information about the speaker’s size, gender, and age to identify the vowel.