Flashcards in Speech Science Exam 2 Deck (179):
When we work with sound files, what are we working with?
What is the difference between digital and analog audio signals?
digital signals have gaps in them, whereas analog signals are continuous
What are examples of analog signals? 3
speech sounds, musical tones, displacement of the middle ear bones
What is an analog signal?
a signal continuous in time and amplitude; analog signals exist at every moment, and their amplitude can take on any value
What is a digital signal?
discrete-time signal (it only exists at discrete points, not continuously); not continuous; digital "samples"; digital signals exist at given moments only, and there is nothing in between
Are digital sound files superior to analog tapes? 4 reasons why yes
1 quality (copies are identical to the original)
2 digital signals are more resistant to noise and degradation when transmitted than analog signals
3 storage sensitive
4 flexible and efficient
Why does digitizing work?
Even with a discrete number of data points, the signal can be represented well (to the point where the digital signal sounds like the original analog signal). For example, a CD sounds as good as an analog master tape.
What is digitizing?
Get a representative sample. Since the analog signal is continuous and its digital representation is not, the data points need to be selected carefully so that the quality of the sample is not significantly compromised. The name “digitization” literally means converting to numbers (digits) so that the information can be stored in a numeric format.
What is sampling?
taking samples at given intervals (therefore, the energy between the sampling points is discarded); the sampling rate is measured in Hz or kHz, e.g., a sampling rate of 10,000 Hz (10 kHz) means that the analog signal is sampled 10,000 times per second
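To make the 10 kHz example concrete, here is a minimal Python sketch that takes equally spaced samples of a sine wave. The function name `sample_sine` and the 1 kHz test tone are illustrative assumptions, not part of the course material.

```python
import math


def sample_sine(freq_hz, sampling_rate_hz, duration_s, amplitude=1.0):
    """Sample a sine wave at equally spaced instants (every 1 / sampling_rate_hz s)."""
    n_samples = int(round(duration_s * sampling_rate_hz))
    interval_s = 1.0 / sampling_rate_hz  # time between consecutive samples
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n * interval_s)
        for n in range(n_samples)
    ]


# A 10 kHz sampling rate keeps 10,000 values per second, one every 0.1 ms;
# whatever the signal does between those instants is not stored.
samples = sample_sine(freq_hz=1000, sampling_rate_hz=10_000, duration_s=1.0)
print(len(samples))  # 10000
```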
What is quantization?
Converts the amplitude or energy level of the samples. The amplitude of the signal is made discrete. The continuous amplitude variations need to be represented.
A quantum is an increment of energy (measured in bits)
Before sampling and quantization, the signal may be passed through a _____ and a _______.
pre-emphasis filter; presampling (low-pass) filter
What is the importance of the presampling (low-pass) filter?
it rejects the energy above the highest freq of interest
What is the importance of the pre-emphasis filter?
it boosts the higher frequencies
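The cards do not say which pre-emphasis filter the analysis software uses. A common choice is a first-order difference filter, sketched below under that assumption (the 0.97 coefficient is also an assumed, typical value). It shows why the high end is boosted relative to the low end: a constant input is almost cancelled, while a rapidly alternating input grows.

```python
def pre_emphasis(samples, coefficient=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - coefficient * x[n-1].

    Slowly changing (low-frequency) parts are largely cancelled, while rapidly
    changing (high-frequency) parts are kept or enlarged. The 0.97 default is
    an assumed, commonly used value.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor
    for n in range(1, len(samples)):
        out.append(samples[n] - coefficient * samples[n - 1])
    return out


low = pre_emphasis([1.0, 1.0, 1.0, 1.0])     # 0 Hz (constant) input
high = pre_emphasis([1.0, -1.0, 1.0, -1.0])  # highest representable frequency
print(low)   # later samples shrink to about 0.03
print(high)  # later samples grow to about +/-1.97
```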
What is an acceptable sampling rate? Who came up with it?
you need to sample at least twice the highest frequency of interest (Nyquist's sampling theorem, 1928); the human ear hears up to about 20 kHz, so CD-quality audio is sampled at 44.1 kHz
According to Nyquist's theorem, what is the number of samples needed?
at least double the highest frequency of interest; since the ear can hear up to about 20 kHz, music needs a sampling rate of at least 40 kHz (CDs use 44.1 kHz)
Why do we need to have a sampling rate of at least twice the highest frequency of interest?
to avoid aliasing
What is an alias?
an assumed identity; a fake; skipping some data points creates an alternate perception; if we don't have enough sample points, the sound will alias to something different
How can you avoid aliasing? 3
1 determine the highest frequency of interest (22kHz)
2 filter out the energy above the highest frequency of interest (presampling (low-pass) filter)
3. sample the signal at a rate that is at least 2x as high as the highest freq of interest
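The three steps above can be checked numerically. This minimal sketch (function names are illustrative) computes the Nyquist frequency and the frequency a pure tone "aliases to" when it lies above half the sampling rate; at 10 kHz, a 7 kHz cosine yields exactly the same samples as a 3 kHz cosine.

```python
import math


def nyquist_frequency(sampling_rate_hz):
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sampling_rate_hz / 2


def alias_frequency(signal_hz, sampling_rate_hz):
    """Frequency at which a pure tone appears after sampling.

    Tones above the Nyquist frequency 'fold back' into the range 0..fs/2.
    """
    folded = signal_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)


fs = 10_000
# Sampled at 10 kHz, a 7 kHz cosine produces the same samples as a 3 kHz one.
tone_7k = [math.cos(2 * math.pi * 7000 * n / fs) for n in range(20)]
tone_3k = [math.cos(2 * math.pi * 3000 * n / fs) for n in range(20)]
print(alias_frequency(7000, fs))  # 3000
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(tone_7k, tone_3k)))  # True
```

This is why the presampling low-pass filter matters: once the 7 kHz energy has been sampled, nothing in the samples distinguishes it from a real 3 kHz component.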
Who was an important theorist who gave us the minimum sampling rate in 1928?
What is quantization?
conversion of the amplitude or energy level of the samples; the amplitude of the signal is made discrete, so the continuous amplitude variations need to be represented by a limited set of levels
A ____ is an increment of energy.
A 1-bit system has ___ levels.
A 3-bit system has ___ levels.
What is another name for quantization rate?
What is quantization rate measured in?
bits (binary digits, 0 and 1)
Thus a 16-bit system (used for audio applications) has ___ levels.
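The number of levels follows from the bit depth: an n-bit system has 2^n levels. The sketch below computes that and, as an assumed illustration, rounds a sample to the nearest level with a simple uniform quantizer spanning -1 to +1 (the quantizer design is a simplification, not taken from the course).

```python
def quantization_levels(bits):
    """An n-bit system can represent 2 ** n distinct amplitude levels."""
    return 2 ** bits


def quantize(sample, bits, full_scale=1.0):
    """Round a sample to the nearest of 2 ** bits evenly spaced levels.

    Assumes a simple uniform quantizer whose levels run from -full_scale to
    +full_scale; values outside that range are clipped first.
    """
    levels = quantization_levels(bits)
    step = 2 * full_scale / (levels - 1)  # spacing between adjacent levels
    clipped = max(-full_scale, min(full_scale, sample))
    index = round((clipped + full_scale) / step)  # index of the nearest level
    return -full_scale + index * step


for bits in (1, 3, 16):
    print(bits, quantization_levels(bits), quantize(0.3, bits))
```

With 1 bit, 0.3 can only become -1 or +1; with 16 bits, the rounding error is tiny, which is why more bits give a more faithful amplitude representation.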
What graphs are commonly used in speech analyses? 3
1. time waveform
What does the time waveform measure?
presents the amplitude of the signal as a function of time
What does frequency measure in time waveform?
the number of times an object (such as air molecules) vibrates through a complete cycle per second (measured in hertz, Hz)
what is the period?
the length of one cycle
How do we measure frequency?
1 / period (time in seconds)
the shorter the period, the ___ the frequency.
you can measure frequency by measuring the distance between two primary ___ and applying the formula: frequency = 1/time in seconds.
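The formula frequency = 1/period can be applied to a sampled waveform by measuring the time between successive peaks. The peak picker below is a deliberately naive assumption for illustration; real pitch trackers are far more robust to noise and irregular waveforms.

```python
import math


def frequency_from_period(period_s):
    """Frequency (Hz) = 1 / period (s)."""
    return 1.0 / period_s


def estimate_f0_from_peaks(samples, sampling_rate_hz):
    """Estimate frequency from the average spacing between successive positive peaks."""
    peaks = [
        n for n in range(1, len(samples) - 1)
        if samples[n] > 0 and samples[n] > samples[n - 1] and samples[n] >= samples[n + 1]
    ]
    if len(peaks) < 2:
        raise ValueError("need at least two peaks")
    mean_gap = (peaks[-1] - peaks[0]) / (len(peaks) - 1)  # samples per cycle
    return frequency_from_period(mean_gap / sampling_rate_hz)


# A 200 Hz sine sampled at 8 kHz repeats every 40 samples, i.e. every 0.005 s.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(frequency_from_period(0.005))
print(estimate_f0_from_peaks(tone, fs))
```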
What is meant by time wave form?
amplitude as a function of time
What is spectrum? What is it good for?
the amplitude of the signal as a function of frequency (i.e. the amplitude on the Y axis and the frequency on the X axis); creates a display of the frequency composition of a signal at a point in time
Does spectrum have a temporal component?
What is the spectrum useful for?
looking at the magnitude of various frequency components at a specific time
What are two kinds of spectra?
1. FFT (Fast Fourier Transform); 2. LPC (Linear Predictive Coding)
How does a FFT work?
decomposes a signal into its frequency components; an algorithm that greatly speeds up the computations required for the Discrete Fourier Transform (it gives the same result as the DFT, just faster); useful for looking at all (or most) frequency components; increasing the number of FFT points allows a more detailed display
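As a sketch of what an FFT does, here is a textbook recursive radix-2 Cooley-Tukey FFT in plain Python (it assumes the number of points is a power of two; analysis programs use optimized library versions). A two-tone test signal comes out as two peaks, and the bin width sampling_rate / N shows why more FFT points give a more detailed display.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    Produces the same values as a direct DFT, in O(N log N) instead of O(N^2).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(samples, sampling_rate_hz):
    """(frequency in Hz, magnitude) for bins 0 .. N/2."""
    n = len(samples)
    spectrum = fft(samples)
    bin_width = sampling_rate_hz / n  # more FFT points -> narrower bins
    return [(k * bin_width, abs(spectrum[k])) for k in range(n // 2 + 1)]


fs = 8000
n_points = 64  # bin width = 8000 / 64 = 125 Hz
signal = [
    math.sin(2 * math.pi * 1000 * t / fs) + 0.5 * math.sin(2 * math.pi * 2500 * t / fs)
    for t in range(n_points)
]
strongest = sorted(magnitude_spectrum(signal, fs), key=lambda b: b[1], reverse=True)[:2]
print([freq for freq, _ in strongest])  # [1000.0, 2500.0]
```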
What is the FFT display?
amplitude is on the vertical axis and frequency is on the horizontal axis; the line represents frequency components and their amplitudes
How are frequency components noted on a FFT?
What is a LPC?
linear predictive coding: a method that attempts to predict upcoming speech samples based on a weighted sum of previous samples; uses estimation based on a vocal tract model filter (not as precise as FFT); not all components, just peaks
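The "weighted sum of previous samples" idea can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. This is an assumed, simplified version (no windowing or pre-emphasis, and no conversion of the coefficients into the spectral peaks that an LPC display draws); the cards do not specify how the analysis software computes LPC.

```python
def autocorrelation(samples, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0 .. max_lag."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_p from autocorrelation values.

    Returns (coefficients, residual_error), where the prediction is
    x_hat[n] = a_1 * x[n-1] + a_2 * x[n-2] + ... + a_p * x[n-p].
    """
    a = [0.0] * (order + 1)  # a[k] weights x[n-k]; a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1 - k * k
    return a[1:], error


def lpc(samples, order):
    """Autocorrelation-method LPC: predictor coefficients for one frame."""
    return levinson_durbin(autocorrelation(samples, order), order)


def predict_next(history, coeffs):
    """Weighted sum of the most recent samples (history[-1] is x[n-1])."""
    return sum(a * history[-k] for k, a in enumerate(coeffs, start=1))


# Each sample of this decaying signal is 0.9 times the previous one,
# so an order-2 model should find a_1 close to 0.9 and a_2 close to 0.
signal = [0.9 ** n for n in range(200)]
coeffs, _ = lpc(signal, order=2)
print([round(c, 3) for c in coeffs])
```

The `order` parameter corresponds to the LPC order: a higher order lets the model fit more spectral peaks.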
What is found in the LPC display? What is it useful to see?
amplitude is on the vertical axis and frequency is on the horizontal axis; line represents estimated spectral peaks and their amplitudes; helpful for seeing vowel formants; useful for looking at spectral peaks, but not detailed freq components (must be cautious when attempting to interpret LPC display)
By increasing the LPC order and FFT points ____ the display. How?
it changes the display; the analysis will look for more peaks
What is a spectrogram?
literally a series of spectra over time with a time-frequency-intensity display; related to the spectrum; sounds are analyzed in a 3D pattern of time (horizontal), frequency (vertical), and amplitude (coded by different colors or shades of gray); shows spectral peaks; the input spectrum is averaged by a filter, and the formant (= resonance of the vocal tract) frequencies appear as darkened bars on the spectrogram; has a time domain and allows one to view changes over time
What are the two types of spectrograms?
narrow-band and wide-band spectrograms
How do you get from a spectrum to a spectrogram?
1. "sample" waveform every several milliseconds
2. plot series of spectra over time with shades of gray or color (amplitude)
3. turn it sideways, and that is one slice
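The three steps above can be sketched as code: cut the waveform into short overlapping frames, compute a magnitude spectrum for each frame, and stack the spectra over time. The Hann window, the 5 ms window and 2 ms hop defaults, and the direct (slow) DFT are all assumptions chosen to keep the example short; drawing the values as shades of gray is left out.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each frequency bin 0 .. N/2 of one frame (direct DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, sampling_rate_hz, window_ms=5.0, hop_ms=2.0):
    """List of spectra, one per frame ("slice"), taken every hop_ms milliseconds."""
    win = int(sampling_rate_hz * window_ms / 1000)  # samples per analysis window
    hop = int(sampling_rate_hz * hop_ms / 1000)     # samples between frame starts
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / (win - 1)) for t in range(win)]
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + win], hann)]
        frames.append(dft_magnitudes(frame))
    return frames


fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(800)]  # 0.1 s
frames = spectrogram(tone, fs)
print(len(frames), "slices of", len(frames[0]), "frequency bins each")
```

Shorter windows in this sketch give the smeared-in-frequency, sharp-in-time look of a wide-band spectrogram; longer windows give the narrow-band look.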
What are narrow-band spectrograms?
have detailed frequency resolution (i.e., show frequencies more precisely than a wide-band spectrogram); good for looking at pitch changes and harmonic structure, but not as good for looking at resonances
How do we get the narrow-band spectrograms?
the analysis bandwidth (window) has to be narrower than the distance in frequency between the harmonics of the voicing source; generally, one would pick an analysis bandwidth that is less than the speaker's F0 (i.e., if the speaker's F0 is 100 Hz, the bandwidth should be less than that, e.g., 50 Hz); the bandwidth has to be less than the F0
What are wide-band spectrograms?
span a wider range of frequencies than a narrow-band spectrogram (an analysis bandwidth greater than the F0); vertical striations indicate glottal pulses; good for looking at resonances; not good for looking at pitch changes or harmonic structure
What should the analysis bandwidth be for a wide-band spectrogram?
it has to be larger than the distance in frequency between the harmonics of the voicing source; generally, one would pick an analysis bandwidth that is larger than the speaker's F0 (i.e., if the speaker's F0 is 100 Hz, the bandwidth should be larger than that, at least 150 Hz, but 200 Hz is preferable)
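The narrow-band versus wide-band rule compares the analysis bandwidth with the harmonic spacing, which equals F0. The sketch below encodes that comparison. The window-length helper uses a rough rule of thumb that is an assumption here (window duration roughly 1 / bandwidth); the exact constant depends on the window shape and the software.

```python
def spectrogram_type(analysis_bandwidth_hz, f0_hz):
    """Classify an analysis bandwidth relative to the harmonic spacing (= F0)."""
    if analysis_bandwidth_hz < f0_hz:
        return "narrow-band"  # filter fits between harmonics: harmonics resolved
    if analysis_bandwidth_hz > f0_hz:
        return "wide-band"    # filter spans more than one harmonic: formants show
    return "borderline"


def approx_window_s(analysis_bandwidth_hz):
    """Rough rule of thumb (an assumption): window length ~ 1 / bandwidth."""
    return 1.0 / analysis_bandwidth_hz


f0 = 100  # Hz, as in the cards above
for bw in (50, 200):
    print(bw, spectrogram_type(bw, f0), round(approx_window_s(bw) * 1000, 1), "ms")
```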
What do formants represent?
resonances of the vocal tract
What do vertical striations indicate on a wide band spectrogram?
What is pitch contour?
easiest display to get; hover in one of the displays and it will show the pitch contour; don't click inside, as that changes the values
______ is perceived as pitch of voice.
the fundamental frequency
The ______ is called the fundamental frequency.
the lowest frequency component
Each integer multiple of F0 is a ____
Each doubling of F0 is called ____
The amplitude of the harmonic spectrum _____ as it increases in frequency.
The amplitude of the harmonic spectrum decreases as frequency increases at a rate of about ____
Since harmonics are integer multiples of F0, the higher the F0, the ______ the harmonics will be.
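Because every harmonic is an integer multiple of F0, the spacing between neighboring harmonics always equals F0. This small sketch lists the harmonics below a cutoff for two F0 values, showing that a higher F0 puts fewer, more widely spaced harmonics in the same frequency range.

```python
def harmonics(f0_hz, max_freq_hz):
    """Integer multiples of F0 (F0, 2*F0, 3*F0, ...) up to max_freq_hz."""
    count = int(max_freq_hz // f0_hz)
    return [n * f0_hz for n in range(1, count + 1)]


def harmonic_spacing(f0_hz, max_freq_hz):
    """Distances between neighboring harmonics; each one equals F0."""
    h = harmonics(f0_hz, max_freq_hz)
    return [b - a for a, b in zip(h, h[1:])]


print(harmonics(100, 1000))  # 10 harmonics, 100 Hz apart
print(harmonics(250, 1000))  # only 4 harmonics, 250 Hz apart
```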
The ____ modifies the harmonic series.
_____, or between speakers, and _____, or within one speaker, contribute to F0 variations.
interspeaker and intraspeaker
A ____ provides lines for each formant and will give you readings on them if you place the cursor over them.
An LPC can be cut to look at vowels or fricatives, by cutting it to ____ for vowels and ___ for fricatives.
0-4,000 Hz and 0-10,000 or 12,000 Hz
____ is caused by vocal fold vibration (voicing)
Different rates of vocal fold vibration results in ____
The average female F0 is about ____
The average male F0 is about ____
The average 5 y.o. female F0 is about ___ and the average 5 y.o. male F0 is about ___
252 Hz; 247 Hz
An infant's non-distress cry is ___, startle cry is ___, pain cry is ___, and hunger cry is ___.
317-342 Hz; 442 Hz; 442 Hz; 442 Hz
Alaryngeal F0 for males is ___ and for females is ___
65 Hz; 87 Hz
Alaryngeal voicing is voicing ___
not made at the larynx (often esophagus)
_____ is a vibratory response to an applied force.
_____ do not initiate sound energy, they are set into forced vibration.
A body of air (such as the vocal tract) may resonate in response to sound that has frequencies matching the _______ of the volume of air.
natural resonant frequencies
Resonance depends on (4 things)
1 open or closed ends of the tube
2 length of the tube
3 shape of the tube
4 size of the openings of the tube
A ____ is also a column of air that can be set into vibration (hint: we have one of these).
____ is a peak of resonance in the vocal tract.
What does Fn = (2n-1)c/4l mean?
Fn is the nth resonance (formant) frequency, n = an integer (1, 2, 3, ...), c = 34400 cm/s (the speed of sound), and l = the length of the tube in cm
What is the formula for resonance?
Fn = (2n-1)c/4l
_____ creates various peaks of resonance in
shaping of the vocal tract
A typical male vocal tract is _____ long. A typical female vocal tract is _____ long.
17.5 cm; 14.5 cm
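Plugging the typical vocal tract lengths into the quarter-wavelength formula Fn = (2n-1)c/4l gives the first few resonances of a uniform tube closed at one end (glottis) and open at the other (lips). This sketch uses the notes' value c = 34400 cm/s; real vocal tracts are not uniform tubes, so these are idealized neutral-vowel estimates.

```python
SPEED_OF_SOUND_CM_S = 34_400  # c as given in these notes (cm/s)


def quarter_wave_formants(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Fn = (2n - 1) * c / (4 * L) for a uniform tube closed at one end.

    The vocal tract is modeled as closed at the glottis and open at the lips.
    """
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]


for label, length in (("male", 17.5), ("female", 14.5)):
    print(label, [round(f) for f in quarter_wave_formants(length)])
# male [491, 1474, 2457]
# female [593, 1779, 2966]
```

The shorter tube gives higher resonances, which is why average female formants are higher than male ones.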
c in the formula for resonance is _____.
the speed of sound at sea level
What is a formant?
a peak of resonance in the vocal tract
Fn = (2n-1)c/4l is called
the quarter wavelength formula
What are the components of the source?
could be the glottal source for vowels (typically voicing, but noise (hiss) excitation is also possible, such as in whisper)
_____ diminish as they increase in frequency.
Only voiced sounds have harmonics. True or False
The vocal tract acts as a ________ ______.
Altering cavity sizes (changing the constrictions in the vocal tract) results in ___ ___ ____, producing a different vowel.
different resonant frequencies
___ ___ is filtered according to the frequency response of the vocal tract filter.
If the glottal source is voiced, you will have a ____ wave and the ____ of the glottal source at or near the spectral peaks of the transfer function of the vocal tract are resonated, while those distant from the spectral peaks lose energy and become attenuated.
The resonator works if the source is _____ (such as whisper) or if F0 changes.
The radiation of sound from the mouth changes the energy at a rate of _____. (The output gets louder as frequency increases.)
What is c?
the speed of sound at sea level: 34,400 cm/s
Source could be ___ or ___ (regarding voicing)
periodic; aperiodic (noise, voiceless) OR BOTH (affricates)
The filter (vocal tract) ____
shapes the sound
You can have different F0s with the same filter, and different filters with the same F0 (the source and the filter are independent).
What graph do you look at for resonances?
The ___ works the same way if the source is periodic or aperiodic.
Same speaker with different F0, will have ___ (same/different) response characteristic (of the resonator).
Same speaker with different F0, will have ____ (same/different) harmonic spacing.
The glottal pulses when the F0 is higher, will be ___ (closer together, further apart).
The glottal pulses when the F0 is lower, will be ___ (closer together, further apart).
Use a wide-band spectrogram to look at ___
Use a narrow-band spectrogram to look at ____
What is the formula for calculating frequency?
1 / period (time in seconds)
___ are speech sounds produced with a relatively open vocal tract; they are perceptually salient and tend to be syllable nuclei.
____ are single unchanging vowels.
____ are changing articulation/vowels.
In normal American English, vowels are typically ___ (voiced/voiceless).
What are the 4 types of vowel features?
1 tongue height
2 tongue backness
3 lip rounding
How does tenseness work w/ vowels?
tense vowels are associated w/ more extreme tongue position than lax vowels, length (tense vowels are longer), can be in open or closed syllables
____ are concentrations of energy in the spectrum that correspond to the vocal tract resonance frequencies.
___ are basically resonances of the vocal tract.
Usually, ____ are sufficient for vowel recognition.
F1 & F2
The lowest frequency concentration is ___
Typically higher formants have ___ bandwidths.
___ indicate articulatory
Formant transitions indicate articulatory
____ is inversely related to tongue height (in general, the higher the vowel, the lower the __)
___ is responsive to changes in mouth opening.
___ is related to tongue advancement (back to front, ___ goes up as tongue moves forward.)
____ is responsive to changes in size of the oral cavity; backing or lip rounding lower ___.
____ are generally lowered by lip rounding.
F1, F2, F3, and F4 are generally lowered by lip rounding.
____ and ____ of the cavity greatly determine what the vowel will be like.
Shape and size
What are the F1 and F2 for males for i?
270; 2290 Hz
What are the F1 and F2 for females for i?
310; 2790 Hz
What are the F1 and F2 for males for a?
730; 1090 Hz
What are the F1 and F2 for females for a?
850; 1220 Hz
What are the F1 and F2 for males for u?
300; 870 Hz
What are the F1 and F2 for females for u?
370; 950 Hz
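The "absolute formant frequencies determine vowel identity" idea can be illustrated by classifying an (F1, F2) pair as the nearest of the average values on the cards above. This is a hypothetical sketch restricted to the three corner vowels listed here; it also shows the theory's weakness, since it needs a separate table per speaker group.

```python
import math

# Average F1, F2 (Hz) from the cards above.
VOWEL_FORMANTS = {
    "male": {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)},
    "female": {"i": (310, 2790), "a": (850, 1220), "u": (370, 950)},
}


def nearest_vowel(f1_hz, f2_hz, speaker="male"):
    """Pick the vowel whose average (F1, F2) is closest in straight-line distance."""
    table = VOWEL_FORMANTS[speaker]
    return min(table, key=lambda vowel: math.dist((f1_hz, f2_hz), table[vowel]))


print(nearest_vowel(280, 2250))                    # i
print(nearest_vowel(840, 1200, speaker="female"))  # a
```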
The shape of the vocal tract will have implications as to ___
what the sound will be like
Computer reads the ___ of each formant, but the formant is the ___.
the center frequency; whole band
One of the ways to get formants is to do a ____
formant plot (don't touch inside the graph!!)
When you look at pitch change, what graph do you view?
pitch contour (only shows F0)
What are the potential cues for vowel identification? 2
1 static properties v. dynamic properties,
2 intrinsic properties v extrinsic properties
What are static properties of vowels?
such as steady-state formant frequencies and the fundamental frequency, as well as phonetic environment (e.g., speaking rate)
What are dynamic properties of vowels?
including inherent spectral change and consonantal context effects; relative vowel amplitude
What are intrinsic properties of vowels?
(intra-segmental) relational properties, especially relations among the fundamental and formant frequencies
What are extrinsic properties of vowels?
(transsegmental) relational properties, such as relative vowel duration and the relative formant frequencies of a vowel compared to those of other vowels of the same speaker
Typically (but not always), tense vowels are ___ than lax vowels in English.
What is the one exception to the rule that tense vowels are longer than lax vowels?
ae; relatively long and lax
We don't differentiate vowels in English by ___ alone.
Is duration phonemically relevant for differentiating vowels?
no; it is relevant but non-phonemic
Generally, low vowels are (more/less) intense than high vowels.
Vowel intensity is a (primary/secondary) cue in English.
secondary, but it does matter
Typically lower vowels have a (lower/higher) F0 than mid and high vowels.
___, ___, & ___ are secondary cues in English vowels.
duration, intensity and F0/pitch
What are three early intrinsic factor theories?
1 absolute formant frequencies determine the vowel identity
2 ranges of formant frequencies determine vowel identity
3 ratios of formant frequencies determine vowel identity
What are three early extrinsic factor theories?
1 vowel identity is determined by "normalizing" by means of point (corner) vowels.
2 listeners "estimate" (infer) the speakers' vocal tracts and use that as normalizing info to perceive vowels.
3 vowel identity is aided by formant transitions of consonants.
What are issues with differentiating vowels by formants alone? 3
1 variability involving formant values and even formant ranges exists on many levels (individual, speaking rate)
2 there are some overlapping areas in the formant ranges
3 ignores other factors that can be relevant cues (such as formant transitions, etc.)
___ & __ came up with an important chart in 1952 demonstrating that vowels occur in specific places.
Peterson & Barney
What are the issues with differentiating vowels by formant ratios? 3
2 male, female, and child vocal tracts are not scale models of each other
3 ignores other factors that can be important relevant cues (such as formant transitions, etc.)
One theory uses ______ which means that there are intrinsic factors adding an element of speaker normalization.
Formant Ratios w/ F0 (the speaker normalization is based on F0, where SR is the "sensory reference")
The formant ratios theory involves what 3 formulas?
there are 3 dimensions: x axis = log(SF3/SF2); y axis = log(SF1/SR); z axis = log(SF2/SF1)
What does SR mean in the formant ratios theory?
"sensory reference": SR = 168 * (GMF0/168)^(1/3), where GMF0 is the grand mean F0
What are problems with Formant Ratios w/ F0?
ignores other factors that can be important relevant cues (such as formant transitions, etc.)
What does the graph of Formant Ratios w/ F0 look like?
it is a tetrahedron (Miller's Tetrahedron; vowels vary along the three dimensions)
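The three dimensions of Miller's space can be computed directly from the formulas on the formant-ratios cards. Two assumptions are made in this sketch and are not stated on the cards: plain formant frequencies stand in for the "sensory formants" SF1-SF3, and "log" is taken as log base 10. The input values in the usage line are hypothetical.

```python
import math


def sensory_reference(grand_mean_f0_hz):
    """SR = 168 * (GMF0 / 168) ** (1/3), the 'sensory reference' from the cards."""
    return 168 * (grand_mean_f0_hz / 168) ** (1 / 3)


def miller_coordinates(sf1, sf2, sf3, grand_mean_f0_hz):
    """Return (x, y, z) = (log(SF3/SF2), log(SF1/SR), log(SF2/SF1)).

    Assumptions: formant frequencies (Hz) are used as SF1-SF3, and log is log10.
    """
    sr = sensory_reference(grand_mean_f0_hz)
    x = math.log10(sf3 / sf2)
    y = math.log10(sf1 / sr)
    z = math.log10(sf2 / sf1)
    return x, y, z


# Hypothetical formant values, for illustration only.
print(miller_coordinates(sf1=300, sf2=2300, sf3=3000, grand_mean_f0_hz=130))
```

Only the y axis depends on F0 (through SR); the x and z axes are pure formant ratios, which is how the model adds an element of speaker normalization.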
What do extrinsic theories reflect?
listeners "estimate" (infer) the speakers' vocal tracts and use that as normalizing info to perceive vowels
What is an example of extrinsic factors at work in a speech science theory?
Vocal Tract Normalization
___ used an experiment in which a precursor phrase preceded the stimuli; the stimuli were "bit" or "bet"; in one condition the original samples were played, but in the second condition, F1 of the precursor phrase was shifted down (to simulate a larger vocal tract) while the stimuli were unmodified; participants were asked to identify the stimulus as "bit", "bet" or "bat." They found some effects.
Vocal Tract Normalization
Vocal tract normalization is a ____ theory.
Formant ratios w/ F0 is a ___theory.
Point vowel normalization is a ___ theory.
In ___, theorists said that listeners "calibrate" the vowel space of the speaker by listening to the "point" (corner) vowels, like /i a u/; other vowels can be located by referring to the point vowels. It is a more specific version of vocal tract normalization: you form your reference based on the vowel space.
point vowel normalization
___ is a more general version of point vowel normalization.
vocal tract normalization
What are the 4 intrinsic vowel theories?
1 Absolute Formant frequencies determine vowel ID
2 Ranges of Formant frequencies determine vowel ID (Peterson and Barney)
3 Ratios of Formant frequencies determine vowel ID (Peterson and Barney)
4 Formant Ratios adding an element of Speaker Normalization (Miller)
______ is when you have a group of speakers whose samples are clear. The better the conditions, the fewer the errors.
What are the 3 extrinsic vowel theories?
1 Vocal Tract Normalization (Liberman and Gerstman)
2 Point Vowel Normalization (Ladefoged & Broadbent)
3 Dynamic Specification Model (Strange)
Which cues are important in a signal? 4
1 static properties
2 dynamic properties
3 intrinsic (intrasegmental)
4 extrinsic (transsegmental)
What is the conclusion of Nearey's study?
"the main conclusion of this work is that although the relative importance of some of these effects is situation dependent, none of these factors can be safely ignored in a full account of English vowel perception. It seems fruitless for us to concentrate on only one set of effects and assume that the others are lab curiosities."
What are diphthongs?
gradual transitions from one vowel-like articulation to another; highly variable
The harmonic spectrum is also known as ___
the spectrum of the glottal buzz or laryngeal output
Resonances of the vocal tract are called ____
What do you see in a wide-band spectrogram?
What do you see in a narrow-band spectrogram?
How do F0 changes affect the narrow-band spectrogram?
the greater the F0 the larger the distance will be between the harmonics