Speech Science Exam 2 Flashcards

(179 cards)

1
Q

When we work with sound files what are we working with?

A

digital audio

2
Q

What is the difference between digital and analog audio signals?

A

digital signals are discrete (they have gaps between sample points), whereas analog signals are continuous

3
Q

What are examples of analog signals? 3

A

speech sounds, musical tones, displacement of middle ear bones

4
Q

What is an analog signal?

A

a signal that is continuous in time and amplitude; analog signals exist at every moment, and their amplitude can take on any value

5
Q

What is a digital signal?

A

discrete-time signal (it exists only at discrete points, not continuously); digital "samples"; digital signals exist only at given moments, and there is nothing in between

6
Q

Are digital sound files superior to analog tapes? 4 reasons why yes

A
1 quality (copies are identical to the original)
2 digital signals are more resistant to noise and degradation when transmitted than analog signals
3 storage sensitive
4 flexible and efficient
7
Q

Why does digitizing work?

A

Even with a discrete number of data points, the signal can be represented well (to the point
where the digital signal sounds like the original analog signal). For example, a CD sounds as
good as an analog master tape.

8
Q

What is digitizing?

A

Get a representative sample. Since the analog signal is continuous and its digital representation is not, the data points need to be selected carefully so that the quality of the sample is not significantly compromised. The name “digitization” literally means converting to numbers (digits) so that the information can be stored in a numeric format.

9
Q

What is sampling?

A

taking samples at given intervals (therefore, the energy between the sampling points is discarded); measured in Hz or kHz; e.g. a sampling rate of 10,000 Hz (10 kHz) means that the analog signal is sampled 10,000 times per second
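A minimal Python sketch of sampling, assuming a pure sine wave stands in for the analog signal (`sample_sine` is an illustrative helper, not a library function). Only the values at the sampling instants are kept:

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return the values of a sine wave at t = n / sample_rate_hz.

    Whatever the continuous signal does between these instants is discarded.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# A 10 kHz sampling rate keeps 10,000 values per second of signal.
samples = sample_sine(freq_hz=100, sample_rate_hz=10_000, duration_s=1.0)
print(len(samples))  # 10000
```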

10
Q

What is quantization?

A

Converts the amplitude (energy level) of each sample to a discrete value: the amplitude of the signal is made discrete, so the continuous amplitude variations are represented by a finite set of levels.
A quantum is an increment of energy (the resolution is measured in bits)
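A minimal sketch of uniform quantization, assuming samples scaled to the range -1.0 to 1.0 (real converters differ in detail; `quantize` and `quantization_levels` are hypothetical helpers). Each sample is rounded to the nearest of 2^n levels, so more bits give finer steps:

```python
def quantization_levels(bits):
    """An n-bit system can store 2 ** n distinct amplitude values."""
    return 2 ** bits

def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2 ** bits evenly spaced levels."""
    levels = quantization_levels(bits)
    step = 2.0 / (levels - 1)               # spacing between adjacent levels
    clipped = max(-1.0, min(1.0, sample))   # values outside the range are clipped
    index = round((clipped + 1.0) / step)   # which level, 0 .. levels - 1
    return -1.0 + index * step

print(quantization_levels(1), quantization_levels(3), quantization_levels(16))  # 2 8 65536
print(quantize(0.3, 1))   # 1.0 (only two levels: -1.0 and 1.0)
print(quantize(0.3, 3))   # about 0.43 (8 levels, coarse steps)
print(quantize(0.3, 16))  # very close to 0.3 (65,536 levels)
```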

11
Q

Before sampling and quantization, the signal may be passed through a _____ and a _______.

A

pre-emphasis filter; presampling (low-pass) filter

12
Q

What is the importance of presampling (low pass) filter?

A

it rejects the energy above the highest freq of interest

13
Q

What is the importance of the pre-emphasis filter?

A

basically boosts higher frequencies

14
Q

What is an acceptable sampling rate? Who came up with it?

A

you need to sample at least twice the highest frequency of interest (Nyquist's sampling theorem, 1928); the human ear hears up to 20 kHz, and so we use 44.1 kHz
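The Nyquist rule is simple arithmetic; a small sketch with illustrative helper names:

```python
def minimum_sampling_rate(highest_freq_hz):
    """Nyquist: sample at least twice the highest frequency of interest."""
    return 2 * highest_freq_hz

def is_adequate(sample_rate_hz, highest_freq_hz):
    """True if the rate meets the Nyquist minimum for that frequency."""
    return sample_rate_hz >= minimum_sampling_rate(highest_freq_hz)

print(minimum_sampling_rate(20_000))  # 40000
print(is_adequate(44_100, 20_000))    # True
print(is_adequate(10_000, 8_000))     # False
```

A 44.1 kHz rate clears the 40 kHz minimum for 20 kHz hearing with some headroom.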

15
Q

According to Nyquist’s theorem, what is the number of samples needed?

A

at least double the highest frequency of interest; since the ear can hear up to 20 kHz, for music we need 44.1 kHz

16
Q

Why do we need to have a sampling rate of at least twice the highest frequency of interest?

A

to avoid aliasing

17
Q

What is an alias?

A

an assumed identity; a fake; skipping some data points creates an alternate perception; if we don't have enough sample points, the sound will alias to something different
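A sketch of how aliasing folds a too-high frequency down, assuming an ideal sampler with no presampling filter (`alias_frequency` is a hypothetical helper). A 30 kHz tone sampled at 44.1 kHz produces exactly the same samples as a 14.1 kHz tone:

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Apparent frequency of a sampled sinusoid, folded into 0 .. fs/2."""
    nearest_multiple = round(freq_hz / sample_rate_hz) * sample_rate_hz
    return abs(freq_hz - nearest_multiple)

fs = 44_100
print(alias_frequency(15_000, fs))  # 15000: below fs/2, unchanged
print(alias_frequency(30_000, fs))  # 14100: above fs/2, aliases lower

# The sampled values of a 30 kHz cosine and a 14.1 kHz cosine are identical.
same = all(
    math.isclose(math.cos(2 * math.pi * 30_000 * n / fs),
                 math.cos(2 * math.pi * 14_100 * n / fs), abs_tol=1e-9)
    for n in range(100)
)
print(same)  # True
```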

18
Q

How can you avoid aliasing? 3

A

1 remember the highest freq of interest (22kHz)
2 filter the energy above the highest frequency of interest (presampling (low-pass) filter)
3 sample the signal at a rate that is at least 2x as high as the highest freq of interest

19
Q

Who was an important theorist, who gave us the minimum sampling rate in 1928?

A

Nyquist

20
Q

What is quantization?

A

conversion of the amplitude or energy level of the samples; the amplitude of the signal is made discrete, and the continuous amplitude variations need to be represented

21
Q

A ____ is an increment of energy.

A

quantum

22
Q

A 1-bit system has ___ levels.

A

2

23
Q

A 3-bit system has ___ levels.

A

8 (2^n)

24
Q

What is another name for quantization rate?

A

resolution

25
What is quantization rate measured in?
bits (binary digits, 0 and 1)
26
Thus a 16-bit system (used for audio applications) has ___ levels.
65,536
27
What graphs are commonly used in speech analyses? 3
1 time waveform | 2 spectrum (spectra) | 3 spectrogram
28
What does the time waveform measure?
presents the amplitude of the signal as a function of time
29
What does frequency measure in the time waveform?
the number of times an object (such as air molecules) vibrates through a complete cycle per second (measured in Hertz, Hz)
30
what is the period?
the length of one cycle
31
How do we measure frequency?
1/period time in seconds
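A short sketch of frequency = 1 / period, plus the spike-to-spike method from the next cards, assuming the spike times (in seconds) have already been read off the waveform; the helper names are illustrative:

```python
def frequency_from_period(period_s):
    """f = 1 / T, with the period T in seconds and f in Hz."""
    return 1.0 / period_s

def frequency_from_spikes(spike_times_s):
    """Estimate frequency from the times (s) of successive primary spikes.

    Averages the spike-to-spike intervals, assuming the waveform is roughly
    periodic over the stretch measured.
    """
    if len(spike_times_s) < 2:
        raise ValueError("need at least two spikes")
    mean_period = (spike_times_s[-1] - spike_times_s[0]) / (len(spike_times_s) - 1)
    return frequency_from_period(mean_period)

print(frequency_from_period(0.01))                         # 100.0
print(frequency_from_spikes([0.0, 0.005, 0.010, 0.015]))   # about 200 Hz
```

The shorter the period, the larger 1 / period, which is why a shorter period means a higher frequency.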
32
the shorter the period, the ___ the frequency.
higher
33
you can measure frequency by measuring the distance between two primary ___ and applying the formula: frequency = 1/time in seconds.
"spikes"
34
What is meant by time wave form?
amplitude as a function of time
35
What is spectrum? What is it good for?
the amplitude of the signal as a function of frequency (i.e. the amplitude on the Y axis and the frequency on the X axis); creates a display of the frequency composition of a signal at a point in time
36
Does spectrum have a temporal component?
no
37
What is the spectrum useful for?
looking at the magnitude of various frequency components at a specific time
38
What are two kinds of spectra?
1. FFT (Fast Fourier Transform); 2 LPC (Linear Predictive Coding)
39
How does a FFT work?
decomposes a signal into its freq components; an algorithm that greatly speeds up the computations required for a Discrete Fourier Transform (DFT); useful for looking at all (or most) frequency components; increasing the number of FFT points allows a more accurate display
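A minimal pure-Python sketch of the FFT idea, using the recursive radix-2 Cooley-Tukey form (which assumes the number of points is a power of two). Analysis software uses optimized library routines; this only shows how frequency components appear as peaks. The two-tone test signal is an assumed example:

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

fs, n_points = 8000, 256
signal = [math.sin(2 * math.pi * 1000 * n / fs)
          + 0.5 * math.sin(2 * math.pi * 2500 * n / fs)
          for n in range(n_points)]
magnitudes = [abs(c) for c in fft(signal)[: n_points // 2]]
bin_hz = fs / n_points  # 31.25 Hz per bin; more FFT points -> finer bins
top = sorted(range(len(magnitudes)), key=magnitudes.__getitem__, reverse=True)[:2]
print(sorted(k * bin_hz for k in top))  # [1000.0, 2500.0]
```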
40
What is the FFT display?
amplitude is on the vertical axis and frequency is on the horizontal axis; the line represents freq components and their amplitudes
41
How are frequency components noted on a FFT?
peaks
42
What is a LPC?
linear predictive coding: a method that attempts to predict upcoming speech samples based on a weighted sum of previous samples; uses estimation based on a vocal tract model filter (not as precise as FFT); not all components, just peaks
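A minimal sketch of the "weighted sum of previous samples" idea, using the autocorrelation method with the Levinson-Durbin recursion (one common way to compute LPC coefficients; the function names are illustrative). The test signal is an assumed decaying exponential, which a first-order predictor with weight 0.9 fits exactly:

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum of x[n] * x[n - lag] over the available samples."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Return (a, error) where x[n] is predicted as sum(a[k-1] * x[n - k]), k = 1..order.

    Autocorrelation method solved with the Levinson-Durbin recursion.
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)   # a[0] is unused padding
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient: how much of r[i] the current predictor misses.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k  # remaining prediction error energy
    return a[1:], error

# Assumed signal x[n] = 0.9 ** n, which obeys x[n] = 0.9 * x[n - 1] exactly.
x = [0.9 ** n for n in range(200)]
coeffs, err = lpc(x, order=2)
print(coeffs)  # close to [0.9, 0.0]: one weight on the previous sample suffices
```

In LPC analysis these coefficients define an all-pole filter, and plotting that filter's frequency response gives the smooth, peaks-only LPC display; that plotting step is left out here.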
43
What is found in the LPC display? What is it useful to see?
amplitude is on the vertical axis and frequency is on the horizontal axis; line represents estimated spectral peaks and their amplitudes; helpful for seeing vowel formants; useful for looking at spectral peaks, but not detailed freq components (must be cautious when attempting to interpret LPC display)
44
By increasing the LPC order and FFT points ____ the display. How?
changes, will look for more peaks
45
What is a spectrogram?
literally a series of spectra over time with a time-freq-intensity display; related to the spectrum; sounds are analyzed in a 3D pattern of time (horizontal), freq (vertical), and amplitude (coded by different colors or shades of gray); shows spectral peaks; the input spectrum is averaged by a filter and the formant (= resonance of the vocal tract) freqs appear as darkened bars on the spectrogram; has a time domain and allows one to view changes over time
46
What are the two types of spectrograms?
narrow-band and wide-band spectrograms
47
How do you get from a spectra to a spectrogram?
1. "sample" the waveform every several milliseconds 2. plot the series of spectra over time with shades of gray or color (amplitude) 3. turn each spectrum sideways, and that is one slice
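A sketch of these steps, assuming a naive DFT and a Hamming window (both are our choices for illustration, and the two-tone test signal is made up). Each frame's spectrum becomes one vertical slice, so `slices[time][frequency_bin]` holds the amplitude a spectrogram would draw as a shade:

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 .. len(frame)//2 - 1 of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectrogram(signal, frame_len, hop):
    """Hamming-window a frame every `hop` samples; one spectrum per frame."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (frame_len - 1))
              for t in range(frame_len)]
    slices = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        slices.append(magnitude_spectrum(frame))
    return slices  # slices[time_index][frequency_bin] -> amplitude

# Assumed test signal: 50 ms at 500 Hz, then 50 ms at 1500 Hz, sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * (500 if n < 400 else 1500) * n / fs)
          for n in range(800)]
frame_len, hop = 64, 40   # 8 ms frames, a new slice every 5 ms
slices = spectrogram(signal, frame_len, hop)
bin_hz = fs / frame_len   # 125 Hz between frequency bins

def peak_hz(spectrum):
    """Frequency of the strongest bin in one slice."""
    return max(range(len(spectrum)), key=spectrum.__getitem__) * bin_hz

print(len(slices))                              # 19
print(peak_hz(slices[0]), peak_hz(slices[-1]))  # 500.0 1500.0
```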
48
What are narrow-band spectrograms?
have detailed frequency resolution (i.e. show frequencies more precisely than a wide-band spectrogram); good for looking at pitch changes and harmonic structure, but not as good for looking at resonances
49
How do we get the narrow-band spectrograms?
analysis bandwidth (window) has to be narrower than the frequency distance between the harmonics of the voicing source; generally, one would pick an analysis bandwidth that is less than the speaker's F0 (i.e. if the speaker's F0 is 100 Hz, the bandwidth should be less than that, e.g. 50 Hz); the window has to be less than the F0
50
What are wide-band spectrograms?
span a wider range of frequencies than a narrow-band spectrogram (more than the F0); vertical striations indicate glottal pulses; good for looking at resonances; not good for looking at pitch changes or harmonic structure
51
What does the analysis bandwidth for a wide-band spectrogram have to be?
it has to be larger than the frequency distance between the harmonics of the voicing source; generally one would pick an analysis bandwidth that is larger than the speaker's F0 (i.e. if the speaker's F0 is 100 Hz, the bandwidth should be larger than that, at least 150 Hz, but 200 Hz is preferable)
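A sketch of the bandwidth rules above, with hypothetical helper names. The conversion from bandwidth to window length is only a rough rule of thumb (a narrower bandwidth needs a longer analysis window; the exact factor depends on the window shape):

```python
def analysis_type(bandwidth_hz, f0_hz):
    """Classify an analysis bandwidth relative to the speaker's F0.

    Harmonics are F0 apart, so a bandwidth below F0 can resolve them
    (narrow-band); one above F0 blends them into formant bands (wide-band).
    """
    if bandwidth_hz < f0_hz:
        return "narrow-band"
    if bandwidth_hz > f0_hz:
        return "wide-band"
    return "borderline"

def approx_window_ms(bandwidth_hz):
    """Rough rule of thumb only: window length in ms ~ 1000 / bandwidth in Hz."""
    return 1000.0 / bandwidth_hz

f0 = 100  # speaker's F0 in Hz, as in the cards' example
print(analysis_type(50, f0), approx_window_ms(50))    # narrow-band 20.0
print(analysis_type(200, f0), approx_window_ms(200))  # wide-band 5.0
```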
52
What do formants represent?
resonances of the vocal tract
53
What do vertical striations indicate on a wide band spectrogram?
glottal pulses
54
What is pitch contour?
easiest display to get; hover in one of the displays and it will display the pitch contour; don't click inside, it changes the values
55
______ is perceived as pitch of voice.
the fundamental frequency
56
The ______ is called the fundamental frequency.
the lowest frequency component
57
Each multiple integer of F0 is a ____
harmonic
58
Each doubling of F0 is called ____
an octave
59
The amplitude of the harmonic spectrum _____ as it increases in frequency.
decreases
60
The amplitude of the harmonic spectrum decreases as frequency increases at a rate of about ____
12 dB/octave
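A sketch tying together harmonics, octaves, and the roughly 12 dB/octave roll-off, assuming an idealized source whose level falls at exactly that rate (helper names are illustrative):

```python
import math

def harmonics(f0_hz, count):
    """First `count` harmonics: integer multiples of F0."""
    return [f0_hz * k for k in range(1, count + 1)]

def octaves_between(f_low_hz, f_high_hz):
    """Number of doublings (octaves) from f_low_hz up to f_high_hz."""
    return math.log2(f_high_hz / f_low_hz)

def relative_level_db(harmonic_number, slope_db_per_octave=-12.0):
    """Level of harmonic k relative to F0, assuming a steady 12 dB/octave fall."""
    return slope_db_per_octave * octaves_between(1, harmonic_number)

print(harmonics(100, 4))          # [100, 200, 300, 400]
print(harmonics(200, 4))          # [200, 400, 600, 800] (higher F0, wider spacing)
print(octaves_between(100, 400))  # 2.0
print(relative_level_db(4))       # -24.0
```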
61
Since harmonic multiple integers of f0 depend on f0, the higher the f0, the ______ the harmonics will be.
further apart
62
The ____ modifies the harmonic series.
vocal tract
63
_____, or between speakers, and _____, or within one speaker, contribute to F0 variations.
interspeaker and intraspeaker
64
A ____ provides lines for each formant, and will give you readings on them if you place the cursor over them.
formant plot
65
An LPC can be cut to look at vowels or fricatives, by cutting it to ____ for vowels and ___ for fricatives.
0-4,000 Hz and 0-10,000 or 12,000 Hz
66
____ is caused by vocal fold vibration (voicing)
glottal buzz
67
Different rates of vocal fold vibration results in ____
F0 changes
68
The average female F0 is about ____
211 Hz
69
The average male F0 is about ____
120 Hz
70
The average 5 y.o. female F0 is about ___ and the average 5 y.o. male F0 is about ___
252 Hz; 247 Hz
71
Infant's non-distress cry is ___, startle cry is ___, pain cry is ___, and hunger cry is ___.
317-342 Hz; 442 Hz; 442 Hz; 442 Hz
72
Alaryngeal F0 for males is ___ and for females is ___
65 Hz; 87 Hz
73
Alaryngeal voicing is voicing ___
not made at the larynx (often esophagus)
74
_____ is a vibratory response to an applied force.
resonance
75
_____ do not initiate sound energy, they are set into forced vibration.
resonators
76
A body of air (such as the vocal tract) may resonate in response to sound that has frequencies matching the _______ of the volume of air.
natural resonant frequencies
77
Resonance depends on (4 things)
1 open or closed ends of the tube 2 length of the tube 3 shape of the tube 4 size of the openings of the tube
78
A ____ is also a column of air that can be set into vibration (hint: we have one of these).
vocal tract
79
____ is a peak of resonance in the vocal tract.
formant
80
Fn = (2n-1)c/4l means what
Fn is the nth resonance (formant) frequency, n = an integer (1, 2, 3, ...), c = 34,400 cm/s (the speed of sound), and l = length of the tube in cm
81
What is the formula for resonance?
Fn = (2n-1)c/4l
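A sketch of the quarter-wavelength formula, assuming a uniform tube closed at one end (glottis) and open at the other (lips), which is the case the formula describes (roughly a neutral vowel). It uses c = 34,400 cm/s and the typical tract lengths from these cards:

```python
SPEED_OF_SOUND_CM_S = 34_400  # c from these cards

def tube_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_S):
    """Fn = (2n - 1) * c / (4 * l) for n = 1..count (closed-open uniform tube)."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]

print([round(f) for f in tube_resonances(17.5)])  # [491, 1474, 2457] typical male
print([round(f) for f in tube_resonances(14.5)])  # [593, 1779, 2966] typical female
```

The shorter tube gives higher resonances, which is one reason female formants tend to sit above male formants.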
82
_____ creates various peaks of resonance in
shaping of the vocal tract
83
A typical male vocal tract is _____ long. A typical female vocal tract is _____
17.5 cm; 14.5 cm
84
c in the formula for resonance is _____.
the speed of sound at sea level
85
What is a formant?
a peak of resonance in the vocal tract
86
Fn = (2n-1)c/4l is called
the quarter wavelength formula
87
What are the components of the source?
could be the glottal source for vowels (typically voicing, but noise (hiss) excitation is also possible, such as whisper)
88
_____ diminish as they increase in frequency.
harmonics
89
Only voiced sounds have harmonics. True or False
true
90
The vocal tract acts as a ________ ______.
variable resonator
91
Altering cavity sizes (changing the constrictions in the vocal tract) results in ___ ___ ____, producing a different vowel.
different resonant frequencies
92
___ ___ is filtered according to the frequency response of the vocal tract filter.
glottal source
93
If the glottal source is voiced, you will have a ____ wave and the ____ of the glottal source at or near the spectral peaks of the transfer function of the vocal tract are resonated, while those distant from the spectral peaks lose energy and become attenuated.
periodic; harmonics
94
The resonator works if the source is _____ (such as whisper) or if F0 changes.
aperiodic
95
The radiation of sound from the mouth changes the spectrum at a rate of about _____ (the radiated sound gets louder as frequency increases).
6 dB/octave
96
What is c?
the speed of sound at sea level: 34,400 cm/s
97
Source could be ___ or ___ (regarding voicing)
periodic; aperiodic (noise, voiceless) OR BOTH (affricates)
98
The filter (vocal tract) ____
shapes the sound
99
True or False: you can have different F0 with the same filter, but not different filters with the same F0.
false
100
What graph do you look at for resonances?
LPC
101
The ___ works the same way if the source is periodic or aperiodic.
resonator
102
Same speaker with different F0, will have ___ (same/different) response characteristic (of the resonator).
same
103
Same speaker with different F0, will have ____ (same/different) harmonic spacing.
different
104
The glottal pulses when the F0 is higher, will be ___ (closer together, further apart).
closer together
105
The glottal pulses when the F0 is lower, will be ___ (closer together, further apart).
further apart
106
Use a wide-band spectrogram to look at ___
formants
107
Use a narrow-band spectrogram to look at ____
harmonics
108
What is the formula for calculating frequency?
1/ time of the period in seconds
109
___ are speech sounds produced w/ a relatively open vocal tract; they are perceptually salient and tend to be syllable nuclei.
vowels
110
____ are single unchanging vowels.
monophthongs
111
____ are changing articulation/vowels.
diphthongs
112
In normal American English, vowels are typically ___ (voiced/voiceless).
voiced
113
What are the 4 types of vowel features?
1 tongue height 2 tongue backness 3 lip rounding 4 tenseness
114
How does tenseness work w/ vowels?
tense vowels are associated w/ more extreme tongue position than lax vowels, length (tense vowels are longer), can be in open or closed syllables
115
____ are concentrations of energy in the spectrum that correspond to the vocal tract resonance frequencies.
formants
116
___ are basically resonances of the vocal tract.
formants
117
Usually, ____ are sufficient for vowel recognition.
F1 & F2
118
The lowest frequency concentration is ___
F1
119
Typically higher formants have ___ bandwidths.
larger
120
___ indicate articulatory changes.
formant transitions
121
____ is inversely related w/ tongue height (in general the higher the vowel, the lower the __)
F1
122
___ is responsive to changes in mouth opening.
F1
123
___ is related to tongue advancement (back to front, ___ goes up as tongue moves forward.)
F2
124
____ is responsive to changes in size of the oral cavity; backing or lip rounding lower ___.
F2
125
____ are generally lowered by lip rounding.
F1, F2, F3, and F4 are generally lowered by lip rounding.
126
____ and ____ of the cavity greatly determine what the vowel will be like.
Shape and size
127
What are the F1 and F2 for males for i?
270; 2290 Hz
128
What are the F1 and F2 for females for i?
310; 2790 Hz
129
What are the F1 and F2 for males for a?
730; 1090 Hz
130
What are the F1 and F2 for females for a?
850; 1220 Hz
131
What are the F1 and F2 for males for u?
300; 870 Hz
132
What are the F1 and F2 for females for u?
370; 950 Hz
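A toy sketch of the "absolute formant frequencies determine vowel identity" idea, using the F1/F2 values above and a nearest-match rule in Hz (the distance rule is our assumption, not a published model):

```python
import math

# Average F1, F2 values (Hz) from the cards above.
MALE_VOWELS = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}
FEMALE_VOWELS = {"i": (310, 2790), "a": (850, 1220), "u": (370, 950)}

def nearest_vowel(f1_hz, f2_hz, reference=MALE_VOWELS):
    """Return the reference vowel whose (F1, F2) point is closest in Hz."""
    return min(reference,
               key=lambda v: math.dist((f1_hz, f2_hz), reference[v]))

print(nearest_vowel(280, 2250))                # i
print(nearest_vowel(700, 1150))                # a
print(nearest_vowel(360, 960, FEMALE_VOWELS))  # u
```

This only works when the reference set matches the speaker; it ignores the variability, overlap, and other cues that the later cards list as problems with formant-only accounts.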
133
The shape of the vocal tract will have implications as to ___
what the sound will be like
134
Computer reads the ___ of each formant, but the formant is the ___.
the center frequency; whole band
135
One of the ways to get formants is to do a ____
formant plot (don't touch inside the graph!!)
136
When you look at pitch change, what graph do you view?
pitch contour (only shows F0)
137
What are the potential cues for vowel identification? 2
1 static properties v. dynamic properties | 2 intrinsic properties v. extrinsic properties
138
What are static properties of vowels?
such as steady-state formant frequencies and the fundamental frequency; phonetic environment, e.g. speaking rate
139
What are dynamic properties of vowels?
including inherent spectral change and consonantal context effects; relative vowel amplitude
140
What are intrinsic properties of vowels?
(intra-segmental) relational properties, especially relations among the fundamental and formant frequencies within vowels
141
What are extrinsic properties of vowels?
(transsegmental) relational properties, such as relative vowel duration and the relative formant frequencies of a vowel compared to those of other vowels of the same speaker
142
Typically (but not always), tense vowels are ___ than lax vowels in English.
longer
143
What is the one exception to the rule that tense vowels are longer than lax vowels?
/æ/; relatively long but lax
144
We don't differentiate vowels in English by ___ alone.
duration
145
Is duration phonemically relevant for differentiating vowels?
no; it is relevant but non-phonemic
146
Generally, low vowels are (more/less) intense than high vowels.
more
147
Vowel intensity is a (primary/secondary) cue in English.
secondary, but it does matter
148
Typically lower vowels have a (lower/higher) F0 than mid and high vowels.
lower
149
___, ___, & ___ are secondary cues in English vowels.
duration, intensity and F0/pitch
150
What are three early intrinsic factor theories?
1 absolute formant frequencies determine the vowel identity 2 ranges of formant frequencies determine vowel identity 3 ratios of formant frequencies determine vowel identity
151
What are three early extrinsic factor theories?
1 vowel identity is determined by "normalizing" by means of point (corner) vowels. 2 listeners "estimate" (infer) the speakers' vocal tracts and use that as normalizing info to perceive vowels. 3 vowel identity is aided by formant transitions of consonants.
152
What are issues with differentiating vowels by formants alone? 3
1 variability involving formant values and even formant ranges exists on many levels (individual, speaking rate) 2 there are some overlapping areas in the formant ranges 3 ignores other factors that can be relevant cues (such as formant transitions, etc.)
153
___ & __ came up with an important chart in 1952 demonstrating that vowels occur in specific places.
Peterson & Barney
154
What are the issues with differentiating vowels by formant ratios? 3
1 variability 2 male, female, and child vocal tracts are not scale models of each other 3 ignores other factors that can be important relevant cues (such as formant transitions, etc.)
155
One theory uses ______ which means that there are intrinsic factors adding an element of speaker normalization.
Formant Ratios w/ F0 (speaker normalization is via F0, where SR is the "sensory reference")
156
The formant ratios theory involves what 3 formulas?
there are 3 dimensions: x axis = log(SF3/SF2), y axis = log(SF1/SR), and z axis = log(SF2/SF1)
157
What does SR mean in the formant ratios theory?
"sensory reference": SR = 168 × (GMF0/168)^(1/3), where GMF0 = grand mean F0
158
What are problems with Formant Ratios w/ F0?
variable | ignores other factors that can be important relevant cues (such as formant transitions, etc.)
159
What does the graph of Formant Ratios w/ F0 look like?
it is a tetrahedron (Miller's Tetrahedron; vowels vary along the three dimensions)
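A sketch of Miller's sensory reference and the three formant-ratio coordinates as written on these cards. Assumptions: base-10 logarithms, and raw formant frequencies standing in for the "sensory formants" SF1-SF3; the F3 value in the example is a made-up placeholder:

```python
import math

def sensory_reference(grand_mean_f0_hz):
    """SR = 168 * (GMF0 / 168) ** (1/3)."""
    return 168.0 * (grand_mean_f0_hz / 168.0) ** (1.0 / 3.0)

def miller_coordinates(f1, f2, f3, grand_mean_f0_hz):
    """Return (x, y, z) = (log(F3/F2), log(F1/SR), log(F2/F1))."""
    sr = sensory_reference(grand_mean_f0_hz)
    return (math.log10(f3 / f2), math.log10(f1 / sr), math.log10(f2 / f1))

print(sensory_reference(120))  # about 150 Hz for a typical male GMF0
# Male /i/ F1, F2 from the cards; F3 = 3000 Hz is a placeholder.
print(miller_coordinates(270, 2290, 3000, 120))
```

Because SR grows only with the cube root of GMF0, a large change in average F0 shifts the y axis only modestly.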
160
What do extrinsic theories reflect?
listeners "estimate" (infer) the speakers' vocal tracts and use that as normalizing info to perceive vowels
161
What is an example of extrinsic factors at work in a speech science theory?
Vocal Tract Normalization
162
___ used an experiment where a precursor phrase preceded the stimuli; the stimuli were "bit" or "bet"; in one condition the original samples were played, but in the second condition, F1 of the precursor was shifted down (to simulate a larger vocal tract) while the stimuli were unmodified; participants were asked to identify the stimulus as "bit", "bet" or "bat." They found some effects.
Vocal Tract Normalization
163
Vocal tract normalization is a ____ theory.
extrinsic
164
Formant ratios w/ F0 is a ___theory.
intrinsic
165
Point vowel normalization is a ___ theory.
extrinsic factors
166
In ___, theorists said that listeners "calibrate" the vowel space of the speaker by listening to the "point" (corner) vowels, like /i a u/; other vowels can be located by referring to the point vowels. It is a more specific version of vocal tract normalization. Based on the vowel space you form your reference.
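A toy sketch of point vowel normalization, assuming a simple min-max scaling of each formant between the speaker's own corner values (this scaling is our illustration, not Ladefoged & Broadbent's procedure). Different raw formant values from a male and a female speaker can land at the same relative spot in each speaker's vowel space:

```python
# Point (corner) vowel F1, F2 values (Hz) from the cards above.
MALE_POINTS = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}
FEMALE_POINTS = {"i": (310, 2790), "a": (850, 1220), "u": (370, 950)}

def normalize(f1_hz, f2_hz, points):
    """Rescale F1 and F2 to 0..1 between the speaker's lowest and highest
    point-vowel values (hypothetical min-max scaling)."""
    f1s = [f1 for f1, _ in points.values()]
    f2s = [f2 for _, f2 in points.values()]
    return ((f1_hz - min(f1s)) / (max(f1s) - min(f1s)),
            (f2_hz - min(f2s)) / (max(f2s) - min(f2s)))

print(normalize(500, 1580, MALE_POINTS))    # (0.5, 0.5)
print(normalize(580, 1870, FEMALE_POINTS))  # (0.5, 0.5)
```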
point vowel normalization
167
___ is a more general version of point vowel normalization.
vocal tract normalization
168
What are the 4 intrinsic vowel theories?
1 Absolute Formant frequencies determine vowel ID 2 Ranges of Formant frequencies determine vowel ID (Peterson and Barney) 3 Ratios of Formant frequencies determine vowel ID (Peterson and Barney) 4 Formant Ratios adding an element of Speaker Normalization (Miller)
169
______ is when you have a group of speakers whose samples are clear. The better the conditions, the fewer the errors.
Talkers blocked
170
What are the 3 extrinsic vowel theories?
1 Vocal Tract Normalization (Liberman and Gerstman) 2 Point Vowel Normalization (Ladefoged & Broadbent) 3 Dynamic Specification Model (Strange)
171
Which cues are important in a signal? 4
1 static properties 2 dynamic properties 3 intrinsic (intrasegmental) 4 extrinsic (transsegmental)
172
What is the conclusion of Nearey's study?
"The main conclusion of this work is that although the relative importance of some of these effects is situation dependent, none of these factors can be safely ignored in a full account of English vowel perception. It seems fruitless for us to concentrate on only one set of effects and assume that the others are lab curiosities."
173
What are diphthongs?
gradual transitions from one vowel like articulation to another; highly variable
174
The harmonic spectrum is also known as ___
the spectrum of the glottal buzz or laryngeal output
175
Resonances of the vocal tract are called ____
formants
176
What do you see in a wide-band spectrogram?
formants
177
What do you see in a narrow-band spectrogram?
harmonics
178
How do F0 changes affect the narrow-band spectrogram?
the greater the F0, the larger the distance between the harmonics
179
What does GMF0 mean?
grand mean F0