Speech Science Exam 2 Flashcards

Flashcards in Speech Science Exam 2 Deck (179):
1

When we work with sound files what are we working with?

digital audio

2

What is the difference between digital and analog audio signals?

digital signals have gaps in them (they exist only at discrete points), whereas analog signals are continuous

3

What are examples of analog signals? 3

speech sounds, musical tones, and the displacement of the middle ear bones

4

What is an analog signal?

a signal that is continuous in time and amplitude; analog signals exist at every moment, and their amplitude can take on any value

5

What is a digital signal?

a discrete-time signal: it exists only at discrete points in time, not continuously; it consists of digital "samples," and there is nothing in between them

6

Are digital sound files superior to analog tapes? 4 reasons why yes

1 quality (copies are identical to the original)
2 digital signals are more resistant to noise and degradation than analog signals when transmitted
3 storage sensitive
4 flexible and efficient

7

Why does digitizing work?

Even with a discrete number of data points, the signal can be represented well (to the point
where the digital signal sounds like the original analog signal). For example, a CD sounds as
good as an analog master tape.

8

What is digitizing?

Get a representative sample. Since the analog signal is continuous and its digital representation is not, the data points need to be selected carefully so that the quality of the sample is not significantly compromised. The name “digitization” literally means converting to numbers (digits) so that the information can be stored in a numeric format.

9

What is sampling?

taking samples at given intervals (the energy between the sampling points is therefore discarded); the sampling rate is measured in Hz or kHz; e.g., a sampling rate of 10,000 Hz (10 kHz) means that the analog signal is sampled 10,000 times per second
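A minimal Python sketch of sampling, assuming a pure sine tone as a stand-in for the analog signal (the function name `sample_signal` is hypothetical). The signal is evaluated only at the instants n / fs, so whatever it does between those instants is discarded:

```python
import math


def sample_signal(freq_hz, sample_rate_hz, duration_s):
    """Sample the tone sin(2*pi*f*t) at the instants t = n / sample_rate_hz.

    Only these instants are kept; the signal between them is discarded.
    """
    n_samples = int(round(sample_rate_hz * duration_s))
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


# 10 ms of a 100 Hz tone sampled at 10 kHz (10,000 samples per second)
samples = sample_signal(100, 10_000, 0.010)
print(len(samples))  # 100 samples: 10,000 per second x 0.010 s
```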

10

What is quantization?

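Quantization converts the amplitude (energy level) of each sample into a discrete value: the amplitude of the signal is made discrete, so the continuous amplitude variations must be represented by a limited set of levels.
A quantum is an increment of energy; the number of available levels (the resolution) is specified in bits.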
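A sketch of quantization, assuming amplitudes normalized to the range -1.0 to 1.0 (the range, the rounding rule, and the name `quantize` are illustrative assumptions). Each sample is rounded to the nearest of 2^n levels:

```python
def quantize(amplitude, n_bits):
    """Round an amplitude in [-1.0, 1.0] to the nearest of 2**n_bits levels.

    Returns the level index, 0 .. 2**n_bits - 1.
    """
    levels = 2 ** n_bits
    clipped = max(-1.0, min(1.0, amplitude))  # values outside the range are clipped
    return round((clipped + 1.0) / 2.0 * (levels - 1))


# A 3-bit system has 2**3 = 8 levels, so fine amplitude differences collapse.
print([quantize(a, 3) for a in (-1.0, -0.5, 0.26, 1.0)])  # [0, 2, 4, 7]
```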

11

Before sampling and quantization, the signal may be passed through a _____ and a _______.

pre-emphasis filter; presampling (low-pass) filter

12

What is the importance of presampling (low pass) filter?

it rejects the energy above the highest freq of interest

13

What is the importance of the pre-emphasis filter?

basically boosts higher frequencies
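The card does not say which filter is used. A common choice, assumed here, is the first-order filter y[n] = x[n] - a*x[n-1] with a = 0.97 (both the form and the coefficient are assumptions, and `pre_emphasis` is a hypothetical name). It attenuates a slowly varying signal and boosts a rapidly alternating one:

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n - 1].

    The first sample has no predecessor and is passed through unchanged.
    """
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


low = pre_emphasis([1.0, 1.0, 1.0, 1.0])     # constant (0 Hz) input
high = pre_emphasis([1.0, -1.0, 1.0, -1.0])  # fastest possible alternation
print(low[1:], high[1:])  # roughly 0.03 each vs. roughly +/-1.97
```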

14

What is an acceptable sampling rate? Who came up with it?

you need to sample at a rate of at least twice the highest frequency of interest (Nyquist's sampling theorem, 1928); the human ear hears up to 20 kHz, so audio is typically sampled at 44.1 kHz (the CD rate)

15

According to Nyquist's theorem, what is the number of samples needed?

at least double the highest frequency of interest; since the ear can hear up to 20 kHz, music needs at least 40 kHz (CDs use 44.1 kHz)
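The rule is simple arithmetic, sketched below (function names are hypothetical). In practice the rate is set somewhat above the minimum, as with the 44.1 kHz CD rate, partly because real low-pass filters cannot cut off perfectly at the limit:

```python
def minimum_sampling_rate(highest_freq_hz):
    """Nyquist: sample at least twice the highest frequency of interest."""
    return 2 * highest_freq_hz


def is_adequate(sample_rate_hz, highest_freq_hz):
    return sample_rate_hz >= minimum_sampling_rate(highest_freq_hz)


print(minimum_sampling_rate(20_000))  # 40000 Hz for the full range of hearing
print(is_adequate(44_100, 20_000))    # True: the CD rate clears the minimum
print(is_adequate(10_000, 8_000))     # False: an 8 kHz component would alias
```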

16

Why do we need to have a sampling rate of at least twice the highest frequency of interest?

to avoid aliasing

17

What is an alias?

an assumed identity, a fake; if too few sample points are taken (data points are skipped), the samples create an alternate perception: the sound will alias to something different (a lower frequency)
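To see what a tone "turns into," here is a sketch (the name `apparent_frequency` is hypothetical) that computes the frequency a pure tone appears to have after sampling, using the standard folding relationship: the tone lands at its distance from the nearest integer multiple of the sampling rate.

```python
def apparent_frequency(freq_hz, sample_rate_hz):
    """Frequency a pure tone appears to have after sampling.

    Components above sample_rate_hz / 2 "fold back" into the range
    0 .. sample_rate_hz / 2, i.e. they alias to a lower frequency.
    """
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))


for f in (3_000, 7_000, 12_000):
    print(f, "Hz sampled at 10 kHz looks like", apparent_frequency(f, 10_000), "Hz")
```

A 3 kHz tone is below the 5 kHz limit and is preserved; 7 kHz and 12 kHz are above it and alias to 3 kHz and 2 kHz.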

18

How can you avoid aliasing? 3

1 identify the highest frequency of interest (e.g., 22 kHz)
2 filter out the energy above the highest frequency of interest (presampling (low-pass) filter)
3 sample the signal at a rate at least 2x as high as the highest frequency of interest

19

Who was an important theorist, who gave us the minimum sampling rate in 1928?

Nyquist

20

What is quantization?

conversion of the amplitude or energy level of the samples; the amplitude of the signal is made discrete, so the continuous amplitude variations must be represented by discrete levels

21

A ____ is an increment of energy.

quantum

22

A 1-bit system has ___ levels.

2

23

A 3-bit system has ___ levels.

8 (2^n)

24

What is another name for quantization rate?

resolution

25

What is quantization rate measured in?

bits (binary digits, 0 and 1)

26

Thus a 16-bit system (used for audio applications) has ___ levels.

65,536
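The level counts on the last few cards all follow from 2^n. A short sketch (the helper name is hypothetical):

```python
def quantization_levels(n_bits):
    """Each added bit doubles the number of amplitude levels: 2**n."""
    return 2 ** n_bits


for bits in (1, 3, 16):
    print(f"{bits}-bit system: {quantization_levels(bits):,} levels")
# 1-bit system: 2 levels
# 3-bit system: 8 levels
# 16-bit system: 65,536 levels
```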

27

What graphs are commonly used in speech analyses? 3

1 time waveform
2 spectrum
3 spectrogram

28

What does the time waveform measure?

presents the amplitude of the signal as a function of time

29

What does frequency measure in a time waveform?

the number of times an object (such as air molecules) vibrates through a complete cycle per second (measured in Hertz, Hz)

30

What is the period?

the duration of one complete cycle (in seconds)

31

How do we measure frequency?

frequency = 1 / period (in seconds)
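A sketch of the calculation, including reading the period as the time between two successive primary spikes of a waveform (function names and spike times are hypothetical):

```python
def frequency_from_period(period_s):
    """frequency (Hz) = 1 / period (s)"""
    return 1.0 / period_s


def frequency_from_spikes(first_spike_s, second_spike_s):
    """Period = time between two successive primary spikes in the waveform."""
    return frequency_from_period(second_spike_s - first_spike_s)


# Spikes 8 ms apart: 1 / 0.008 s = 125 Hz
print(round(frequency_from_spikes(0.010, 0.018)))  # 125
```

A shorter period gives a larger result, which is why the shorter the period, the higher the frequency.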

32

the shorter the period, the ___ the frequency.

higher

33

You can measure frequency by measuring the distance (in time) between two primary ___ and applying the formula: frequency = 1 / time in seconds.

"spikes"

34

What is meant by time wave form?

amplitude as a function of time

35

What is spectrum? What is it good for?

the amplitude of the signal as a function of frequency (i.e. the amplitude on the Y axis and the frequency on the X axis); creates a display of the frequency composition of a signal at a point in time

36

Does spectrum have a temporal component?

no

37

What is the spectrum useful for?

looking at the magnitude of the various frequency components at a specific time

38

What are two kinds of spectra?

1 FFT (Fast Fourier Transform); 2 LPC (Linear Predictive Coding)

39

How does a FFT work?

decomposes a signal into its frequency components; an algorithm that greatly speeds up the computation of the Discrete Fourier Transform (DFT); useful for looking at all (or most) frequency components; increasing the number of FFT points gives finer frequency resolution in the display
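An illustrative sketch of decomposing a signal into its frequency components. To stay dependency-free it writes out a plain DFT instead of an FFT (an FFT returns identical values, only faster); the test signal and names are assumptions. Components show up as peaks, and the bins are spaced fs / N apart, so more points means finer resolution:

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive Discrete Fourier Transform magnitudes |X[k]| for k = 0 .. N/2.

    This takes about N*N operations; an FFT returns the same values in
    about N*log2(N) operations, which is why it is used in practice.
    """
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


fs, n_points = 8_000, 80  # bin spacing = fs / n_points = 100 Hz
signal = [math.sin(2 * math.pi * 500 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 1500 * t / fs) for t in range(n_points)]
mags = dft_magnitudes(signal)
peaks = sorted(range(len(mags)), key=mags.__getitem__, reverse=True)[:2]
print([k * fs / n_points for k in peaks])  # [500.0, 1500.0]
```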

40

What is the FFT display?

amplitude is on the vertical axis and frequency is on the horizontal axis; the line represents the frequency components and their amplitudes

41

How are frequency components noted on a FFT?

peaks

42

What is a LPC?

linear predictive coding: a method that attempts to predict upcoming speech samples from a weighted sum of previous samples; it estimates the spectrum using a vocal tract (filter) model, so it is not as precise as the FFT; it shows not all components, just the peaks
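A sketch of LPC using the common autocorrelation method with the Levinson-Durbin recursion (the card does not name a variant, so this choice is an assumption, as are the function names). As a check, it is fed the impulse response of a single two-pole resonator, a toy stand-in for one vocal tract resonance, and recovers the weights that generated it:

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0 .. max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion (autocorrelation method).

    Returns a[1..order] such that x[n] is predicted as
    a[1]*x[n-1] + a[2]*x[n-2] + ... + a[order]*x[n-order].
    """
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


# Impulse response of a single resonator: x[n] = 1.8*x[n-1] - 0.9*x[n-2]
x = [1.0, 1.8]
for n in range(2, 500):
    x.append(1.8 * x[n - 1] - 0.9 * x[n - 2])
coeffs = lpc(x, 2)
print([round(c, 3) for c in coeffs])  # [1.8, -0.9]
```

Raising `order` lets the model represent more resonances, i.e. look for more spectral peaks.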

43

What is found in the LPC display? What is it useful to see?

amplitude is on the vertical axis and frequency is on the horizontal axis; line represents estimated spectral peaks and their amplitudes; helpful for seeing vowel formants; useful for looking at spectral peaks, but not detailed freq components (must be cautious when attempting to interpret LPC display)

44

Increasing the LPC order and the number of FFT points ____ the display. How?

changes it; the analysis will look for more peaks

45

What is a spectrogram?

literally a series of spectra over time, in a time-frequency-intensity display; related to the spectrum; sounds are analyzed in a 3-D pattern of time (horizontal axis), frequency (vertical axis), and amplitude (coded by different colors or shades of gray); shows spectral peaks; the input spectrum is averaged by a filter, and the formant (= resonance of the vocal tract) frequencies appear as darkened bars on the spectrogram; unlike the spectrum, it has a time domain, which allows one to view changes over time

46

What are the two types of spectrograms?

narrow-band and wide-band spectrograms

47

How do you get from spectra to a spectrogram?

1. "sample" the waveform every several milliseconds
2. plot the series of spectra over time, with shades of gray or color for amplitude
3. turn each spectrum sideways, and that is one slice
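The steps above can be sketched as a short-time analysis: cut the waveform into slices and take one spectrum per slice. For simplicity this assumes non-overlapping, untapered (rectangular) slices; real analysis tools usually overlap and taper them. Names and the test signal are hypothetical:

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (naive DFT, bins 0 .. N/2)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len, hop):
    """Step 1: take a slice every `hop` samples; step 2: one spectrum per slice.

    Each row is one spectrum, i.e. one vertical slice of the display once it
    is turned sideways and its magnitudes are drawn as shades of gray.
    """
    return [dft_magnitudes(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


fs = 8_000
# 50 ms at 500 Hz followed by 50 ms at 1500 Hz
tone = [math.sin(2 * math.pi * (500 if n < 400 else 1500) * n / fs)
        for n in range(800)]
rows = spectrogram(tone, frame_len=80, hop=80)  # 10 ms slices, 100 Hz bins
print([max(range(len(r)), key=r.__getitem__) * 100 for r in rows])
# five slices peak at 500 Hz, then five at 1500 Hz
```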

48

What are narrow-band spectrograms?

have detailed frequency resolution (i.e., show frequencies more precisely than a wide-band spectrogram); good for looking at pitch changes and harmonic structure, but not as good for looking at resonances

49

How do we get the narrow-band spectrograms?

the analysis bandwidth (window) has to be narrower than the frequency distance between the harmonics of the voicing source; generally, one would pick an analysis bandwidth that is less than the speaker's F0 (e.g., if the speaker's F0 is 100 Hz, the bandwidth should be less than that, such as 50 Hz); the window has to be less than the F0

50

What are wide-band spectrograms?

span a wider range of frequencies than narrow-band spectrograms (the analysis bandwidth is greater than F0); vertical striations indicate glottal pulses; good for looking at resonances; not good for looking at pitch changes or harmonic structure

51

What should the analysis bandwidth be for a wide-band spectrogram?

it has to be larger than the frequency distance between the harmonics of the voicing source; generally, one would pick an analysis bandwidth that is larger than the speaker's F0 (e.g., if the speaker's F0 is 100 Hz, the bandwidth should be larger than that: at least 150 Hz, but 200 Hz is preferable)
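The narrow-band vs. wide-band choice comes down to comparing the analysis bandwidth with F0, as sketched below (names are hypothetical). The window-length helper uses a rough rule of thumb, assumed here, that window duration is about 1 / bandwidth; the inverse relationship is general, but the exact constant depends on the window shape:

```python
def spectrogram_type(analysis_bandwidth_hz, f0_hz):
    """Classify an analysis bandwidth relative to the speaker's F0."""
    if analysis_bandwidth_hz < f0_hz:
        return "narrow-band"  # resolves individual harmonics
    if analysis_bandwidth_hz > f0_hz:
        return "wide-band"    # merges harmonics; shows formants, glottal pulses
    return "borderline"       # equal to F0: neither rule is satisfied


def approx_window_s(analysis_bandwidth_hz):
    """Rough rule of thumb (assumption): window length ~ 1 / bandwidth."""
    return 1.0 / analysis_bandwidth_hz


f0 = 100  # Hz, the example speaker from the cards
for bw in (50, 200):
    print(bw, "Hz:", spectrogram_type(bw, f0), f"window ~{approx_window_s(bw) * 1000:.0f} ms")
```

A narrow band needs a long window (about 20 ms here), while a wide band uses a short one (about 5 ms), short enough to separate individual glottal pulses in time.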

52

What do formants represent?

resonances of the vocal tract

53

What do vertical striations indicate on a wide band spectrogram?

glottal pulses

54

What is pitch contour?

a display of F0 (perceived as pitch) over time; the easiest display to get: hover in one of the displays and it will display the pitch contour; don't click inside, as that changes the values

55

______ is perceived as pitch of voice.

the fundamental frequency

56

The ______ is called the fundamental frequency.

the lowest frequency component

57

Each integer multiple of F0 is a ____

harmonic

58

Each doubling of F0 is called ____

an octave
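A sketch contrasting harmonics (integer multiples of F0) with octaves (doublings); function names are hypothetical. Because harmonics are spaced exactly F0 apart, a higher F0 spreads them further apart:

```python
def harmonics(f0_hz, count):
    """Harmonics are the integer multiples of F0: F0, 2*F0, 3*F0, ..."""
    return [n * f0_hz for n in range(1, count + 1)]


def octaves_above(f0_hz, count):
    """Each octave doubles the frequency: 2*F0, 4*F0, 8*F0, ..."""
    return [f0_hz * 2 ** k for k in range(1, count + 1)]


print(harmonics(100, 5))      # [100, 200, 300, 400, 500]: spaced 100 Hz apart
print(harmonics(200, 5))      # [200, 400, 600, 800, 1000]: spaced 200 Hz apart
print(octaves_above(100, 3))  # [200, 400, 800]
```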

59

The amplitude of the harmonic spectrum _____ as it increases in frequency.

decreases

60

The amplitude of the harmonic spectrum decreases as frequency increases at a rate of about ____

12 dB/octave
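Since harmonic n lies log2(n) octaves above F0, a steady 12 dB/octave roll-off can be sketched as follows (an idealized straight-line slope; real spectra deviate from it, and the function name is hypothetical):

```python
import math


def harmonic_level_db(n, slope_db_per_octave=-12.0):
    """Idealized level of harmonic n relative to the fundamental (n = 1).

    Harmonic n lies log2(n) octaves above F0, so it loses
    12 dB for every octave: level = -12 * log2(n).
    """
    return slope_db_per_octave * math.log2(n)


for n in (2, 3, 4, 8):
    print(n, round(harmonic_level_db(n), 1))
# prints: 2 -12.0, 3 -19.0, 4 -24.0, 8 -36.0 (one per line)
```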

61

Since harmonics are integer multiples of F0, the higher the F0, the ______ the harmonics will be.

further apart

62

The ____ modifies the harmonic series.

vocal tract

63

_____, or between speakers, and _____, or within one speaker, contribute to F0 variations.

interspeaker and intraspeaker

64

A ____ provides lines for each formant and will give you readings for them if you place the cursor over them.

formant plot

65

An LPC display can be limited to a frequency range for looking at vowels or fricatives: ____ for vowels and ___ for fricatives.

0-4,000 Hz and 0-10,000 or 12,000 Hz

66

____ is caused by vocal fold vibration (voicing)

glottal buzz

67

Different rates of vocal fold vibration result in ____

F0 changes

68

The average female F0 is about ____

211 Hz

69

The average male F0 is about ____

120 Hz

70

The average 5 y.o. female F0 is about ___ and the average 5 y.o. male F0 is about ___

252 Hz; 247 Hz

71

Infant's non-distress cry is ___, startle cry is ___, pain cry is ___, and hunger cry is ___.

317-342 Hz; 442 Hz; 442 Hz; 442 Hz

72

Alaryngeal F0 for males is ___ and for females is ___

65 Hz; 87 Hz

73

Alaryngeal voicing is voicing ___

not made at the larynx (often esophagus)

74

_____ is a vibratory response to an applied force.

resonance

75

_____ do not initiate sound energy, they are set into forced vibration.

resonators

76

A body of air (such as the vocal tract) may resonate in response to sound that has frequencies matching the _______ of the volume of air.

natural resonant frequencies

77

Resonance depends on (4 things)

1 open or closed ends of the tube
2 length of the tube
3 shape of the tube
4 size of the openings of the tube

78

A ____ is also a column of air that can be set into vibration (hint: we have one of these).

vocal tract

79

____ is a peak of resonance in the vocal tract.

formant

80

What does each term in Fn = (2n-1)c/4l mean?

Fn is the nth resonance (formant) frequency, n = an integer (1, 2, 3, ...), c = 34,400 cm/s (the speed of sound), and l = the length of the tube in cm

81

What is the formula for resonance?

Fn = (2n-1)c/4l
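Plugging the typical tract lengths from the next cards into the quarter-wavelength formula gives ideal resonances for a uniform tube closed at one end (roughly a neutral, schwa-like vowel). The sketch assumes that uniform-tube model; the function name is hypothetical:

```python
SPEED_OF_SOUND_CM_S = 34_400  # c, speed of sound at sea level


def formants(tract_length_cm, count=3):
    """Quarter-wavelength resonances of a uniform tube closed at one end:
    Fn = (2n - 1) * c / (4 * l)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * tract_length_cm)
            for n in range(1, count + 1)]


print([round(f) for f in formants(17.5)])  # male:   [491, 1474, 2457]
print([round(f) for f in formants(14.5)])  # female: [593, 1779, 2966]
```

The shorter tract gives higher resonances across the board.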

82

_____ creates various peaks of resonance in

shaping of the vocal tract

83

A typical male vocal tract is _____ long. A typical female vocal tract is _____

17.5 cm; 14.5 cm

84

c in the formula for resonance is _____.

the speed of sound at sea level

85

What is a formant?

a peak of resonance in the vocal tract

86

Fn = (2n-1)c/4l is called

the quarter wavelength formula

87

What are the components of the source?

the glottal source for vowels: typically voicing, but noise (hiss) excitation is also possible (such as in whisper)

88

_____ diminish as they increase in frequency.

harmonics

89

Only voiced sounds have harmonics. True or False

true

90

The vocal tract acts as a ________ ______.

variable resonator

91

Altering cavity sizes (changing the constrictions in the vocal tract) results in ___ ___ ____, producing a different vowel.

different resonant frequencies

92

___ ___ is filtered according to the frequency response of the vocal tract filter.

glottal source

93

If the glottal source is voiced, you will have a ____ wave and the ____ of the glottal source at or near the spectral peaks of the transfer function of the vocal tract are resonated, while those distant from the spectral peaks lose energy and become attenuated.

periodic; harmonics

94

The resonator works if the source is _____ (such as whisper) or if F0 changes.

aperiodic

95

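The radiated sound of the voice leaving the mouth drops in energy at a net rate of about _____ as frequency increases (lip radiation boosts higher frequencies by about 6 dB/octave, partly offsetting the 12 dB/octave fall of the source).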

6 dB/octave
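The two slopes simply add. A sketch, assuming the -12 dB/octave source slope from card 60 and the standard +6 dB/octave lip radiation gain, and ignoring vocal tract resonances (names are hypothetical):

```python
import math

SOURCE_SLOPE_DB_PER_OCTAVE = -12.0  # glottal source spectrum
RADIATION_GAIN_DB_PER_OCTAVE = 6.0  # lip radiation boosts high frequencies
NET_SLOPE = SOURCE_SLOPE_DB_PER_OCTAVE + RADIATION_GAIN_DB_PER_OCTAVE  # -6.0


def radiated_level_db(freq_hz, f0_hz):
    """Idealized level at freq_hz relative to F0, ignoring vocal tract resonances."""
    return NET_SLOPE * math.log2(freq_hz / f0_hz)


print(NET_SLOPE)                    # -6.0 dB per octave
print(radiated_level_db(400, 100))  # two octaves up: -12.0 dB
```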

96

What is c?

the speed of sound at sea level: 34,400 cm/s

97

Source could be ___ or ___ (regarding voicing)

periodic; aperiodic (noise, voiceless) OR BOTH (affricates)

98

The filter (vocal tract) ____

shapes the sound

99

You can have different F0 with same filter but not different filters and same F0.

false

100

What graph do you look at for resonances?

LPC

101

The ___ works the same way if the source is periodic or aperiodic.

resonator

102

Same speaker with different F0, will have ___ (same/different) response characteristic (of the resonator).

same

103

Same speaker with different F0, will have ____ (same/different) harmonic spacing.

different

104

When the F0 is higher, the glottal pulses will be ___ (closer together, further apart).

closer together

105

When the F0 is lower, the glottal pulses will be ___ (closer together, further apart).

further apart

106

Use a wide-band spectrogram to look at ___

formants

107

Use a narrow-band spectrogram to look at ____

harmonics

108

What is the formula for calculating frequency?

1/ time of the period in seconds

109

___ are speech sounds produced with a relatively open vocal tract; they are perceptually salient and tend to be syllable nuclei.

vowels

110

____ are single unchanging vowels.

monophthongs

111

____ are changing articulation/vowels.

diphthongs

112

In normal American English, vowels are typically ___ (voiced/voiceless).

voiced

113

What are the 4 types of vowel features?

1 tongue height
2 tongue backness
3 lip rounding
4 tenseness

114

How does tenseness work w/ vowels?

tense vowels are associated w/ more extreme tongue position than lax vowels, length (tense vowels are longer), can be in open or closed syllables

115

____ are concentrations of energy in the spectrum that correspond to the vocal tract resonance frequencies.

formants

116

___ are basically resonances of the vocal tract.

formants

117

Usually, ____ are sufficient for vowel recognition.

F1 & F2

118

The lowest frequency concentration is ___

F1

119

Typically higher formants have ___ bandwidths.

larger

120

___ indicate articulatory changes.

Formant transitions indicate articulatory changes.

121

____ is inversely related w/ tongue height (in general the higher the vowel, the lower the __)

F1

122

___ is responsive to changes in mouth opening.

F1

123

___ is related to tongue advancement (back to front, ___ goes up as tongue moves forward.)

F2

124

____ is responsive to changes in size of the oral cavity; backing or lip rounding lower ___.

F2

125

____ are generally lowered by lip rounding.

F1, F2, F3, and F4 are generally lowered by lip rounding.

126

____ and ____ of the cavity greatly determine what the vowel will be like.

Shape and size

127

What are the F1 and F2 for males for i?

270; 2290 Hz

128

What are the F1 and F2 for females for i?

310; 2790 Hz

129

What are the F1 and F2 for males for a?

730; 1090 Hz

130

What are the F1 and F2 for females for a?

850; 1220 Hz

131

What are the F1 and F2 for males for u?

300; 870 Hz

132

What are the F1 and F2 for females for u?

370; 950 Hz
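The card values above can illustrate the "absolute formant frequencies" idea: label a vowel by the nearest reference point in the F1-F2 plane. This is a toy sketch; plain distance in Hz and the name `nearest_vowel` are simplifying assumptions. It works for these three widely separated point vowels, but would struggle where formant ranges overlap:

```python
import math

# Male averages from the cards above: (F1, F2) in Hz
MALE_REFERENCE = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}


def nearest_vowel(f1_hz, f2_hz, reference=MALE_REFERENCE):
    """Pick the reference vowel whose (F1, F2) point is closest in Hz."""
    return min(reference,
               key=lambda v: math.hypot(f1_hz - reference[v][0], f2_hz - reference[v][1]))


# Female tokens from the cards, classified against the male references
for vowel, (f1, f2) in {"i": (310, 2790), "a": (850, 1220), "u": (370, 950)}.items():
    print(vowel, "->", nearest_vowel(f1, f2))
```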

133

The shape of the vocal tract will have implications as to ___

what the sound will be like

134

Computer reads the ___ of each formant, but the formant is the ___.

the center frequency; whole band

135

One of the ways to get formants is to do a ____

formant plot (don't touch inside the graph!!)

136

When you look at pitch change, what graph do you view?

pitch contour (only shows F0)

137

What are the potential cues for vowel identification? 2

1 static properties v. dynamic properties,
2 intrinsic properties v extrinsic properties

138

What are static properties of vowels?

such as steady-state formant frequencies and the fundamental frequency; also phonetic environment (e.g., speaking rate)

139

What are dynamic properties of vowels?

including inherent spectral change and consonantal context effects; relative vowel amplitude

140

What are intrinsic properties of vowels?

(intrasegmental) relational properties, especially relations among the fundamental and formant frequencies within vowels

141

What are extrinsic properties of vowels?

(transsegmental) relational properties, such as relative vowel duration and the relative formant frequencies of a vowel compared to those of other vowels of the same speaker

142

Typically (but not always), tense vowels are ___ than lax vowels in English.

longer

143

What is the one exception to the rule that tense vowels are longer than lax vowels?

/æ/: it is relatively long but lax

144

We don't differentiate vowels in English by ___ alone.

duration

145

Is duration phonemically relevant for differentiating vowels?

no; it is relevant, but non-phonemic

146

Generally, low vowels are (more/less) intense than high vowels.

more

147

Vowel intensity is a (primary/secondary) cue in English.

secondary, but it does matter

148

Typically lower vowels have a (lower/higher) F0 than mid and high vowels.

lower

149

___, ___, & ___ are secondary cues in English vowels.

duration, intensity and F0/pitch

150

What are three early intrinsic factor theories?

1 absolute formant frequencies determine the vowel identity
2 ranges of formant frequencies determine vowel identity
3 ratios of formant frequencies determine vowel identity

151

What are three early extrinsic factor theories?

1 vowel identity is determined by "normalizing" by means of point (corner) vowels.
2 listeners "estimate" (infer) the speakers' vocal tracts and use that as normalizing info to perceive vowels.
3 vowel identity is aided by formant transitions of consonants.

152

What are issues with differentiating vowels by formants alone? 3

1 variability involving formant values and even formant ranges exists on many levels (individual, speaking rate)
2 there are some overlapping areas in the formant ranges
3 ignores other factors that can be relevant cues (such as formant transitions, etc.)

153

___ & __ came up with an important chart in 1952 demonstrating that vowels occur in specific places.

Peterson & Barney

154

What are the issues with differentiating vowels by formant ratios? 3

1 variability
2 male, female, and child vocal tracts are not scale models of each other
3 ignores other factors that can be important relevant cues (such as formant transitions, etc.)

155

One theory uses ______ which means that there are intrinsic factors adding an element of speaker normalization.

Formant Ratios w/ F0 (the speaker normalization is based on F0, through SR, the "sensory reference")

156

The formant ratios theory involves what 3 formulas?

there are 3 dimensions: x axis = log(SF3/SF2), y axis = log(SF1/SR), and z axis = log(SF2/SF1)

157

What does SR mean in the formant ratios theory?

"sensory reference": SR = 168 × (GMF0/168)^(1/3), where GMF0 is the grand mean F0

158

What are problems with Formant Ratios w/ F0?

1 variability
2 ignores other factors that can be important relevant cues (such as formant transitions, etc.)

159

What does the graph of Formant Ratios w/ F0 look like?

it is a tetrahedron (Miller's Tetrahedron; vowels vary along the three dimensions)
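The three dimensions of the formant-ratio space can be computed directly from the formulas on the earlier cards. This sketch assumes base-10 logs (the base only rescales the axes) and treats the SF values as measured formant frequencies; the function names and the example vowel are hypothetical:

```python
import math


def sensory_reference(gmf0_hz):
    """SR = 168 * (GMF0 / 168) ** (1/3), with GMF0 the speaker's grand mean F0."""
    return 168.0 * (gmf0_hz / 168.0) ** (1.0 / 3.0)


def miller_coordinates(sf1, sf2, sf3, gmf0_hz):
    """Return the (x, y, z) position of a vowel in the formant-ratio space."""
    sr = sensory_reference(gmf0_hz)
    x = math.log10(sf3 / sf2)
    y = math.log10(sf1 / sr)
    z = math.log10(sf2 / sf1)
    return x, y, z


# Hypothetical vowel: SF1 = 500, SF2 = 1500, SF3 = 2500 Hz, grand mean F0 = 168 Hz
print([round(v, 3) for v in miller_coordinates(500, 1500, 2500, 168)])  # [0.222, 0.474, 0.477]
```

Only the y axis depends on the speaker's F0 (through SR); the cube root means a speaker with 8 times the grand mean F0 shifts SR by a factor of only 2.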

160

What do extrinsic theories reflect?

listeners "estimate" (infer) the speakers' vocal tracts and use that as normalizing info to perceive vowels

161

What is an example of extrinsic factors at work in a speech science theory?

Vocal Tract Normalization

162

An experiment on ___ played a precursor phrase before the stimuli, which were "bit" or "bet"; in one condition the original samples were played, and in the second condition F1 of the precursor phrase was shifted down (to simulate a larger vocal tract) while the stimuli were unmodified; participants were asked to identify the stimulus as "bit," "bet," or "bat." Some effects were found.

Vocal Tract Normalization

163

Vocal tract normalization is a ____ theory.

extrinsic

164

Formant ratios w/ F0 is a ___theory.

intrinsic

165

Point vowel normalization is a ___ theory.

extrinsic factors

166

In ___, the theory holds that listeners "calibrate" the vowel space of the speaker by listening to the "point" (corner) vowels, like /i a u/; other vowels can be located by referring to the point vowels. It is a more specific version of vocal tract normalization: based on the vowel space, you form your reference.

point vowel normalization
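One simple way to picture "calibrating" with point vowels, assumed here for illustration and not necessarily any theorist's exact procedure, is to rescale F1 and F2 to the range spanned by the speaker's own /i a u/. Using the card values, /i/ ends up at the same normalized position for both speakers despite different raw formants (names are hypothetical):

```python
# Point (corner) vowels from the cards: (F1, F2) in Hz
MALE_POINT_VOWELS = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}
FEMALE_POINT_VOWELS = {"i": (310, 2790), "a": (850, 1220), "u": (370, 950)}


def normalize(f1_hz, f2_hz, point_vowels):
    """Scale F1 and F2 to 0..1 using the range spanned by a speaker's point vowels."""
    f1_values = [f1 for f1, _ in point_vowels.values()]
    f2_values = [f2 for _, f2 in point_vowels.values()]
    f1_lo, f1_hi = min(f1_values), max(f1_values)
    f2_lo, f2_hi = min(f2_values), max(f2_values)
    return ((f1_hz - f1_lo) / (f1_hi - f1_lo),
            (f2_hz - f2_lo) / (f2_hi - f2_lo))


# /i/ has different raw formants for the two speakers but the same normalized position
print(normalize(270, 2290, MALE_POINT_VOWELS))    # (0.0, 1.0)
print(normalize(310, 2790, FEMALE_POINT_VOWELS))  # (0.0, 1.0)
```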

167

___ is a more general version of point vowel normalization.

vocal tract normalization

168

What are the 4 intrinsic vowel theories?

1 Absolute Formant frequencies determine vowel ID
2 Ranges of Formant frequencies determine vowel ID (Peterson and Barney)
3 Ratios of Formant frequencies determine vowel ID (Peterson and Barney)
4 Formant Ratios adding an element of Speaker Normalization (Miller)

169

______ is when you have a group of speakers whose samples are clear. The better the conditions, the fewer the errors.

Talkers blocked

170

What are the 3 extrinsic vowel theories?

1 Vocal Tract Normalization (Liberman and Gerstman)
2 Point Vowel Normalization (Ladefoged & Broadbent)
3 Dynamic Specification Model (Strange)

171

Which cues are important in a signal? 4

1 static properties
2 dynamic properties
3 intrinsic (intrasegmental)
4 extrinsic (transsegmental)

172

What is the conclusion of Nearey's study?

"The main conclusion of this work is that although the relative importance of some of these FX is situation dependent, none of these factors can be safely ignored in a full account of English vowel perception. It seems fruitless for us to concentrate on only one set of FX and assume that the others are laboratory curiosities."

173

What are diphthongs?

gradual transitions from one vowel like articulation to another; highly variable

174

The harmonic spectrum is also known as ___

the spectrum of the glottal buzz or laryngeal output

175

Resonances of the vocal tract are called ____

formants

176

What do you see in a wide-band spectrogram?

formants

177

What do you see in a narrow-band spectrogram?

harmonics

178

How do F0 changes affect the narrow-band spectrogram?

the greater the F0 the larger the distance will be between the harmonics

179

What does GMF0 mean?

grand mean F0