spectrogram or sonogram
Frequency and time information about sound can be graphed together
x-axis = time; y-axis = frequency
Amplitude is indicated by the darkness of the display
Taxonomy of Sound
Taxonomy is a scientific classification system.
It divides a large number of items into different categories in order to organize the items into smaller groups.
Sound can be classified in different ways (e.g., by frequency audibility or by signal complexity)
Noise
(aperiodic sound) contains a very large number of frequency components
More specifically, noise contains all possible frequencies within a bandwidth
bandwidth
a range of frequencies listed from low to high.
the bandwidth of the telephone is small, which is why the quality of the voice is so poor (it simplifies the complex sound by removing frequencies outside the band)
300 Hz-3,000Hz is the band of the telephone
Band-pass filters are specified by the frequency width of their pass bands, also known as the bandwidth of the filter. The bandwidth is measured between the cut-off frequencies, as defined by the -3-dB attenuation frequency.
spectral splatter
Whenever a sound turns on and off very abruptly, the sound will have multiple frequencies in it. Even if we took a pure tone and turned it on and off very abruptly, additional frequencies would result. This phenomenon is called “spectral splatter”
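Spectral splatter can be checked numerically. The sketch below is only an illustration under assumed values (a 1000 Hz tone, an 8000 Hz sample rate, 256 samples, and a plain textbook DFT); it compares how much of the spectral energy stays at the tone's own frequency for a steady tone and for the same tone switched on and off abruptly.

```python
import cmath
import math

def tone_bin_energy_fraction(samples, tone_bin):
    """Fraction of one-sided spectral energy that falls in the tone's own DFT bin."""
    n = len(samples)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        energies.append(abs(coeff) ** 2)
    return energies[tone_bin] / sum(energies)

N = 256             # number of samples (assumed)
SAMPLE_RATE = 8000  # Hz (assumed)
TONE_HZ = 1000      # pure tone; falls exactly on one DFT bin
tone_bin = TONE_HZ * N // SAMPLE_RATE

steady = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(N)]
# Same tone, switched on abruptly at sample 96 and off at sample 160.
gated = [x if 96 <= i < 160 else 0.0 for i, x in enumerate(steady)]

steady_fraction = tone_bin_energy_fraction(steady, tone_bin)
gated_fraction = tone_bin_energy_fraction(gated, tone_bin)
print(round(steady_fraction, 2), round(gated_fraction, 2))
```

The steady tone keeps essentially all of its energy at 1000 Hz, while the gated tone spreads most of its energy to neighboring frequencies.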
An octave bandwidth could be
1000 Hz – 2000 Hz or
6000 Hz – 12,000 Hz
an octave is a doubling of frequency
A 1-Hz bandwidth could be
1000 Hz – 1001 Hz or
6000 Hz – 6,001 Hz
white noise
equal energy at every frequency (equal energy per Hz)
pink noise
equal energy per octave
sounds lower, like it has more bass
the spectrum looks like a negatively sloping line
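The difference between white and pink noise shows up when the energy in two octave bands is compared. This is a rough sketch, assuming an idealized white noise with constant power per Hz and an idealized pink noise whose power density is proportional to 1/f; the function names and the scale factors are made up for illustration.

```python
import math

def white_band_power(f_low, f_high, power_per_hz=1.0):
    """White noise: equal power in every 1-Hz slice, so band power grows with width."""
    return power_per_hz * (f_high - f_low)

def pink_band_power(f_low, f_high, scale=1.0):
    """Pink noise: power density proportional to 1/f, integrated over the band."""
    return scale * math.log(f_high / f_low)

# The two octave bands used as examples in these notes.
for f_low, f_high in [(1000, 2000), (6000, 12000)]:
    print(f_low, f_high,
          white_band_power(f_low, f_high),
          round(pink_band_power(f_low, f_high), 3))
```

White noise puts six times more power into the 6000 to 12,000 Hz octave than into the 1000 to 2000 Hz octave, while pink noise puts the same power into both.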
Impulse noise
an abrupt sound produced by a sudden force, with rapid onset and offset
short duration – less than 1 second
example: gunshot
looks like white noise on a spectrogram
dB SPL
dB Sound Pressure Level
90 dB is the limit for noise level that can hurt your hearing (you can be exposed to 90 dB for 8 hours a day without harm)
Impulse noise has what kind of bandwidth
a wide bandwidth – contains many frequencies
measuring impulse noise
you would measure the peak noise levels in SPL
we can use an app for learning purposes, but for a real measurement we’d have to use a calibrated SPL meter
Decibel level meter
current, peak and average 1 min SPL
Noise - Spectral Density
Noise is often characterized by its power spectral density
power spectral density
Power spectral density (PSD) describes how the sound power is distributed across the frequencies of the sound (power per unit of bandwidth)
guitar PSD would be less than the white noise PSD
Continuous Non-Stationary Noise
continuous in time
random sound changing randomly over time
power spectrum density is not constant
Example: speech babble noise
broadband vs narrowband use
When testing pure tone hearing, use a narrow band to mask one ear, (because it is one frequency)
when testing word hearing, use broadband (because speech includes more frequencies)
Speech noise is based on the
LTASS (Long-Term Average Speech Spectrum)
LTASS
Estimate of the average intensity of speech as a function of frequency
the Ling 6 sounds
/a/, /u/, /i/, /s/, /ʃ/, /m/
Noise
anything that affects our ability to hear the sound of interest (masking noise)
Anthropogenic
changes in nature made by humans; pollutants originating from human activity
Anthropogenic noise has impacts on human health
Hearing
Communication
Endocrine Function
Blood Pressure
Attention, Memory, Problem-solving
Sleep quality
noise pollution
Anthropogenic
Anthropogenic noise has impacts on the environment and other species
Hearing
Communication
Physiologic Function
Reproduction
more on the vocal folds as “strings”
The sound produced by the vocal folds is similar to a sawtooth wave, having both odd and even harmonics that roll-off in their amplitudes with increasing frequency
It would sound like a buzz
the wave made from vocal fold vibrations
Note that it resembles the sawtooth wave and will have the harmonics that we have discussed for sawtooth waves.
why do we study tubes?
because the vocal tract is a tube
Transverse waves on strings
waves on strings
node
closed end of tube and fixed end of string
antinode
open end of tube and loose end of string
Longitudinal
waves in the air in tubes
string fixed at one end equations or tube model
same as before, but v= speed of sound and wavelength= 4L (so denominator of equation is 4L)
quarter wave resonators
Tube closed at both ends
odd and even harmonics!
half wave resonators
same equations as fixed string
Tube open at both ends
same as when closed at both ends
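The quarter-wave and half-wave cases can be compared side by side. This is a minimal sketch, assuming a speed of sound of 343 m/s and a hypothetical tube length of 0.17 m; the function names are invented for illustration.

```python
def quarter_wave_resonances(length_m, count=3, speed=343.0):
    """Tube closed at one end: f_n = (2n - 1) * v / (4L), odd multiples only."""
    return [(2 * n - 1) * speed / (4 * length_m) for n in range(1, count + 1)]

def half_wave_resonances(length_m, count=3, speed=343.0):
    """Tube open (or closed) at both ends: f_n = n * v / (2L), odd and even multiples."""
    return [n * speed / (2 * length_m) for n in range(1, count + 1)]

TUBE_LENGTH_M = 0.17  # hypothetical tube length
quarter = quarter_wave_resonances(TUBE_LENGTH_M)
half = half_wave_resonances(TUBE_LENGTH_M)
print([round(f) for f in quarter])
print([round(f) for f in half])
```

The quarter-wave tube resonates at 1, 3, and 5 times its lowest resonance, while the half-wave tube resonates at 1, 2, and 3 times its lowest resonance, which is itself twice as high for the same length.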
Tube resonators
The frequencies produced by standing waves in a tube will “resonate”, meaning they will be amplified
These frequencies are referred to as resonance frequencies
The tube is called a tube resonator
Transfer Functions
Transfer functions describe what happens to frequencies as they pass through a system
Consists of 2 parts:
The amplitude response
The phase response
transfer functions on graph
A transfer function is a graph that shows how a system will filter each frequency that might pass through it. Transfer functions are usually obtained by putting different frequencies of sound through the system to see how the system treats them. Keep in mind that a system could be anything. Your phone, an MP3 player, a stereo, the outer, middle or inner ear, the vocal tract etc.
Output-Input ratios on a Transfer function
Y-axes on transfer functions can vary. Most of the time, the y-axis will be given in decibels. Recall that to get decibels, you take the log of a ratio. The ratio in this case would be the output/input ratio. In this figure, they just plot the ratio.
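Converting an output/input ratio to decibels can be sketched as follows. The example assumes the ratio is an amplitude ratio (pressure or voltage), so the log is multiplied by 20; the function name is hypothetical.

```python
import math

def gain_db(output_amplitude, input_amplitude):
    """Output/input amplitude ratio expressed in decibels."""
    return 20 * math.log10(output_amplitude / input_amplitude)

# Ratios below 1 are attenuation (negative dB); ratios above 1 are gain (positive dB).
for ratio in (0.5, 1.0, 2.0):
    print(ratio, round(gain_db(ratio, 1.0), 1))
```

A ratio of 1 (output equals input) plots at 0 dB on a decibel y-axis.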
Filters
systems that let some frequencies pass through amplified or unchanged, while other frequencies are attenuated
other types of systems
Other types of systems would be amplifiers and resonators. Both amplifiers and resonators also have their own transfer functions. In the case of amplifiers and resonators, some frequencies get amplified.
Low-pass filters
attenuate high-frequencies allowing low-frequencies to pass through amplified or unchanged
High-pass filters
attenuate low frequencies allowing high frequencies to pass through amplified or unchanged
Band-pass filters
a band of frequencies passes through amplified or unchanged, while frequencies higher and lower are attenuated.
Frequencies above and below the two cut-off frequencies will be attenuated
Frequencies between the two cut off frequencies will pass unaltered in this example
Opposite of a band-reject filter
Band-reject filters
a band of frequencies is attenuated while frequencies higher and lower pass through amplified or unchanged.
Frequencies above and below the two cut off frequencies will pass through
Frequencies between the two cut-off frequencies will be attenuated.
Opposite of band-pass filter
attenuate
reducing the amplitude (volume) of a signal
Transfer function for an Ideal Filter
cuts straight off at the last allowed frequency
doesn’t happen in the real world
Transfer Function for a Real-Life Filter
Real-life filters are never ideal, meaning that the pass-band and the stop-band are not as sharp as in the ideal filter
There is no single frequency giving a visual boundary between the pass-band and the stop-band
The cut-off frequency is defined as the frequency at which the sound is attenuated by
3 dB (a level of -3 dB relative to the pass band)
Cut-off frequency
the frequency at which a filter begins attenuating frequencies outside the pass band – defined as the frequency that is 3 dB down from the amplitude of the pass band
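What 3 dB down means as a plain ratio can be worked out by reversing the decibel formula. This is a small sketch with hypothetical function names; it assumes the usual 20 log rule for amplitude and 10 log rule for power.

```python
def amplitude_ratio(level_db):
    """Amplitude ratio that corresponds to a level change in dB."""
    return 10 ** (level_db / 20)

def power_ratio(level_db):
    """Power ratio that corresponds to a level change in dB."""
    return 10 ** (level_db / 10)

print(round(amplitude_ratio(-3), 3), round(power_ratio(-3), 3))
```

At the cut-off frequency the output has about 71% of the pass-band amplitude, which is about half of the pass-band power.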
Pass band
the frequency region over which frequencies pass through the filter unchanged
Stop band
the frequency region over which frequencies are attenuated
Roll-off
the slope of the filter in dB/octave
expressing and calculating roll offs
Roll-offs are expressed as dB per octave
Calculated by subtracting the attenuation in dB at two frequencies, one octave apart, in the region where the slope is steepest
Example: At 2 kHz, System X attenuates the signal by 24 dB. At one octave above this (4 kHz), the attenuation is 48 dB. Therefore, the filter has a roll-off of 48 - 24 = 24 dB/octave.
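The roll-off calculation can be written as a short function. This sketch reproduces the System X example and, as an added assumption, divides by the number of octaves so that two points more than one octave apart can also be used when the slope is constant.

```python
import math

def roll_off_db_per_octave(f1_hz, atten1_db, f2_hz, atten2_db):
    """Slope between two points on the filter skirt, in dB per octave."""
    octaves = math.log2(f2_hz / f1_hz)
    return (atten2_db - atten1_db) / octaves

# System X: 24 dB of attenuation at 2 kHz, 48 dB at 4 kHz.
print(roll_off_db_per_octave(2000, 24, 4000, 48))  # 24.0
```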
Tube Resonators are filters
Tube resonators are bandpass filters
Tube resonators give high amplification (gain) at resonance frequencies and low amplification (gain) at other frequencies
real ear microphone measures
used to test to see what the ear canal resonance is
transfer function
Tube resonances can be plotted on a graph called a transfer function
The transfer function shows how sound will be filtered as it travels through the tube
A transfer function is continuous
Vocal tract transfer functions
Shows how sound will be affected as it passes through the vocal tract resonator/filter
output spectrum
An output spectrum has lines like a line spectrum with a continuous line over the top
spectrum of sound coming out of a system
Input spectrum
spectrum of sound going into a system
Vocal fold vibration
input
Vocal tract
the system
Speech
output
Vocal tract as a series of tubes
The vocal tract can be thought of as a series of tubes of varying length and cross sectional area. Changes in vocal tract shape cause changes in the resonant frequencies of the tubes.
Specific resonant frequencies are associated with specific vowels.
formants
The resonance frequencies are called “formants” and are labeled with capital F and a number
The brain interprets vowel sounds based on the pattern of resonance (formant) frequencies in the output spectrum
The lower the fundamental frequency, the more dense the harmonics on the glottal spectrum
Voices with a lower fundamental typically yield more formants in the output spectrum
Input Transfer Output
The more dense the harmonics are, the more likely it is that there will be a match to the transfer function and the more likely that all formant peaks will show up in the output spectrum
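The link between harmonic density and formant peaks can be illustrated by measuring how far each formant is from the nearest harmonic. This is only a sketch: the formant values and the two fundamental frequencies are hypothetical, chosen for illustration.

```python
def nearest_harmonic_gap(f0_hz, formant_hz):
    """Distance in Hz from a formant peak to the closest harmonic of f0."""
    nearest = max(1, round(formant_hz / f0_hz)) * f0_hz
    return abs(formant_hz - nearest)

FORMANTS_HZ = [730, 1090, 2440]  # hypothetical F1, F2, F3 values

gaps_low_f0 = [nearest_harmonic_gap(100, f) for f in FORMANTS_HZ]
gaps_high_f0 = [nearest_harmonic_gap(300, f) for f in FORMANTS_HZ]
print(gaps_low_f0, gaps_high_f0)
```

With a 100 Hz fundamental, no formant can be more than 50 Hz from a harmonic; with a 300 Hz fundamental the gap can reach 150 Hz, so a formant peak is more likely to fall between harmonics and be missing from the output spectrum.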
Combining the time and frequency domains
Frequency and time information about sound can be graphed together
This type of display is called a spectrogram or sonogram
x-axis = time; y-axis = frequency
Amplitude is indicated by the darkness of the display
glottal pulses
Voicing is indicated by vertical lines
Aperiodic noise does not have a strong vertical line pattern
Wideband vs Narrowband
When making a spectrogram, a band-pass filter is used in the analysis of the sound. Use of wide or narrow filter bandwidth gives different looking results
A wide bandwidth (300 to 500 Hz bandwidth) filter gives better resolution in time
A narrow bandwidth (30 to 50 Hz bandwidth) filter gives better resolution in frequency
Wideband spectrograms often resolve glottal pulses from voicing
Narrowband spectrograms often resolve harmonics
Formants can be seen on either type of spectrogram, but may appear more prominent on a wideband spectrogram
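The time versus frequency trade-off can be sketched with a rule of thumb: the time resolution of an analysis filter is roughly 1 divided by its bandwidth. That rule, the 120 Hz fundamental, and the function names below are assumptions for illustration.

```python
def time_resolution_ms(bandwidth_hz):
    """Rough time resolution of an analysis filter: about 1 / bandwidth."""
    return 1000.0 / bandwidth_hz

def resolves_glottal_pulses(bandwidth_hz, f0_hz):
    """Pulses separate when the time resolution is shorter than one glottal period."""
    return time_resolution_ms(bandwidth_hz) < 1000.0 / f0_hz

def resolves_harmonics(bandwidth_hz, f0_hz):
    """Harmonics separate when the filter is narrower than their spacing (f0)."""
    return bandwidth_hz < f0_hz

F0_HZ = 120  # hypothetical speaking fundamental
for label, bandwidth in (("wideband", 300), ("narrowband", 45)):
    print(label,
          round(time_resolution_ms(bandwidth), 1),
          resolves_glottal_pulses(bandwidth, F0_HZ),
          resolves_harmonics(bandwidth, F0_HZ))
```

Under these assumptions the 300 Hz filter separates glottal pulses but blurs harmonics, and the 45 Hz filter does the reverse.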
Changes with increasing f0
As the fundamental increases glottal pulses are more closely spaced
Harmonics are more widely spaced and show an obvious change of frequency over time
Tongue Height
High vowels typically have a low F1
Tongue height inversely related to F1 frequency
Example: /i/
Tongue Advancement (forward-back)
Front vowels have a high F2
F2 frequency is directly related to forwardness of the tongue
Example: /u/ (a back vowel, so F2 is low)
vowels at F3
Note that F3 remains relatively constant (flat line) across all vowels
Diphthongs
Note that the formants “move” over time in a diphthong
Nasals
Nasals are voiced so glottal pulses may be evident
Nasals have formants that are lighter than those seen in vowels
Sometimes nasals will have antiformants
Stops (Plosives)
Stops are marked by closure of articulators creating a silent interval
Release of closure creating a burst
Aspiration
Aspiration
aperiodic noise created by turbulence of air passing between the articulators as they are released
Voice Onset Time
A feature of stop consonants
The length of time between the release of the stop and the onset of voicing
Types: positive VOT, negative VOT, fully voiced stop, partially voiced stop (pre-voicing)
Positive VOT
– voicing starts after the release of the stop
– length of aspiration affects VOT
Negative VOT
– voicing starts before the release of the stop
Fully voiced stop
– voicing starts with the onset of the stop closure
Partially voiced stop (pre-voicing)
– voicing begins during closure (English: /b/, /d/, /g/ in initial position)
Categorical Perception
Sensory phenomena are perceived as fitting into a certain category
The category can dramatically change when some aspect of the sensory phenomenon is systematically varied
Example: The amount of Voice Onset Time determines the /da/ versus /ta/ perception
Fricatives
Turbulence of air passing over approximated articulators creates aperiodic sound - noise
Unvoiced fricatives
Voiced fricatives
Weak fricatives may show very little evidence of noise
Unvoiced fricatives
turbulence is only noise source
Voiced fricatives
have 2 sound sources – voicing and turbulence
Affricate
Features of both stops and fricatives
Approximants
Similarities to vowels (especially diphthongs)
Dynamic (“moving”) formant structure
But shorter than them in duration
The changing frequencies of the formants over time will be important in identifying approximants as a class of speech sounds.
The position of the approximant and the phonemes that surround it will affect how the formants change.
the approximant will seem to transform into the vowel that follows it
Position and Transitions
Position affects acoustic characteristics
/d/ in initial position is voiced
/d/ in final position is unvoiced
Co-articulation of phonemes shows up acoustically
Formants are more dynamic during running speech due to effects of co-articulation and pitch changes
decibel
The most frequently used unit of measurement in hearing science is the decibel.
The decibel is a logarithmic unit that indicates the relative difference between two measurements.
Absolute difference
The absolute difference tells us how much greater one measurement is than another.
This involves subtraction.
For example: My child has grown 4” in the past month.
Relative difference
The relative difference tells us how many times greater one measurement is than another.
This involves division.
For example: Mississippi is twice as large as West Virginia
Relative measures
Humans do not respond to absolute changes in physical properties.
A person can easily detect a change in the mass of an object from 1 kg to 2 kg (100% change) but may not be able to detect a change from 101 kg to 102 kg (1% change).
Logarithmic units commonly seen in hearing science
The octave
The decibel
Why base 2 for the octave scale
- Human perception was responsible for the octave scale.
- Once frequency was measured, it was found that the octave was equal to a doubling of frequency.
- So the logarithm base 2 is a reflection of the relationship between the physical characteristic and our perception of it.
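Because an octave is a doubling of frequency, the number of octaves between two frequencies is the base-2 logarithm of their ratio. A minimal sketch, with a hypothetical function name:

```python
import math

def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves (doublings) from the lower to the higher frequency."""
    return math.log2(f_high_hz / f_low_hz)

print(octaves_between(1000, 2000))  # 1.0
print(octaves_between(250, 8000))   # 5.0
```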
The bel
The bel (B) was developed to describe power loss along cables that transmitted speech signals. The bel is calculated by determining the ratio of two powers and finding the log10 of this ratio. Base 10 is convenient for doing logarithmic calculations. It also compresses a large range of power magnitudes into a much smaller range.
Decibel vs Bel
Instead of the bel, we use the decibel (dB), which is one tenth of a bel (i.e., 10 dB = 1 B).
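The bel and the decibel can be computed from the same power ratio. This is a small sketch with hypothetical function names; the power values are arbitrary examples.

```python
import math

def power_ratio_in_bels(power_1, power_2):
    """Bel: base-10 log of the ratio of two powers."""
    return math.log10(power_1 / power_2)

def power_ratio_in_decibels(power_1, power_2):
    """Decibel: one tenth of a bel, so the same ratio is 10 times the bel value."""
    return 10 * power_ratio_in_bels(power_1, power_2)

# A power 100 times greater than another is 2 B, or 20 dB.
print(power_ratio_in_bels(100, 1), power_ratio_in_decibels(100, 1))
```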
Why use the decibel?
Answer: the decibel notation in sound pressure and intensity is much more closely related to how we, as humans, perceive those changes
For example: the JND or Just Noticeable Difference in pressure or intensity change is about 3 dB
If we expressed the difference (subtraction) between pressure readings of 1.415 Pa and 1 Pa, it would equal 0.415 Pa. The same change expressed in decibels, 20 * log10(1.415 / 1), equals about 3 dB
dB spl
dB SPL (dB Sound Pressure Level) is a common unit we use to refer to how loud a certain sound is
dB SPL is the level of a sound's pressure above or below a reference pressure of 20 micropascals (µPa)
Reference point
Why 20 micro Pa?
That is the threshold of human hearing at 1000 Hz
Decibel calculation
So if we want to know the dB SPL of a sound, we would measure the RMS of a pressure wave (condensed and rarefied air) over a small sample of time
And if we find that pressure to be 1 Pa, we would divide it by our reference pressure (20 µPa, or 0.00002 Pa), take the base-10 logarithm of that ratio, and multiply by 20
20 * log10(1 / 0.00002) ≈ 94 dB SPL
Where 1 is the measured RMS pressure in Pa
Where .00002 Pa is the reference pressure for dB SPL
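The dB SPL calculation can be sketched as a short function. The constant and function names are made up for illustration; the reference pressure is the 0.00002 Pa given in these notes.

```python
import math

REFERENCE_PRESSURE_PA = 0.00002  # 20 micropascals

def db_spl(rms_pressure_pa):
    """Sound pressure level in dB SPL for a measured RMS pressure in pascals."""
    return 20 * math.log10(rms_pressure_pa / REFERENCE_PRESSURE_PA)

print(round(db_spl(1.0), 1))                    # 94.0
print(round(db_spl(1.415) - db_spl(1.0), 1))    # 3.0
```

The second line checks the earlier example: going from 1 Pa to 1.415 Pa is a change of about 3 dB.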
“un-making” a decibel
Now, if we are given the dB SPL of a sound, and we would like to know what the RMS pressure is of that sound, we can find that out too.
Suppose our SPL meter gives us a reading of 94 dB SPL. This is how you find the RMS pressure of that sound
0.00002 * 10^(94/20) ≈ 1 Pa
Where .00002 Pa is the reference pressure of dB SPL
Where 94 is the dB SPL reading given by the meter
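The reverse calculation, from a dB SPL reading back to RMS pressure, can be sketched the same way. The names are hypothetical; the reference pressure is the 0.00002 Pa used in these notes.

```python
REFERENCE_PRESSURE_PA = 0.00002  # 20 micropascals

def pressure_from_db_spl(level_db_spl):
    """RMS pressure in pascals for a given dB SPL reading."""
    return REFERENCE_PRESSURE_PA * 10 ** (level_db_spl / 20)

print(round(pressure_from_db_spl(94), 2))  # 1.0
```

A reading of 0 dB SPL returns the reference pressure itself.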
Pressure and Intensity are related
Also, if you’d like to derive an intensity (I) value from a pressure value, you can use this equation
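I ∝ P^2 (intensity is proportional to pressure squared; the exact relation is I = P^2 / (ρc), where ρc is the characteristic impedance of air)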
Where P = pressure (in pascals)
Where I = intensity (Watts/meter^2)
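Getting an intensity in watts/meter^2 from a pressure in pascals needs one more quantity, the characteristic impedance of air. The sketch below assumes the plane-wave relation I = P^2 / (ρc) with ρc rounded to 400, a value that is not given in these notes; the names are hypothetical.

```python
import math

RHO_C = 400.0                    # assumed characteristic impedance of air (rounded)
REFERENCE_INTENSITY = 1e-12      # watts/m^2, the dB IL reference

def intensity_from_pressure(rms_pressure_pa):
    """Intensity in watts/m^2 for an RMS pressure in pascals (plane wave in air)."""
    return rms_pressure_pa ** 2 / RHO_C

def db_il(intensity_w_m2):
    """Intensity level in dB IL."""
    return 10 * math.log10(intensity_w_m2 / REFERENCE_INTENSITY)

intensity = intensity_from_pressure(1.0)
print(intensity, round(db_il(intensity), 1))
```

With this rounded impedance, a pressure of 1 Pa gives an intensity level of about 94 dB IL, the same number as its level in dB SPL.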
The inverse square law
every time that you double the distance, the level goes down by about 6 dB SPL
a doubling of distance in an open area with a spherically radiating sound source = a drop of about 6 dB
a doubling of distance in a less open area (directional sound source like talking) = a drop of about 3 dB
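The 6 dB per doubling figure follows from the decibel formula. This is a sketch of the open-area case only, assuming a point source radiating spherically with no reflections; the function name is hypothetical.

```python
import math

def level_change_db(near_distance_m, far_distance_m):
    """Change in dB SPL when moving from the near to the far distance (free field)."""
    return -20 * math.log10(far_distance_m / near_distance_m)

print(round(level_change_db(1, 2), 1))  # -6.0
print(round(level_change_db(1, 4), 1))  # -12.0
```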
diffuse field
the sound level (in dB) is about the same across the whole room
classroom noise
This adds to the overall noise level in a classroom.
Let’s say we have a Sound Pressure Level meter and we measure the overall dB SPL level in the room to be 50 dB SPL.
Now let’s say we can turn off all of those background noises and only measure the teacher’s voice at that same location and it is measured as 60 dB SPL.
Therefore, our Signal to Noise Ratio (SNR) would be 10 dB because:
SNR = Signal (level in dB) – noise (level in dB)
This means that the signal (the teacher’s voice) is 10 dB greater, or louder, than the noise sources (HVAC, outside room noises, etc)
A 15 dB SNR is preferable for teachers speaking in a classroom
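The classroom example can be checked with a few lines of arithmetic. The names are hypothetical; the levels and the 15 dB target are the ones given in these notes.

```python
TARGET_SNR_DB = 15  # preferred classroom SNR from these notes

def snr_db(signal_level_db, noise_level_db):
    """Signal-to-noise ratio: signal level minus noise level, both in dB."""
    return signal_level_db - noise_level_db

classroom_snr = snr_db(60, 50)              # teacher at 60 dB SPL, noise at 50 dB SPL
shortfall = TARGET_SNR_DB - classroom_snr   # how far below the target we are
print(classroom_snr, shortfall)             # 10 5
```

The 5 dB shortfall could be made up by raising the signal, lowering the noise, or a mix of both.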
SNR
Signal to Noise Ratio
how to achieve it
Increase the signal (teacher’s voice)
Decrease the noise level in the room (noise)
increase the signal
The teacher can speak louder (not ideal)
The teacher can use an amplified speaker system
The teacher could decrease the distance between her and the intended listeners while speaking
decrease the noise level
Turn off noise sources during lecture such as printers, loud fans, HVAC (if possible)
Soundproof the doors that lead to the hallway.
Close windows that are near the playground when recess is in session
Affix tennis balls to the bottom of classroom chairs on hard surfaces
Another factor in students’ ability to understand speech in a classroom
is how reverberant the space is.
reverberation
That is known by calculating a classroom’s RT or reverberation time
the ideal classroom has an RT of about
0.4 to 0.6 seconds
How can you increase or decrease RT in a classroom
By adding absorptive materials to the surfaces in a classroom (decreases RT) or removing them (increases RT).
The lower the absorption coefficient, the less sound absorption and more RT. The higher the absorption coefficient, the more sound absorption and less RT
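The effect of the absorption coefficient on RT can be estimated with the Sabine equation, RT = 0.161 × V / A, which is not part of these notes and is brought in here as an assumption. The room size and the coefficients below are hypothetical.

```python
SABINE_CONSTANT = 0.161  # metric Sabine constant (assumed formula)

def reverberation_time_s(volume_m3, surfaces):
    """Sabine estimate: RT = 0.161 * V / sum(area * absorption coefficient)."""
    total_absorption = sum(area * coeff for area, coeff in surfaces)
    return SABINE_CONSTANT * volume_m3 / total_absorption

# Hypothetical 10 m x 8 m x 3 m classroom: (area in m^2, absorption coefficient)
VOLUME_M3 = 10 * 8 * 3
bare_room = [(80, 0.05), (80, 0.05), (108, 0.05)]      # floor, ceiling, walls
treated_room = [(80, 0.05), (80, 0.70), (108, 0.05)]   # absorptive ceiling tiles added

rt_bare = reverberation_time_s(VOLUME_M3, bare_room)
rt_treated = reverberation_time_s(VOLUME_M3, treated_room)
print(round(rt_bare, 2), round(rt_treated, 2))
```

In this made-up room, raising only the ceiling's absorption coefficient brings the estimated RT from nearly 3 seconds into the 0.4 to 0.6 second range.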
if Px/Pr= 1
dB is 0
but there is always sound
it is just equal to the reference
intensity reference (dB IL)
1 × 10^-12 watts/m^2
we can get negative answers
when the number we have is less than the reference.
these are very very soft sounds
log(0.5)
-0.3
log(2)
0.3
any time the pressure doubles
the dB SPL gains 6
any time the pressure increases by a multiple of ten
the dB SPL gains 20
any time that the intensity doubles
the dB IL gains 3
any time that the intensity increases by a multiple of ten
the dB IL gains 10
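These four rules of thumb come straight from the two decibel formulas (20 log for pressure, 10 log for intensity). A quick check, with hypothetical function names:

```python
import math

def pressure_gain_db(pressure_ratio):
    """Change in dB SPL for a given multiplication of pressure."""
    return 20 * math.log10(pressure_ratio)

def intensity_gain_db(intensity_ratio):
    """Change in dB IL for a given multiplication of intensity."""
    return 10 * math.log10(intensity_ratio)

print(round(pressure_gain_db(2), 1), round(pressure_gain_db(10), 1))    # 6.0 20.0
print(round(intensity_gain_db(2), 1), round(intensity_gain_db(10), 1))  # 3.0 10.0
```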
dB IL is equal to
dB SPL
we always add in intensity
unless the problem says pressure doubled
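Adding in intensity means converting each level back to an intensity ratio, summing the ratios, and converting the total back to decibels. A minimal sketch, assuming independent sound sources; the function name is hypothetical.

```python
import math

def combine_levels_db(levels_db):
    """Total level of several independent sources, added in intensity."""
    total_ratio = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_ratio)

print(round(combine_levels_db([60, 60]), 1))  # 63.0
print(round(combine_levels_db([60, 50]), 1))  # 60.4
```

Two equal sources double the intensity, so the total is 3 dB higher than either one, not twice the decibel value.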