Summary Flashcards

(70 cards)

1
Q

Most fundamental qualities of sound

A

Pitch (frequency) and loudness (amplitude)

2
Q

The larynx is formed by 4 cartilages

A
  • Thyroid
  • Cricoid
  • 2 Arytenoids
3
Q

Vocal folds and Vocal Tract

A

Vocal folds are two bands of muscle that are located within the larynx (voice box). They vibrate when air is pushed through them, producing sound.

The vocal tract is the area of the body which includes the vocal folds and all of the other structures involved in producing sound, such as the mouth, nose, and throat.

4
Q

When do the vocal folds shorten and when do they lengthen?

A

Shorten: the thyroarytenoid muscle contracts –> the arytenoids slide toward the thyroid –> the distance between the vocal processes and the thyroid prominence decreases

Lengthen: the cricothyroid muscle contracts –> the thyroid and cricoid rotate relative to each other –> the distance between the vocal processes and the thyroid prominence increases

5
Q

What does contraction of the laryngeal muscles (which moves the cartilages) do?

A

It changes the length of the vocal folds and moves them apart (abduction) or together (adduction)

6
Q

The 3 systems involved in speech production

A
  1. sub-glottal system (initiation phase –> breathing)
  2. glottal system (phonation phase –> vocal fold vibration through the Bernoulli effect, controlled by movement of the laryngeal cartilages)
  3. supra-glottal system (articulation phase –> oral and pharyngeal cavity)
7
Q

Characterizing vowels and consonants

A

Vowels:
- tongue location (front, central, back) –> front means higher F2
- tongue height (high, mid, low) –> high means lower F1
- lip position (rounded or unrounded)

Consonants:
- place of articulation
- manner of articulation
- voicing (voiced or voiceless)

8
Q

speech characteristics

A
  1. Periodicity –> voiced
  2. local maximum –> vowel
  3. silence and pre voicing –> plosive
  4. noise –> fricatives
  5. burst –> plosive
  6. change in amplitude –> change in sound
  7. change in sound structure –> change in mouth position
9
Q

Coarticulation

A

The blending of one sound into its neighbours: the articulation of a sound is influenced by the sounds around it
- anticipatory (the u already influences the word onset in "stew")
- carryover (the u still influences the following consonant in "use")

10
Q

Prosodic features
- properties of larger units of speech that reflect elements of language not encoded by grammar or choice of vocabulary
- used to convey meaning and emotion

A
  • intonation (use of pitch to convey meaning in speech)
  • stress (emphasis placed on certain syllables of a word or phrase)
  • Tone (the emotion or attitude in speech)
11
Q

Two parts of the Fourier spectrum

A
  • Amplitude spectrum
  • Phase spectrum
12
Q

Fourier transform

A

The Fourier transform is a mathematical technique used to transform a signal from its time domain into its frequency domain.
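
As a rough illustration of the time-to-frequency idea, the sketch below computes a naive discrete Fourier transform in plain Python and splits the result into the amplitude spectrum and the phase spectrum. The test signal (a cosine with 2 cycles in 8 samples) and all function names are assumptions chosen for the example, not part of the course material.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform, O(n^2); for illustration only."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

# Hypothetical test signal: a cosine completing 2 cycles in 8 samples.
n = 8
signal = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]

spectrum = dft(signal)
amplitude = [abs(c) for c in spectrum]        # amplitude spectrum
phase = [cmath.phase(c) for c in spectrum]    # phase spectrum, in radians

# Strongest bin in the first half of the spectrum = frequency of the cosine.
peak_bin = max(range(n // 2 + 1), key=lambda k: amplitude[k])
print(peak_bin, round(amplitude[peak_bin], 3))
```

In practice a fast Fourier transform (FFT) would be used instead; the naive version only keeps the definition visible.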

13
Q

Explain briefly how the functionality of the cochlea is similar to Fourier Analysis

A

The cochlea is similar to Fourier analysis in that it breaks a sound down into its frequency components. Sound sets the basilar membrane in motion, and different places along the membrane respond most strongly to different frequencies (high frequencies near the base, low frequencies near the apex). Hair cells at each place convert that motion into electrical signals, so the auditory system receives the sound already separated into frequency bands.

14
Q

Path of sound

A

Ear canal –> eardrum –> ossicles –> cochlea

15
Q

Three small bones (ossicles) in middle ear and function

A

Malleus, incus and stapes.
They transmit tiny sound vibrations from the eardrum to the cochlea.

16
Q

Function and parts of inner ear

A
  • Cochlea
  • Basilar membrane
  • Oval window and round window (openings)

Responsible for converting sound waves into electrical signals that can be interpreted by the brain.

The cochlea also behaves like a bank of band-pass filters: each place along the basilar membrane responds most strongly to its own band of frequencies.

17
Q

Outer ear, parts and function

A
  • auricle (outside)
  • ear canal (connects to middle ear)

funneling the acoustic wave into ear canal

18
Q

middle ear, parts and function

A

transfers vibrations of air particles into vibrations of mechanical structures

  • Eardrum
  • ossicles (malleus incus stapes)
19
Q

What does the acoustic reflex do?

A

The stapedius muscle spans the space between the stapes and the wall of the middle ear; when it contracts, it reduces the motion of the stapes

  • protects the ear from loud noise
20
Q

Otitis media with effusion

A

An infection in which the middle-ear cavity fills with fluid, so that the middle ear can no longer act as an impedance bridge between the air-filled ear canal and the fluid-filled cochlea.

21
Q

Mel scale frequency

A

a perceptual frequency scale (roughly linear at low frequencies, logarithmic above about 1 kHz) used to measure the perceived pitch of a sound
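
The card does not give a formula. The sketch below assumes one widely used approximation, mel = 2595 · log10(1 + f / 700), to show how equal steps in Hz become smaller steps in mel as frequency rises. The function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mel with the common 2595 * log10(1 + f/700) approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 1000 Hz step covers fewer mels higher up the scale.
low_step = hz_to_mel(1000) - hz_to_mel(0)
high_step = hz_to_mel(2000) - hz_to_mel(1000)
print(round(low_step), round(high_step))
```

Other mel formulas exist; this one is chosen only because it is the variant most often quoted alongside MFCCs.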

22
Q

Basic idea of Fourier transform

A

any signal can be approximated by a sum of cosines with different frequencies, amplitudes and phases

23
Q

VoCoder

A
  • Encoder coding the speech
  • Decoder re-synthesizing speech

A technique for coding speech more efficiently, originally for long-distance phone calls

24
Q

A3 Scrambling

A

Used to scramble long-distance radio-telephone calls
- frequency bands were rearranged and inverted
- intercepted and descrambled by the Germans

25
SIGSALY (Project X or Green Hornet)
- based on the vocoder
- encryption used white noise stored on 2 vinyl phonograph records
- special turntables kept both ends synchronized in time
26
Concatenation
process of splicing together pieces of pre-recorded speech
27
Signal processing modification
process of changing a pre-recorded signal to produce a desired sound
28
Advantages and disadvantages of concatenation
A:
- ability to produce natural-sounding speech
- flexibility in creating new words
- speed of production
D:
- lack of control over the sound of the speech
- susceptibility to error
- inability to produce continuous speech
29
Advantages and disadvantages of signal processing modification
A:
- gives a greater degree of control over the sound of the input
D:
- more computationally intensive
- more difficult to create new words or phrases with this technique
30
Challenges for speech perception
1. Lack of invariance problem
   - phonetic environment
   - differing speech conditions (tempo)
   - speaker variation (dialects)
2. Perceptual constancy and normalization
   - ability to recognize and interpret speech sounds regardless of the context
   - map signals to an independent category
3. Speech segmentation problem
   - difficult to identify and segment individual speech sounds
31
First generation speech synthesis
The source waveform is generated by an explicit model
- articulatory synthesis --> uses physiological models that simulate the movement of the vocal tract and articulators
- source-filter models --> combine two components, a source (vocal folds) and a filter (vocal tract)
- formant synthesizers --> digital synthesizers built on the source-filter model, with the filter tuned to the formant frequencies, to generate realistic-sounding speech
32
Cochlear implants (application of the SIGSALY)
A neuroprosthetic device that bypasses the normal acoustic hearing process by electrically stimulating the auditory nerve
33
Generations of speech synthesis
first --> source waveform is generated by an explicit model
second --> source waveform is generated from data
third --> source waveform is learned from the data
34
second generation speech synthesis
Tradeoff between processing speed and memory
- model based
- sample based
35
third generation of speech synthesis
Input is Mel-frequency cepstral coefficients (MFCCs), computed as follows:
- divide the signal into frames of 20-40 ms
- apply a mel filter bank (determine the filter bank energies)
- take the log of the energies
- compute the discrete cosine transform (DCT)
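The MFCC steps on this card can be sketched for a single frame in plain Python. Several details are assumptions that the card does not state: a power spectrum is computed with a naive DFT before the filter bank, the mel formula is the common 2595 · log10(1 + f / 700) approximation, the filters are triangular, and the frame (25 ms of a 440 Hz sine at 8 kHz), filter count and coefficient count are arbitrary. All function names are hypothetical.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Assumed extra step: naive DFT power spectrum of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) ** 2 / n
        for k in range(n // 2 + 1)
    ]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_out)
    ]

def mfcc_frame(frame, sample_rate, n_filters=10, n_coeffs=5):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]  # filter bank energies
    log_energies = [math.log(e + 1e-10) for e in energies]              # log transform
    return dct2(log_energies, n_coeffs)                                 # DCT

sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(200)]  # one 25 ms frame
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))
```

A real front end would also apply a window to each frame, use an FFT, and slide over overlapping frames; those parts are left out to keep the four listed steps visible.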
36
Unit selection
Generating speech from a database of pre-recorded speech samples by selecting the most appropriate units of speech from the database
++ more natural speech
-- less generalizable and more recordings needed
37
diphones
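a unit that runs from the middle of one phone to the middle of the next, capturing the transition between two adjacent phones; diphones are combined to form words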
38
Advantages and disadvantages of third generation speech synthesis
A:
- trained automatically, so hand-written rules are avoided
- high-quality synthesis and compact
D:
- speech has to be generated by a parametric model, so final quality depends on the parameter-to-speech technique used
39
applications of text to speech
1. letting people with visual impairments listen to text
2. listening to text while driving
3. travel information in public transport
40
components of a text to speech synthesizer
- text analysis
  * identify tokens
  * tokenizing (split into smaller chunks)
  * normalization (determine the spoken variant of each token)
- linguistic analysis
  * phonemes
  * prosodic information (intonation, duration, stress, rhythm)
- waveform generation (generation 1, 2 or 3)
41
Corpus
a collection of texts with some unifying characteristics
42
regular expression
sequence of characters that define a search pattern in strings of text such as words, phrases and numbers
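A few search patterns, sketched with Python's standard `re` module. The example sentence and variable names are made up for illustration.

```python
import re

text = "Call 555-0123 before 17:30, or email the office on 2 May 2024."

# \d matches a digit; {3} and {4} repeat it exactly that many times.
phone = re.findall(r"\d{3}-\d{4}", text)

# \b is a word boundary, [A-Z] a character class, \w* any run of word characters.
capitalised = re.findall(r"\b[A-Z]\w*", text)

# \d+ matches every maximal run of digits.
numbers = re.findall(r"\d+", text)

print(phone)        # ['555-0123']
print(capitalised)  # ['Call', 'May']
print(numbers)      # ['555', '0123', '17', '30', '2', '2024']
```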
43
Major uses of corpora?
- applicative (developing NLP tools)
- analytical (empirical basis for the distribution of constructions and language phenomena)
44
Text processing tasks that use regular expressions
- normalizing text (bringing it into a standard form)
- tokenization (splitting text into words)
- lemmatization (mapping words to their shared root)
- stemming (crudely reducing words to a simpler root)
- sentence segmentation (splitting text into sentences)
- comparing words and strings
44
dimensions of variation
- multiple languages (code switching)
- genre (source of the text)
- demographic characteristics of the writer
- language changes over time
45
datasheet properties
- motivation
- situation
- language variety
- collection process
- annotation process
- distribution
46
normalization process
1. tokenizing
   - token learner
   - token segmenter
2. normalizing word formats
   - case folding (lower case)
   - lemmatization
   - morphological parsing
   - stemming
3. segmenting sentences
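A minimal sketch of these stages in Python. It swaps in simple rule-based regex tokenization in place of the learned tokenizer (token learner and token segmenter) that the card mentions, and `crude_stem` is a hypothetical toy suffix-stripper, not a real stemming algorithm.

```python
import re

def segment_sentences(text):
    # Naive rule: split after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words (optionally with an internal apostrophe) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

def case_fold(tokens):
    return [t.lower() for t in tokens]

def crude_stem(token):
    # Toy stemmer: strips one of a few common suffixes from longer words.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

text = "The cats were sleeping. Dogs barked!"
sentences = segment_sentences(text)
tokens = [case_fold(tokenize(s)) for s in sentences]
stems = [[crude_stem(t) for t in sent] for sent in tokens]

print(sentences)
print(tokens)
print(stems)
```

The stages run in a different order from the list (sentences first, then tokens) purely for convenience; real pipelines vary in ordering too.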
47
Homophones and homographs
homophones --> same sound, different spelling
homographs --> same spelling, different sound
48
Semantic relations
synonymy, antonymy, hypernymy/hyponymy, meronymy/holonymy, co-hyponyms
49
synonymy
same sense, different word (e.g. house - villa)
50
antonymy
opposite meaning (e.g. good - bad)
51
hypernymy/ hyponymy
"dog" is a hyponym of "animal" because it is more specific; "animal" is a hypernym of "dog" because it is less specific
52
meronymy / holonymy
"finger" is a meronym of "hand" because it is a part of the hand
"hand" is a holonym of "finger" because it is the whole
54
co-hyponyms
"cat" and "dog" are co-hyponyms because both are hyponyms of the same word, "animal"
55
associated words
"cup" and "coffee" are associated because they belong to the same semantic field
56
Connotation / evaluation
connotation: positive (happy) or negative (sad)
evaluation: positive (great) or negative (terrible)
57
important dimensions of affective meaning
1. valence (negative or positive)
2. arousal (excited or not)
3. dominance (in control or not)
58
sentiment
positive or negative evaluation expressed in language
59
two most commonly used models in vector semantics
tf-idf and word2vec
60
tf-idf
measure the importance of a term in a document relative to other documents in a corpus
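One common way to compute the weight, sketched in plain Python. It assumes the variant tf = 1 + log10(count) and idf = log10(N / df); other variants exist. The three toy documents and the function name are made up for the example.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document id -> list of tokens.
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog sat on the log".split(),
    "d3": "the cat chased the dog".split(),
}

def tf_idf(term, doc_id):
    count = Counter(docs[doc_id])[term]                      # term frequency in this document
    df = sum(1 for words in docs.values() if term in words)  # number of documents containing the term
    if count == 0 or df == 0:
        return 0.0
    tf = 1 + math.log10(count)
    idf = math.log10(len(docs) / df)
    return tf * idf

print(tf_idf("the", "d1"))   # in every document, so idf is 0
print(tf_idf("cat", "d1"))   # in 2 of 3 documents
print(tf_idf("mat", "d1"))   # in 1 of 3 documents, highest weight
```

A word that occurs everywhere ("the") gets weight 0, while a word confined to one document ("mat") gets the highest weight.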
61
word2vec
methods used to represent words in a vector space in order to capture semantic and syntactic relationships between words
62
cosine similarity
measure of similarity between two vectors, which is calculated by taking the cosine of the angle between the vectors
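Written out, the cosine of the angle is the dot product divided by the product of the vector lengths. The sketch below uses plain Python lists; the example vectors are arbitrary.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal
print(cosine_similarity([1, 0], [-1, 0]))       # opposite direction
```

Because the vector lengths are divided out, only direction matters: a long and a short document with the same word proportions score as identical.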
63
PMI (pointwise mutual information)
measures whether a word co-occurs with another word more often than would be expected by chance
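A small sketch of the computation, assuming the usual definition PMI(w, c) = log2(P(w, c) / (P(w) · P(c))) and the "positive" variant (PPMI) that clips negative values to 0. The co-occurrence counts are invented for the example.

```python
import math

# Hypothetical co-occurrence counts: (word, context word) -> count.
pair_counts = {
    ("cup", "coffee"): 8,
    ("cup", "the"): 2,
    ("dog", "coffee"): 1,
    ("dog", "the"): 9,
}
total = sum(pair_counts.values())

def pmi(word, context):
    p_wc = pair_counts.get((word, context), 0) / total
    p_w = sum(c for (w, _), c in pair_counts.items() if w == word) / total
    p_c = sum(c for (_, x), c in pair_counts.items() if x == context) / total
    if p_wc == 0:
        return float("-inf")   # the pair never co-occurs
    return math.log2(p_wc / (p_w * p_c))

def ppmi(word, context):
    return max(0.0, pmi(word, context))

print(pmi("cup", "coffee"))  # positive: more often together than expected
print(pmi("cup", "the"))     # negative: less often together than expected
print(ppmi("cup", "the"))    # clipped to 0.0
```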
64
Skipgram vs Cbow
two methods used to represent words in a vector space
- CBOW predicts a target word given a set of context words
- Skipgram predicts the context words given a target word
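The difference is easiest to see in the training examples each method builds from the same sentence: skip-gram uses the centre word to predict each neighbour, while CBOW uses the neighbours together to predict the centre word. The sketch below only generates the examples (no model is trained); the function names and the window size of 1 are assumptions.

```python
def skipgram_pairs(tokens, window=1):
    """(target, context) pairs: the target word predicts each neighbour."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window=1):
    """(context words, target) examples: the neighbours jointly predict the centre word."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

tokens = "she wrote a book".split()
print(skipgram_pairs(tokens))
print(cbow_examples(tokens))
```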
65
two kinds of similarity
- first-order co-occurrence (wrote and book): the words occur near each other
- second-order co-occurrence (wrote and said): the words have similar neighbors
66
aims when identifying opinions
1. SO polarity (subjective or objective)
2. PN polarity (positive or negative)
3. strength of PN polarity
4. extracting opinions
67
Balanced corpus
- big in size
- mixed language
- full texts
- different domains and genres
- range of text categories
- well documented
68
classifying corpora
1. mode (written, spoken, mixed...)
2. representativeness (balanced, specialized)
3. time (diachronic, synchronic)
4. language (mono, multi, parallel, comparable)
5. sampling (full documents, sample)
6. mark-up (raw, annotated)