Week 5 Flashcards

1
Q

What’s the time complexity of the algorithm used for local decoding?

A

Polynomial time (the forward and backward recursions run in O(N^2 T) for N states and T observations)

2
Q

What are the three steps in the local decoding algorithm?

A

- Calculate the forward probabilities
- Calculate the backward probabilities
- Combine both to find the retrospective (posterior) distribution
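A minimal numpy sketch of these three steps for a toy HMM. The names are conventional rather than from the lectures: A is the transition matrix, B the emission matrix, pi the initial state distribution, and obs a sequence of observation indices.

```python
import numpy as np

def local_decode(A, B, pi, obs):
    """Local (posterior) decoding: forward pass, backward pass,
    then combine both into a per-position state distribution."""
    n_states, T = A.shape[0], len(obs)

    # Step 1: forward probabilities alpha[t, q]
    alpha = np.zeros((T, n_states))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Step 2: backward probabilities beta[t, q]
    beta = np.ones((T, n_states))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # Step 3: combine both into the retrospective (posterior) distribution
    posterior = alpha * beta
    posterior /= posterior.sum(axis=1, keepdims=True)
    return posterior.argmax(axis=1)   # most probable state at each position
```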

3
Q

How do we intuitively decode the state at time t?

A

Fuse information from the past observations and the future observations

4
Q

What does the forward probability calculate?

A

The sum of the probabilities of all state paths that could have generated the given sequence of observable symbols

5
Q

What is the difference between the forward algorithm (used in local decoding) and the Viterbi algorithm?

A

Viterbi takes the max over predecessor states where the forward algorithm takes the sum
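The contrast in one step of each recursion, as a hedged sketch (prev is the previous column of forward/Viterbi scores, b_obs the emission probabilities for the current observation; both names are mine):

```python
import numpy as np

def forward_step(prev, A, b_obs):
    # Forward algorithm: SUM over all predecessor states
    return (prev @ A) * b_obs

def viterbi_step(prev, A, b_obs):
    # Viterbi algorithm: MAX over all predecessor states
    return (prev[:, None] * A).max(axis=0) * b_obs
```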

6
Q

What is unsupervised learning?

A

Estimating the HMM's parameters without annotated tags

7
Q

What is replaced in the Baum-Welch algorithm?

A

The counts (which cannot be observed without annotated data) are replaced with their estimates, computed as expectations under the current model

8
Q

What are the three stages in the Baum-Welch algorithm?

A
- Initialisation
- E step
- M step
9
Q

Describe the initialisation step in the Baum-Welch algorithm

A

Randomly guess some starting values for A0 and B0 (the initial transition and emission matrices)

10
Q

Describe the E step in the Baum-Welch algorithm

A

- Calculate the forward and backward probabilities
- Calculate the retrospective (posterior) distribution
- Calculate the pseudo transition estimates for all i, q, q'

11
Q

Describe the M step in the Baum-Welch algorithm

A
- Re-estimate A for all q, q'
- Re-estimate b for all q, w
- Update the iteration counter
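A hedged single-sequence sketch tying the three stages together. The variable names (gamma for the retrospective distribution, xi for the pseudo transition estimates) follow the common textbook formulation, which may differ from the course's exact notation; this naive version will underflow on long sequences.

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)

    # Initialisation: random starting guesses A0 and B0 (rows normalised)
    A = rng.random((n_states, n_states)); A /= A.sum(1, keepdims=True)
    B = rng.random((n_states, n_symbols)); B /= B.sum(1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    T = len(obs)

    for _ in range(n_iter):
        # E step: forward and backward probabilities ...
        alpha = np.zeros((T, n_states)); alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        likelihood = alpha[-1].sum()
        # ... the retrospective distribution gamma, and pseudo transition
        # estimates xi[t, q, q'] for every position and state pair
        gamma = alpha * beta / likelihood
        xi = np.zeros((T - 1, n_states, n_states))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
            xi[t] /= likelihood

        # M step: re-estimate A for all q, q' and b for all q, w
        A = xi.sum(0) / gamma[:-1].sum(0)[:, None]
        B = np.zeros_like(B)
        for t in range(T):
            B[:, obs[t]] += gamma[t]
        B /= gamma.sum(0)[:, None]
        pi = gamma[0]

    return A, B, pi
```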
12
Q

How does the likelihood change with each re-estimation step in the Baum-Welch algorithm?

A

It increases with every iteration, until the likelihood converges (neither the E step nor the M step can decrease it)

13
Q

What are the four probabilities for each pair of characters in P(O|C)?

A
- rev(i,e)
- ins(p,h)
- del(k,n)
- sub(f,v)
14
Q

What does rev(i,e) indicate in misspelling probabilities?

A

The order of the two letters has been reversed: ie -> ei

15
Q

What does ins(p,h) indicate in misspelling probabilities?

A

The second letter (h) is inserted after the first: p -> ph

16
Q

What does del(k,n) indicate in misspelling probabilities?

A

The first letter, preceding the second, is deleted: kn -> n

17
Q

What does sub(f,v) indicate in misspelling probabilities?

A

The first letter is substituted for the second: f -> v

18
Q

What is the spelling correction equation?

A

C* = argmax_C P(O|C) P(C), maximised over candidate corrections C for the observed (misspelled) word O
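A minimal sketch of that decision rule. channel_p (an error model giving P(O|C)) and prior_p (a language model giving P(C)) are assumed to be supplied; both names are illustrative placeholders.

```python
import math

def correct(observed, candidates, channel_p, prior_p):
    # C* = argmax_C P(O|C) * P(C), computed in log space to avoid underflow
    return max(candidates,
               key=lambda c: math.log(channel_p(observed, c))
                             + math.log(prior_p(c)))
```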

19
Q

How can we get observable symbols from acoustic waveform?

A

- Slice up the signal (discretise it)
- Represent each slice as a feature vector
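A hedged sketch of both steps for a 1-D numpy signal, using typical ASR framing values (25 ms windows every 10 ms). Real front ends use richer features such as MFCCs; the energy-plus-peak-frequency vector here is purely illustrative.

```python
import numpy as np

def frames_to_features(signal, rate, width=0.025, step=0.010):
    w, s = int(rate * width), int(rate * step)
    feats = []
    for start in range(0, len(signal) - w + 1, s):
        sl = signal[start:start + w]
        energy = float(np.sum(sl ** 2))            # "volume" of the slice
        spectrum = np.abs(np.fft.rfft(sl))
        peak_hz = np.argmax(spectrum) * rate / w   # crude dominant frequency
        feats.append((energy, peak_hz))
    return np.array(feats)                         # one feature vector per slice
```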

20
Q

What does a feature vector for a slice of the acoustic signal consist of?

A

Floating point values representing energy (volume) or frequencies within that slice

21
Q

What is our goal for speech recognition?

A

Compute the most probable sentence W for the given acoustic observation

22
Q

What are the two main components for automatic speech recognition (ASR)?

A

- Language model P(W)
- Acoustic model P(O|W)
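The two components combine exactly as in the spelling-correction cards above: W* = argmax_W P(W) P(O|W). A hedged sketch, with lm_logp and acoustic_logp as assumed placeholder scoring functions:

```python
def recognise(O, candidate_sentences, acoustic_logp, lm_logp):
    # Noisy channel for ASR: combine language model P(W) and acoustic model P(O|W)
    return max(candidate_sentences,
               key=lambda W: lm_logp(W) + acoustic_logp(O, W))
```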

23
Q

What levels may an HMM have when working with speech instead of written language?

A

- Bigram model of words in a sentence
- Bigram model of phones within a word
- Bigram model of subphones within a phone

24
Q

What is a phone?

A

A language-independent individual speech unit

25
Q

What are phones represented by?

A

Symbols from the phonetic alphabet

26
Q

What does IPA try to do?

A

Offer sound symbols, together with transcription principles, for transcribing any spoken language

27
Q

What is a phoneme?

A

Abstract class of sounds that are perceived as one distinctive sound in a given language

28
Q

What is an allophone?

A

Different pronunciations of the same phoneme

29
Q

What are pronunciation dictionaries?

A

Tools used for speech recognition and speech synthesis that list words together with their pronunciations as phoneme sequences

30
Q

How many subphones does a phone model normally distinguish between in ASR?

A

Three

31
Q

What are the three subphones a phone model distinguishes between in ASR?

A

Beginning, middle, end

32
Q

How is the variable duration of a subphone represented in a HMM?

A

By a self-loop: a transition from the state representing the subphone back to itself

33
Q

What is a Bakis network?

A

A structure where transitions can only go forwards or loop on a single state, but never backwards
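An illustrative (not course-given) transition matrix over the three subphone states from the cards above, showing the Bakis constraint: self-loops on the diagonal model variable duration, and everything below the diagonal is zero.

```python
import numpy as np

A = np.array([
    [0.6, 0.4, 0.0],   # beginning: loop or advance to middle
    [0.0, 0.7, 0.3],   # middle:    loop or advance to end
    [0.0, 0.0, 1.0],   # end:       loop (exiting handled by a final state)
])
assert np.allclose(np.tril(A, k=-1), 0)   # never backwards
```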

34
Q

How are unresolved ambiguities in speech recognition returned?

A

Word lattice

35
Q

What is a word lattice?

A

A labelled, directed acyclic graph with states roughly corresponding to points in time
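A toy lattice for the classic "I scream" / "ice cream" ambiguity, written as an adjacency map (my own representation, chosen only to make the DAG structure concrete):

```python
# States 0-3 roughly correspond to points in time; each arc carries a word label.
lattice = {
    0: [("I", 1), ("ice", 2)],   # two competing hypotheses from the start
    1: [("scream", 3)],
    2: [("cream", 3)],
    3: [],                       # final state
}
# The two paths through the DAG read "I scream" and "ice cream".
```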

36
Q

What is the goal of automatic speech recognition?

A

Computationally build systems that map from an acoustic signal to a string of words

37
Q

What is the intuition of a noisy channel model?

A

Treat the acoustic waveform as a noisy version of the string of words

38
Q

Describe how the noisy channel model works

A

If we know how the channel distorts the sound, we can find the correct source sentence for a waveform by taking every possible sentence in the language, running each one through the channel, and seeing whether it matches the output

39
Q

What is an HMM characterised by?

A

- Set of states
- Transition probability matrix: the probability of moving from state i to state j
- Set of observations
- Emission probability matrix: the probability of an observation being generated from a given state
- Start and end state
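One way to write the five components down as code, as a hedged sketch (the field names are mine, not standard notation from the course):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    states: list            # set of states
    A: np.ndarray           # transition matrix: A[i, j] = P(move from state i to j)
    observations: list      # set of observable symbols
    B: np.ndarray           # emission matrix: B[i, k] = P(symbol k | state i)
    start: int              # index of the start state
    end: int                # index of the end state
```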

40
Q

What are HMMs characterised by in speech recognition?

A

- Set of states corresponding to subphones
- Transition probability matrix with representations for the self-loop and for going to the next subphone
- Emission probabilities: the probability of a feature vector being generated for each subphone state

41
Q

What are lexicons?

A

Lists of words, with a pronunciation for each word expressed as a phone sequence

42
Q

What is the decoding question for speech recognition?

A

Given a string of acoustic observations, how should we choose the string of words that has the highest posterior probability?