Neural networks Flashcards

**Flashcard 1**

**Front:** What topics were covered in Week 1 to Week 11?

**Back:** Week 1 covered “Neural Language Modelling”. Week 2 covered “Neural Machine Translation & Transformers”. Week 3 covered “Multi-lingual Machine Translation”. Week 4 covered “Low resource & Multi-modal machine translation”. Week 5 covered “Overhype versus reality: When to use machine translation…and when not to”. Week 6 is “Overview”. Weeks 7, 8 …

**Flashcard 2**

**Front:** On what level did the translation model have to be defined according to the text?

**Back:** The model has to be defined on the word level instead of the sentence level.

**Flashcard 3**

**Front:** What is introduced into the translation model as a “hidden variable”?

**Back:** The underlying connection between source and target words is introduced into the translation model as a so-called “hidden variable”.

**Flashcard 4**

**Front:** What is a hidden variable?

**Back:** A hidden variable is a variable which has an influence on the model but is not actually seen.

**Flashcard 5**

**Front:** What does 'a' represent in the context of translation probability?

**Back:** 'a' represents the alignment “sentence”: a sequence of alignments (positions) for each source word.

**Flashcard 6**

**Front:** Provide an example of translation probability with word alignment given in the source.

**Back:** English: “I like red bicycles”; Spanish: “me gustan bicicletas rojas”; alignment: 1 2 4 3.

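A minimal Python sketch (illustrative, not from the lecture) of how such an alignment sequence can be read; the direction chosen here, target-to-source, is an assumption:

```python
# Illustrative reading of the alignment example above: the list `alignment`
# gives, for each target word, the 1-based position of its source word.
source = ["I", "like", "red", "bicycles"]
target = ["me", "gustan", "bicicletas", "rojas"]
alignment = [1, 2, 4, 3]

for t_word, s_pos in zip(target, alignment):
    print(f"{t_word} <- {source[s_pos - 1]}")
# me <- I, gustan <- like, bicicletas <- bicycles, rojas <- red
```
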
**Flashcard 7**

**Front:** How many target words can source words be connected to in one type of alignment?

**Back:** Source words can be connected to exactly one target word.

**Flashcard 8**

**Front:** What is the term for a source word without connections in alignment?

**Back:** A source word without connections is called a spurious word.

**Flashcard 9**

**Front:** What is "zero fertility" in word alignment referring to?

**Back:** "Zero fertility" refers to a word not translated.

**Flashcard 10**

**Front:** What do phrases in the context of SMT allow?

**Back:** Phrases allow translation from a word group to a word group.

**Flashcard 11**

**Front:** What is a limitation of word-based translation models compared to phrase-based models?

**Back:** Word-based translation models only allow translation from a single word into a word group.

**Flashcard 12**

**Front:** What are some advantages of using phrases over words in translation?

**Back:** Longer context can generally be captured, and there is better handling of idioms and other multi-word expressions.

**Flashcard 13**

**Front:** What constitutes an inconsistent phrase pair according to the example?

**Back:** The middle example in the figure shows an inconsistent phrase pair.

**Flashcard 14**

**Front:** What is the goal of decoding in the context discussed?

**Back:** Decoding aims to find the best hypothesis.

**Flashcard 15**

**Front:** What type of data was mentioned in relation to the neural network forward pass in the lecture overview?

**Back:** Complex “unstructured” data.

**Flashcard 16**

**Front:** What will be covered in a later part of the lecture regarding neural networks?

**Back:** Neural network forward pass with images, and backpropagation.

**Flashcard 17**

**Front:** What is naive text input for a neural network?

**Back:** Naive text input for neural network.

**Flashcard 18**

**Front:** What is a common non-linear function used in neural networks after getting a value for each node?

**Back:** One of the most common is called a Rectified Linear Unit (ReLU).

**Flashcard 19**

**Front:** What is the mathematical definition of the ReLU function?

**Back:** f(x) = max(0, x).

**Flashcard 20**

**Front:** What happens to the value at a node if it is less than zero when using the ReLU function?

**Back:** If the value at a node is less than zero, it becomes zero, since f(x) = max(0, x).

**Flashcard 21**

**Front:** What is the significance of the values 0.875, 0.004, …?

**Back:** These values sum to 1.

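A minimal NumPy sketch (the weights and numbers are made up, not the lecture's) of a forward pass in which the hidden values go through ReLU and the outputs go through a softmax, so the final values sum to 1 as in the card above:

```python
import numpy as np

# Forward pass with made-up weights: ReLU in the hidden layer,
# softmax at the output so the final values sum to 1.
x = np.array([1.0, 2.0])                       # input vector
W1 = np.array([[0.5, -1.0], [0.3, 0.8]])       # hidden-layer weights
b1 = np.array([0.1, -0.2])                     # hidden-layer biases
W2 = np.array([[1.2, -0.7], [-0.4, 0.9]])      # output-layer weights

h = np.maximum(0, W1 @ x + b1)                 # ReLU zeroes out negative values
logits = W2 @ h
probs = np.exp(logits) / np.exp(logits).sum()  # softmax
print(probs, probs.sum())                      # e.g. [0.06 0.94] 1.0
```
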
**Flashcard 22**

**Front:** What is involved in the backward pass of a neural network?

**Back:** Inputs, outputs (o), …

**Flashcard 23**

**Front:** What is a common loss function mentioned in the context of neural networks?

**Back:** Cross-entropy is a common loss function.

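An illustrative sketch (the third probability, 0.121, is an assumed value added so the distribution sums to 1): with a one-hot target, cross-entropy reduces to the negative log of the probability assigned to the correct class.

```python
import numpy as np

# Cross-entropy for a one-hot target: -log(probability of the correct class).
# 0.875 and 0.004 appear in Flashcard 21; 0.121 is assumed to complete the sum.
probs = np.array([0.875, 0.004, 0.121])  # softmax output
target = np.array([1.0, 0.0, 0.0])       # one-hot: class 0 is correct
loss = -np.sum(target * np.log(probs))   # == -log(0.875) ≈ 0.134
print(loss)
```
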
**Flashcard 24**

**Front:** Is cross-entropy the only loss function?

**Back:** No, this is not the only loss function.

**Flashcard 25**

**Front:** What is the gradient calculated with respect to in the context of neural networks?

**Back:** The gradient is calculated with respect to the weights of the network.

**Flashcard 26**

**Front:** What are some terms associated with calculating the gradient?

**Back:** Partial derivative and learning rate.

**Flashcard 27**

**Front:** What are the sets used in training and evaluating a model mentioned in the lecture?

**Back:** A training set (to train the model) and a test set.

**Flashcard 28**

**Front:** What type of error surfaces do neural networks have?

**Back:** Neural networks have non-convex error surfaces.

**Flashcard 29**

**Front:** What is a consequence of neural networks having non-convex error surfaces in terms of finding minima?

**Back:** Because the error surface is non-convex, there is no guarantee of finding the global minimum; we want to get a good local minimum.

**Flashcard 30**

**Front:** What are some methods of gradient descent mentioned in the lecture?

**Back:** Stochastic gradient descent, batch gradient descent.

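A toy sketch (the data, model, and learning rate are assumptions, not the lecture's) of the update rule w ← w − η·∂L/∂w, contrasting batch gradient descent, which averages the gradient over all examples, with stochastic gradient descent, which updates after each example:

```python
import numpy as np

# Toy linear model y = w*x fitted with squared error; illustrative only.
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])   # true w is 2
lr = 0.05                        # learning rate (eta)

w = 0.0                          # batch gradient descent: average over all data
for _ in range(100):
    grad = np.mean(2 * (w * xs - ys) * xs)   # dL/dw for L = mean (w*x - y)^2
    w -= lr * grad
print(w)                         # close to 2.0

w = 0.0                          # stochastic gradient descent: one example per update
for _ in range(100):
    for x, y in zip(xs, ys):
        w -= lr * 2 * (w * x - y) * x
print(w)                         # also close to 2.0
```
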
**Flashcard 31**

**Front:** What is "one-hot encoding" used for in the context of neural networks?

**Back:** Word representation (naïve).

**Flashcard 32**

**Front:** What is a problem with using "one-hot encoding" for word representation?

**Back:** It is sparse (lots of zeros) and will get even more sparse as the vocabulary grows.

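A small sketch with a hypothetical four-word vocabulary; real vocabularies run to tens of thousands of words, which is why these vectors become so sparse:

```python
# One-hot word vectors over a hypothetical tiny vocabulary: each vector is as
# long as the vocabulary and has exactly one non-zero entry.
vocab = ["i", "like", "red", "bicycles"]

def one_hot(word):
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("red"))  # [0, 0, 1, 0] -- mostly zeros, and sparser as vocab grows
```
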
**Flashcard 33**

**Front:** What are artificial neural networks inspired by?

**Back:** Artificial neural networks (or simply neural networks) are inspired by the neurons in the human brain.

**Flashcard 34**

**Front:** What are neural networks essentially from a mathematical perspective?

**Back:** ...nothing more than a bunch of mathematical functions involving a large number of matrix multiplications.

**Flashcard 35**

**Front:** What is a key capability of neural networks?

**Back:** The power of neural networks (NNs) lies in their ability to create complex mappings (functions) between their inputs and outputs.

**Flashcard 36**

**Front:** Why are derivatives of functions important for neural networks?

**Back:** ...as they measure the sensitivity to change of the function output value with respect to a change of its input value. This is very important for training neural networks.

**Flashcard 37**

**Front:** What does an artificial neuron do with its inputs?

**Back:** An artificial neuron takes several inputs, for example three.

**Flashcard 38**

**Front:** What are weights in an artificial neuron?

**Back:** w1, w2 and w3 are weights.

**Flashcard 39**

**Front:** What is the role of the function z(x) in an artificial neuron?

**Back:** First, a function z(x) takes all the inputs and converts them into a weighted sum: z(x) = w1x1 + w2x2 + w3x3 + b.

**Flashcard 40**

**Front:** What is 'b' in the weighted sum of an artificial neuron?

**Back:** b represents the inclusion of “bias” in each neuron, in order to prevent the weighted sum of the inputs from becoming equal to 0. Bias gives the network something to work with in case all input values are 0.

**Flashcard 41**

**Front:** What is the general formula for the weighted sum in a neuron?

**Back:** z(x) = ∑i xiwi + b.

**Flashcard 42**

**Front:** What happens to the weighted sum after it is calculated in a neuron?

**Back:** Then, this weighted sum is converted by another function σ(z) into the output of the neuron.

**Flashcard 43**

**Front:** What is the function σ(z) called?

**Back:** The function σ(z) is called the “activation function”.

**Flashcard 44**

**Front:** Describe the basic operation of an artificial neuron.

**Back:** So, basically, an artificial neuron converts its inputs into a weighted sum z(x) and then transforms this sum with an activation function σ(z) to produce its output.

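A minimal sketch of the neuron described in Flashcards 37–44; the weights, bias, inputs, and the choice of a sigmoid activation are illustrative assumptions:

```python
import math

# One artificial neuron: weighted sum z(x) = w1x1 + w2x2 + w3x3 + b,
# followed by an activation function (sigmoid assumed here).
w = [0.4, -0.2, 0.7]   # weights w1, w2, w3 (made up)
b = 0.1                # bias
x = [1.0, 2.0, 3.0]    # three inputs

z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum
output = 1 / (1 + math.exp(-z))               # sigma(z): sigmoid activation
print(z, output)                              # 2.2 0.900...
```
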
**Flashcard 45**

**Front:** What is the Heaviside function and what is it based on?

**Back:** The activation function can be based on the threshold (“Heaviside” function): σ(z) = 0 if z ≤ threshold; σ(z) = 1 if z > threshold, where the threshold is a real number, a parameter of the neuron.

**Flashcard 46**

**Front:** What is a "perceptron"?

**Back:** A simple type of artificial neuron which takes one or several binary inputs (with values 0 or 1) and has a threshold-based activation function is called a “perceptron”.

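A sketch along these lines (the weights, bias, and threshold are assumptions): binary inputs, a weighted sum, and a Heaviside activation. With these particular weights the perceptron happens to compute a logical AND.

```python
# Perceptron: binary inputs, weighted sum, threshold-based (Heaviside) activation.
def perceptron(inputs, weights, bias, threshold=0.0):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > threshold else 0  # Heaviside: 0 if z <= threshold, else 1

# Assumed weights/bias that make this perceptron behave like logical AND:
for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, "->", perceptron(pair, weights=[1.0, 1.0], bias=-1.5))
```
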
**Flashcard 47**

**Front:** Are non-linear activation functions typically used in artificial neurons?

**Back:** They are usually non-linear functions (the reason for this will be explained later), so that an artificial neuron transforms its inputs by a linear weighted sum and a non-linear activation function.

**Flashcard 48**

**Front:** Define the sigmoid function.

**Back:** sigmoid(x) = 1 / (1 + e^(−x)).

**Flashcard 49**

**Front:** What is the output range of the sigmoid function?

**Back:** It converts its input into an output in the range from 0 to 1.

**Flashcard 50**

**Front:** Define the hyperbolic tangent function.

**Back:** tanh(x) = (e^(2x) − 1) / (e^(2x) + 1).

**Flashcard 51**

**Front:** What is the output range of the hyperbolic tangent function?

**Back:** It converts the input into an output in the range from -1 to 1.

**Flashcard 52**

**Front:** Define the Rectified Linear Unit (ReLU) function.

**Back:** ReLU(x) = max(0, x).

**Flashcard 53**

**Front:** Describe how the ReLU function transforms inputs.

**Back:** It is basically a linear transformation for inputs greater than zero, while inputs below zero are transformed to zero. The output range is from 0 to ∞.

**Flashcard 54**

**Front:** Define the Softmax function for multiple inputs xi.

**Back:** softmax(xi) = e^(xi) / ∑j e^(xj).

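The four definitions from Flashcards 48–54, transcribed directly into NumPy as a straightforward sketch:

```python
import numpy as np

def sigmoid(x):   # output range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):      # output range (-1, 1)
    return (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)

def relu(x):      # output range [0, inf)
    return np.maximum(0, x)

def softmax(x):   # outputs in (0, 1), summing to 1
    e = np.exp(x)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # [0.659 0.242 0.099]
```
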
**Flashcard 55**

**Front:** What is the output range of the Softmax function?

**Back:** Its output range is from 0 to 1.

**Flashcard 56**

**Front:** For what purpose is the Softmax function convenient in modelling?

**Back:** ...it is very convenient for modelling probabilities of different classes xi.

**Flashcard 57**

**Front:** Where are Softmax functions used in neural machine translation models?

**Back:** ...a softmax function taking into account all target words in order to decide which of them has the highest probability.

**Flashcard 58**

**Front:** What happens when many neurons are connected together?

**Back:** When many neurons are connected, these operations become a powerful tool.

**Flashcard 59**

**Front:** What is a feed-forward network sometimes called, and under what condition is this name argued to be appropriate?

**Back:** This type of network is sometimes called a multilayer perceptron, although it is argued that the name should be used only if its neurons are actually perceptrons (neurons with a threshold activation function).

**Flashcard 60**

**Front:** What is the input layer in a neural network?

**Back:** The input layer consists of (one or more) input neurons. Inputs of this layer are inputs to the entire neural network. The input layer receives the inputs, performs the calculations in its neurons and transmits the output to the subsequent layer. Each neural network *must* have an input layer.

**Flashcard 61**

**Front:** What is the output layer in a neural network?

**Back:** The output layer consists of (one or more) output neurons. The output layer receives its input from the previous layer. Outputs of this layer represent the outputs of the entire network. The output layer is responsible for producing the final result by performing calculations in its neurons. Each neural network *must* have an output layer.

**Flashcard 62**

**Front:** What is a hidden layer in a neural network?

**Back:** The hidden layer is in the middle and connects the input and output layer. The word “hidden” implies that they are not visible from outside the network.

**Flashcard 63**

**Front:** How many hidden layers can a neural network have?

**Back:** A neural network can have an arbitrary number of hidden layers, from zero to many.

**Flashcard 64**

**Front:** What is a "deep neural network"?

**Back:** If a neural network has more than one hidden layer, it is called a “deep neural network”.

**Flashcard 65**

**Front:** What is "deep learning"?

**Back:** If such a neural network (more than one hidden layer) is used for machine learning, it is called “deep learning”.

**Flashcard 66**

**Front:** What is learned by the first hidden layer in a deep neural network?

**Back:** In a multi-layer (“deep”) neural network, the first hidden layer is able to learn some relatively simple patterns.

**Flashcard 67**

**Front:** What is learned by each additional hidden layer in a deep neural network?

**Back:** ...each additional hidden layer is able to learn progressively more complicated patterns.

**Flashcard 68**

**Front:** What is a theoretical capability of neural networks according to the "Universal Approximation Theorem"?

**Back:** The “Universal Approximation Theorem” states that a neural network with one hidden layer can approximate any continuous function for inputs within a specific range.

**Flashcard 69**

**Front:** Are there strict rules for building neural networks?

**Back:** Knowing that there are no strict rules for building neural networks and there are many possibilities to arrange the neurons and define their functions, you should be better able to imagine that neural networks really can model practically any function.

**Flashcard 70**

**Front:** Is it necessary to use the same activation function for all neurons in a network?

**Back:** ...it is also not necessary to use the same activation function for all neurons in a network! Usually, all neurons in one layer have the same activation function.

**Flashcard 71**

**Front:** Why are the important activation functions mentioned in the text non-linear?

**Back:** If all neurons in a network have linear activation functions, then no matter how many layers we have, the network as a whole can only compute a linear function of its inputs.

**Flashcard 72**

**Front:** How do recurrent neural networks differ from feed-forward neural networks in terms of information flow?

**Back:** In recurrent neural networks, outputs of some neurons do not pass further to the neurons in the subsequent layer but return to the same neuron as its input.

**Flashcard 73**

**Front:** For a feed-forward network with one layer, how are the dependencies between layers formulated?

**Back:**
* H = F(X), meaning that the values in the hidden layer are a function of the values in the input layer.
* Y = F(H), meaning that the values in the output layer are a function of the values in the hidden layer.

**Flashcard 74**

**Front:** How are the dependencies defined for a recurrent neural network?

**Back:** Hn = F(Xn, Hn−1), where n refers to the current position (“time frame”) in a sequence. This means that the current values (at position n) in the hidden layer Hn are not dependent only on the current values of the input layer Xn (as in feed-forward networks), but also on the values of the hidden layer at the previous position, Hn−1.

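A schematic NumPy sketch (the shapes, random weights, and tanh non-linearity are assumptions) contrasting the two dependency patterns: a feed-forward layer computes H = F(X), while a recurrent layer computes Hn = F(Xn, Hn−1):

```python
import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(4, 3))  # input-to-hidden weights (assumed shapes)
Wh = rng.normal(size=(4, 4))  # hidden-to-hidden weights: the recurrent connection

def feedforward_step(x):
    return np.tanh(Wx @ x)                # H = F(X): depends on current input only

def recurrent_step(x, h_prev):
    return np.tanh(Wx @ x + Wh @ h_prev)  # Hn = F(Xn, Hn-1)

xs = [rng.normal(size=3) for _ in range(5)]  # a sequence of 5 input vectors
h = np.zeros(4)
for x in xs:               # the hidden state carries information forward,
    h = recurrent_step(x, h)  # so h depends on all earlier positions
print(h)
```
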
**Flashcard 75**

**Front:** What is the structure of RNNs well-suited for modelling?

**Back:** The structure of RNNs is well suited for modelling of sequences.

**Flashcard 76**

**Front:** What does the output of an RNN at a given time step/position depend on?

**Back:** In total, the output depends not only on the current input at the current time step/position Xt, but also on the inputs at all previous time steps/positions.

**Flashcard 77**

**Front:** What is a prominent type of network architecture used in Natural Language Processing nowadays?

**Back:** Nowadays, almost everything in Natural Language Processing is based on so-called “attention networks” (which include the most modern transformer architecture).

**Flashcard 78**

**Front:** What do attention networks represent?

**Back:** They are complex networks which represent how different inputs relate to different outputs.

**Flashcard 79**

**Front:** Were neural networks used for machine translation mentioned as being simple?

**Back:** No; it should be noted that neural networks used for machine translation are very large and complex, involving a large number of neurons organised in many layers.

**Flashcard 80**

**Front:** What is characteristic of a feed-forward neural network regarding the direction of the input signal?

**Back:** It is called “feed-forward” because the input signal is always going forward, from the input layer through the hidden layer(s) to the output layer.