DL Fundamentals Flashcards

Flashcards in DL Fundamentals Deck (23)

1

Representational learning

Engineering representations is hard - requires technical and domain expertise

- Representation learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification (LeCun et al., 2015)

2

List some activation functions

- Sigmoid (logistic): g(z) = 1 / (1 + e^(-z))
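
A minimal NumPy sketch of the sigmoid above; tanh and ReLU are added here only as further common examples, not from the original card:

import numpy as np

def sigmoid(z):
    # Logistic activation: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Hyperbolic tangent: squashes input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Rectified linear unit: passes positive values through, zeroes out negatives
    return np.maximum(0.0, z)

print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # approx [0.119, 0.5, 0.881]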

3

Loss functions

1. Squared error
2. Log loss

4

Squared error loss function

1/2 * (M(d) - t)^2

5

Log loss function

-(t * log(M(d)) + (1 - t) * log(1 - M(d)))
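
A small sketch of both loss functions from the last two cards; the prediction/target values are made-up examples:

import numpy as np

def squared_error(prediction, target):
    # 1/2 * (M(d) - t)^2 for a single training instance
    return 0.5 * (prediction - target) ** 2

def log_loss(prediction, target, eps=1e-12):
    # -(t * log(M(d)) + (1 - t) * log(1 - M(d))) for a single instance, t in {0, 1}
    p = np.clip(prediction, eps, 1.0 - eps)   # avoid log(0)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

print(squared_error(0.8, 1.0))   # approx 0.02
print(log_loss(0.8, 1.0))        # approx 0.223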

6

Difference between loss and cost function

- Loss function = measure of the prediction error on a single training instance
- Cost function = measure of the average prediction error across a set of training instances
- Cost functions allow us to add in regularization
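
A tiny illustration of the distinction, assuming squared-error loss, made-up predictions/targets/weights, and an assumed L2 penalty strength lam:

import numpy as np

predictions = np.array([0.9, 0.2, 0.7])    # hypothetical model outputs
targets     = np.array([1.0, 0.0, 1.0])    # hypothetical targets
weights     = np.array([0.5, -1.2, 0.3])   # hypothetical model weights
lam = 0.01                                 # assumed regularization strength

losses = 0.5 * (predictions - targets) ** 2         # loss: one value per training instance
cost = losses.mean() + lam * np.sum(weights ** 2)   # cost: average loss + L2 regularization term
print(losses, cost)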

7

Gradient Descent Algorithm

1. Choose random weights
2. Until convergence:
   - Set all gradient sums to zero
   - For each training instance:
     - Calculate the model output
     - Calculate the loss
     - Update the gradient sum for each weight and bias
   - Update the weights and biases using the weight update rule
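
A minimal sketch of this procedure for a single linear neuron with squared-error loss; the toy data, learning rate, and epoch count are assumptions:

import numpy as np

# Toy data: roughly y = 2x (assumed example)
X = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([0.1, 2.1, 3.9, 6.0])

w, b = np.random.randn(), 0.0            # 1. choose random weights
alpha = 0.1                              # learning rate (assumed)

for epoch in range(1000):                # 2. repeat "until convergence" (fixed epochs here)
    dw, db = 0.0, 0.0                    # set all gradient sums to zero
    for x_i, t_i in zip(X, t):           # for each training instance
        y_i = w * x_i + b                # calculate the model output M(d)
        loss = 0.5 * (y_i - t_i) ** 2    # calculate the loss
        dw += (y_i - t_i) * x_i          # update gradient sum for the weight
        db += (y_i - t_i)                # ... and for the bias
    w -= alpha * dw / len(X)             # update weight using the update rule
    b -= alpha * db / len(X)             # update bias using the update rule

print(w, b)                              # w should end up near 2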

8

Backpropagation Algorithm

- Werbos (1974)
- Not widely used until 1986

1. Initialize the weights to small random values (e.g., in a range based on the fan-in of each node)
2. Feedforward phase - feed the input data through the network from the inputs to the outputs
3. Update the training error for the network (based on the target values for all output nodes)
4. Error propagation phase - feed the error values back through the network, adjusting the weights along the way
5. Repeat from 2 until the error values are sufficiently small or some other stopping condition is met

9

Forward pass algo

Require: L, network depth
Require: W[i], i is an element of {1...L}, weight matrices for each layer
Require: b[i], i is an element of {1...L}, bias terms for each layer
Require: d, input descriptive features
Require: t, target features

a[0] = d
for i = 1 to L:
    z[i] = W[i] * a[i-1] + b[i]
    a[i] = g[i](z[i])

M(d) = a[L]
Calculate the loss L(M(d), t)
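
A NumPy sketch of this forward pass, assuming sigmoid activations at every layer and a made-up 2-3-1 network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(d, weights, biases):
    # Feed one input column vector d through an L-layer network
    a = d                                   # a[0] = d
    for W, b in zip(weights, biases):       # for i = 1 to L
        z = W @ a + b                       # z[i] = W[i] a[i-1] + b[i]
        a = sigmoid(z)                      # a[i] = g[i](z[i])
    return a                                # M(d) = a[L]

# Hypothetical 2-3-1 network with random weights
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases  = [np.zeros((3, 1)), np.zeros((1, 1))]

d = np.array([[0.5], [-1.0]])               # one training instance
t = np.array([[1.0]])                       # its target
M_d = forward_pass(d, weights, biases)
loss = -(t * np.log(M_d) + (1 - t) * np.log(1 - M_d))   # log loss L(M(d), t)
print(M_d, loss)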

10

Backward Propagation algo

Require: A forward pass of network
Require: L, network depth
Require: W[i], i is an element of {1...L}, weight matrices for each layer
Require: b[i], i is an element of {1...L}, bias terms for each layer
Require: t, target features

Calculate da[L]   # derivative of the loss function with respect to a[L]
for i = L down to 1:
    dz[i] = da[i] * g[i]'(z[i])   # element-wise product
    dW[i] = dz[i] * a[i-1]^T
    db[i] = dz[i]
    da[i-1] = W[i]^T * dz[i]
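
A NumPy sketch of one forward plus backward pass for the same assumed sigmoid-activated 2-3-1 network, with log loss at the output:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(d, weights, biases):
    # Forward pass, caching a[i] and z[i] for the backward pass
    activations, pre_activations = [d], []
    a = d
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        pre_activations.append(z)
        activations.append(a)
    return activations, pre_activations

def backward(activations, pre_activations, weights, t):
    # Backward pass for sigmoid layers with log loss at the output
    grads_W, grads_b = [], []
    a_L = activations[-1]
    da = -(t / a_L) + (1 - t) / (1 - a_L)          # da[L]: derivative of the log loss
    for i in reversed(range(len(weights))):        # for i = L down to 1
        g_prime = sigmoid(pre_activations[i]) * (1 - sigmoid(pre_activations[i]))
        dz = da * g_prime                          # dz[i] = da[i] * g[i]'(z[i])
        grads_W.insert(0, dz @ activations[i].T)   # dW[i] = dz[i] a[i-1]^T
        grads_b.insert(0, dz)                      # db[i] = dz[i]
        da = weights[i].T @ dz                     # da[i-1] = W[i]^T dz[i]
    return grads_W, grads_b

# Hypothetical 2-3-1 network, one training instance
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases  = [np.zeros((3, 1)), np.zeros((1, 1))]
d, t = np.array([[0.5], [-1.0]]), np.array([[1.0]])

activations, pre_activations = forward(d, weights, biases)
grads_W, grads_b = backward(activations, pre_activations, weights, t)
print([g.shape for g in grads_W])   # [(3, 2), (1, 3)]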

11

Stochastic/Online GD

- Choose random weights
- Until convergence:
  - Shuffle all training instances
  - For each training instance:
    * Perform a forward pass
    * Perform a backward pass
    * Update the weights and biases using the update rule

12

Stochastic GD Update Rule

W[i] = W[i] - αdW[i]
b[i] = b[i] - αdb[i]
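
A sketch of the stochastic GD loop and this update rule, using logistic regression on made-up data; the learning rate and epoch count are assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up binary classification data: 20 instances, 2 descriptive features
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
t = (X[:, 0] + X[:, 1] > 0).astype(float)

W, b = rng.standard_normal(2), 0.0
alpha = 0.1                              # learning rate (assumed)

for epoch in range(50):                  # until convergence (fixed epochs here)
    for i in rng.permutation(len(X)):    # shuffle, then visit each training instance
        y = sigmoid(W @ X[i] + b)        # forward pass
        dW = (y - t[i]) * X[i]           # backward pass: log-loss gradient for this instance
        db = y - t[i]
        W = W - alpha * dW               # W[i] = W[i] - α dW[i]
        b = b - alpha * db               # b[i] = b[i] - α db[i]

print(W, b)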

13

GD Batch

- Choose random weights
- Until convergence:
  - Set all gradient sums to 0
  - For each training instance:
    * Perform a forward pass
    * Perform a backward pass
    * Update the gradient sum for each weight and bias term
  - Update the weights and biases using the update rule

14

Batch GD Update rule

W[i] = W[i] - α(1/m)Σ(from j=1 to m) dW[i]_j
b[i] = b[i] - α(1/m)Σ(from j=1 to m) db[i]_j
(m = number of training instances)
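
The same logistic-regression setup as the stochastic GD sketch, but with gradients summed over the whole training set and one averaged update per pass (learning rate and epoch count assumed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))            # made-up data, as before
t = (X[:, 0] + X[:, 1] > 0).astype(float)

W, b = rng.standard_normal(2), 0.0
alpha, m = 0.5, len(X)                      # learning rate (assumed), m = number of instances

for epoch in range(200):                    # until convergence (fixed epochs here)
    dW_sum, db_sum = np.zeros(2), 0.0       # set all gradient sums to 0
    for j in range(m):                      # for each training instance
        y = sigmoid(W @ X[j] + b)           # forward pass
        dW_sum += (y - t[j]) * X[j]         # backward pass: accumulate gradients
        db_sum += y - t[j]
    W = W - alpha * dW_sum / m              # W[i] = W[i] - α (1/m) Σ dW[i]_j
    b = b - alpha * db_sum / m              # b[i] = b[i] - α (1/m) Σ db[i]_j

print(W, b)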

15

GD Mini-batch

- Choose random weights
- Until convergence:
  - Divide the training set into mini-batches of size s
  - For each mini-batch D(mb):
    * Set all gradient sums to 0
    * For each training instance in D(mb):
      - Perform a forward pass
      - Perform a backward pass
      - Update the gradient sum for each weight and bias term
    * Update the weights and biases using the update rule

16

GD Mini-batch update rule

W[i] = W[i] - α(1/s)Σ(from j=1 to s) dW[i]_j
b[i] = b[i] - α(1/s)Σ(from j=1 to s) db[i]_j
(s = mini-batch size)
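
The mini-batch variant under the same assumptions as the earlier sketches; note that s = 1 recovers stochastic GD and s = m recovers batch GD:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))              # made-up data, as before
t = (X[:, 0] + X[:, 1] > 0).astype(float)

W, b = rng.standard_normal(2), 0.0
alpha, s = 0.5, 4                             # learning rate and mini-batch size (assumed)

for epoch in range(100):                      # until convergence (fixed epochs here)
    order = rng.permutation(len(X))
    for start in range(0, len(X), s):         # divide the training set into mini-batches
        batch = order[start:start + s]        # one mini-batch D(mb)
        dW_sum, db_sum = np.zeros(2), 0.0     # set all gradient sums to 0
        for j in batch:                       # for each instance in D(mb)
            y = sigmoid(W @ X[j] + b)
            dW_sum += (y - t[j]) * X[j]
            db_sum += y - t[j]
        W = W - alpha * dW_sum / s            # W[i] = W[i] - α (1/s) Σ dW[i]_j
        b = b - alpha * db_sum / s            # b[i] = b[i] - α (1/s) Σ db[i]_j

print(W, b)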

17

Stochastic GD - Advantages

- Easy to implement
- Fast learning

18

Stochastic GD - Disadvantages

- Noisy gradient signal
- Computationally expensive

19

Batch GD - Advantages

- Computationally efficient
- Stable gradient signal

20

Batch GD - Disadvantages

- Requires gradient accumulation
- Premature convergence
- Involves loading large datasets into memory, so it can be slow

21

Mini-batch GD - Advantages

- Relatively computationally efficient
- Does not require full datasets to be loaded into memory
- Stable gradient signal

22

Mini-batch GD - Disadvantages

- Requires gradient accumulation
- Adds another hyper-parameter: the mini-batch size

23

Talk about representation learning in the context of classification tasks

- Higher layers of representation amplify aspects of the input that are important for discrimination and suppress irrelevant variations

From the Nature deep learning paper (LeCun, Bengio & Hinton, 2015): a key aspect of deep learning is that layers of features are not designed by human engineers; they are learned from data using general-purpose learning procedures