lecture 5: backpropagation Flashcards
single-layer perceptrons
- are limited to linearly separable problems
- we need to add hidden layers to make them universal function approximators
how to find the weights of a multilayer perceptron
with backpropagation
What is the main purpose of backpropagation?
Backpropagation calculates and propagates errors backward through the network to adjust weights, enabling the network to learn by minimizing error.
backpropagation steps
- forward sweep
- compare predicted output to true output
- compute the error term
- update the weights between the hidden layer and the output layer
- propagate the error back and update the weights of the deeper layers (those closer to the input) in the same way
What is forward propagation in a neural network?
- passing input data x through the network to compute the output y via the intermediate hidden-layer activations h.
How is information flow represented mathematically in forward propagation?
- the flow is x→h→y
- Hidden layer activations: h=g(W^{hx}x)
- Output: y=g(W^{yh}h), where g is the activation function applied to the pre-activations a (a small sketch follows)
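A minimal NumPy sketch of the forward sweep, assuming a sigmoid activation g; the variable names (W_hx, W_yh, a_h, a_y) are illustrative, not from the lecture:

```python
import numpy as np

def g(a):
    """Sigmoid activation, applied element-wise to pre-activations."""
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
x = rng.normal(size=3)            # input vector
W_hx = rng.normal(size=(4, 3))    # input -> hidden weights (W^{hx})
W_yh = rng.normal(size=(2, 4))    # hidden -> output weights (W^{yh})

a_h = W_hx @ x                    # hidden pre-activations
h = g(a_h)                        # hidden activations:  h = g(W^{hx} x)
a_y = W_yh @ h                    # output pre-activations
y = g(a_y)                        # network output:      y = g(W^{yh} h)
```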
What is the first step in backpropagation after the forward sweep?
Compare the predicted output y to the target t to calculate the error in the output layer.
How is the error δ for the output layer calculated?
- δ_j=g′(a_j)⋅(t_j−y_j)
- output error = derivative of the activation * difference between target and predicted output
How are weights connected to the output layer updated in backpropagation?
- Δw_jk=ϵ⋅δ_j⋅h_k
- weight change = learning rate * output error term * activation of the hidden unit feeding the weight (sketch below)
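Continuing the forward-pass sketch above (g, a_y, y, and h are defined there); the target t and learning rate eps are illustrative values:

```python
def g_prime(a):
    """Derivative of the sigmoid, written in terms of its output."""
    s = g(a)
    return s * (1.0 - s)

t = np.array([1.0, 0.0])              # target output
eps = 0.1                             # learning rate

delta_y = g_prime(a_y) * (t - y)      # output error term: delta_j = g'(a_j) * (t_j - y_j)
dW_yh = eps * np.outer(delta_y, h)    # delta rule: learning rate * error term * input to the weight
```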
How does backpropagation work for the hidden layers?
- propagates the error from the output layer back through the weights to the hidden layers
- δ_i=g′(a_i)⋅Σ_j w_ji⋅δ_j (each hidden unit's error is the weighted sum of the output errors it feeds into, times its own activation derivative; see the sketch below)
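Continuing the same sketch: the output errors travel back through the hidden-to-output weights before the hidden layer's own g′ is applied, and then all weight updates are applied:

```python
delta_h = g_prime(a_h) * (W_yh.T @ delta_y)   # delta_i = g'(a_i) * sum_j w_ji * delta_j
dW_hx = eps * np.outer(delta_h, x)            # same delta rule, one layer deeper

W_yh += dW_yh                                 # apply the updates once all error terms are computed
W_hx += dW_hx
```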
What are the two key steps in backpropagation?
- Compute the error term δ for each layer.
- Update the weights using the error term and the learning rate.
neural network architectures
- recurrent neural network (RNN)
- convolutional neural network
RNNs
- time as a factor: memory, sequence analysis, temporal predictions, language
- feed their outputs back into themselves, time-step by time-step
RNNs: unrolling
expanding the RNN over time steps, treating each time step as a layer in the network for backpropagation through time (BPTT)
RNNs: feedback loop
lets RNNs maintain a memory of past inputs and integrate information over time (see the sketch below)
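A minimal NumPy sketch of the feedback loop and unrolling: the hidden state h is fed back in at every time step, so it carries a trace of earlier inputs (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_in, n_h = 5, 3, 4
xs = rng.normal(size=(T, n_in))        # a short input sequence
W_xh = rng.normal(size=(n_h, n_in))    # input -> hidden weights
W_hh = rng.normal(size=(n_h, n_h))     # hidden -> hidden (feedback) weights

h = np.zeros(n_h)                      # initial hidden state
for t in range(T):                     # unrolled over time: one "layer" per time step
    h = np.tanh(W_xh @ xs[t] + W_hh @ h)   # new state mixes the current input with the previous state
```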
RNNs: learning
- backpropagation through time (BPTT)
- backpropagation is extended over time to compute gradients for all time steps in the sequence.
- this allows RNNs to learn how earlier time steps influence later ones (illustrated below).
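A sketch of BPTT using PyTorch autograd (an illustration of the idea, not the lecture's implementation): the recurrence is unrolled over the sequence and a single backward pass differentiates through every time step.

```python
import torch

T, n_in, n_h = 5, 3, 4
xs = torch.randn(T, n_in)                     # toy input sequence
target = torch.randn(n_h)                     # toy target for the final hidden state

W_xh = torch.randn(n_h, n_in, requires_grad=True)
W_hh = torch.randn(n_h, n_h, requires_grad=True)

h = torch.zeros(n_h)
for t in range(T):                            # unrolled recurrence: the same weights are reused each step
    h = torch.tanh(W_xh @ xs[t] + W_hh @ h)

loss = ((h - target) ** 2).mean()
loss.backward()                               # gradients flow backward through all T time steps
print(W_hh.grad.shape)                        # every time step contributed to this single gradient
```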
RNNs: pros
- can learn all sorts of sequential patterns, including dependencies on remote events far back in the sequence
- “remembering” information from earlier in the sequence allows them to learn long-term patterns
- dynamic, semantic information processing allows for speech recognition and language modeling
RNNs: cons
- Recursion Issues: Feedback loops can lead to numerical instability and training difficulties.
- Scaling Challenges: Long sequences create many time steps, leading to many layers, a large number of parameters, and vanishing gradients
How is the problem of many time steps solved in RNNs?
LSTM (long short-term memory) RNNs
general problems with making networks bigger
- larger number of parameters
- vanishing gradients
problems with a large number of parameters
- training takes a long time
- many local minima
- very sensitive to biases in the training set
what are vanishing gradients
- the sigmoid squashes its output into [0,1], and its derivative is at most 0.25
- the error term is multiplied by these small derivatives at every layer it passes through, so it gets diluted, becoming smaller and smaller
- lower layers therefore get only tiny weight changes, so they learn very slowly (numeric illustration below)
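A quick numeric illustration of the dilution, assuming sigmoid units at their steepest point (pre-activation 0, where g′ = 0.25):

```python
import numpy as np

def sigmoid_deriv(a):
    s = 1.0 / (1.0 + np.exp(-a))
    return s * (1.0 - s)

# The error term is multiplied by g'(a) at every layer it passes through on the way down.
# Even in the best case (g'(0) = 0.25) the factor shrinks geometrically with depth.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_deriv(0.0)
print(grad)    # 0.25 ** 10 ≈ 9.5e-07 -> the lowest layers get almost no weight change
```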
convolutional neural networks
- convolution (weight sharing) helps with increasing the number of layers without exploding the number of parameters (see the parameter-count sketch after this list)
- great for classification tasks: image/sound recognition
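A back-of-the-envelope comparison (sizes are illustrative, not from the lecture) of why weight sharing keeps the parameter count down:

```python
# Fully connected: every one of the 32*32 input pixels connects to each of 256 hidden units.
fc_params = 32 * 32 * 256      # 262,144 weights for a single dense layer

# Convolutional: a bank of 16 filters of size 3x3 is shared across every image position.
conv_params = 16 * 3 * 3       # 144 weights for a whole conv layer

print(fc_params, conv_params)  # stacking conv layers barely grows the parameter count
```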
training CNNs
standard backpropagation