Week 4 - ChatGPT Flashcards
(11 cards)
What limitation of single-layer perceptrons is addressed by multilayer neural networks?
Single-layer perceptrons can only learn linearly separable decision boundaries, so they cannot represent functions such as XOR; multilayer networks with nonlinear hidden units can approximate arbitrary nonlinear decision boundaries.
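To make the XOR counterexample concrete, here is a minimal numpy sketch (my own illustration, not part of the original cards): a 2-2-1 network with hand-picked step-unit weights computes XOR, which no single-layer perceptron can represent.

```python
# A 2-2-1 network solving XOR with hand-picked weights: the two hidden
# units compute OR and AND, and the output fires for OR-but-not-AND.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

def step(a):
    return (a > 0).astype(int)

W_hidden = np.array([[1.0, 1.0],    # hidden unit 1: OR
                     [1.0, 1.0]])   # hidden unit 2: AND
b_hidden = np.array([-0.5, -1.5])   # thresholds that make them OR / AND

w_out = np.array([1.0, -2.0])       # output: OR minus twice AND
b_out = -0.5

h = step(X @ W_hidden.T + b_hidden)
y = step(h @ w_out + b_out)
print(y)  # [0 1 1 0], i.e. XOR, which is not linearly separable
```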
What is the universal approximation theorem in neural networks?
A feedforward network with a single hidden layer of nonlinear units can approximate any continuous function on a compact domain to arbitrary accuracy, given sufficiently many hidden neurons and suitable weights.
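Stated more formally (one common version of the theorem, associated with Cybenko and Hornik; the notation below is my own):

```latex
% For any continuous f on a compact set K and any eps > 0, there exist
% N, coefficients alpha_i, weight vectors w_i, and biases b_i such that
\[
F(x) = \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^{\top} x + b_i\right)
\quad\text{satisfies}\quad
\sup_{x \in K} \left| F(x) - f(x) \right| < \varepsilon ,
\]
% where sigma is a suitable nonconstant activation such as the sigmoid.
```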
Why are differentiable activation functions needed in multilayer neural networks?
Because backpropagation computes the gradient of the cost with respect to each weight via the chain rule, which requires the derivative of the activation function; a hard-threshold unit has zero derivative almost everywhere, so gradient descent gets no signal through it.
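For example, with the sigmoid activation used in classic backprop, the derivative f′(net) that appears in the error terms below has a simple closed form (a small sketch of my own):

```python
# Sigmoid and its derivative: f'(net) = f(net) * (1 - f(net)).
# A step function, by contrast, has derivative 0 almost everywhere,
# so it would pass no gradient back through the network.
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_prime(net):
    s = sigmoid(net)
    return s * (1.0 - s)

print(sigmoid_prime(0.0))  # 0.25, the sigmoid's maximum slope
```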
How does backpropagation compute the weight updates in a neural network?
By using the chain rule to propagate the error backward from the output layer through the hidden layers and updating weights using gradient descent.
What is the general update rule for weights using backpropagation?
w ← w − η * ∂J/∂w, where J is the cost function and η is the learning rate.
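As a toy illustration of this rule (my own example with a hypothetical one-dimensional cost J(w) = (w − 3)², whose gradient is 2(w − 3)):

```python
# Repeatedly applying w <- w - eta * dJ/dw walks w to the minimiser of J.
eta = 0.1   # learning rate
w = 0.0     # initial weight
for _ in range(100):
    grad_J = 2.0 * (w - 3.0)    # dJ/dw for J(w) = (w - 3)^2
    w = w - eta * grad_J        # the update rule from the card
print(w)    # approximately 3.0, the minimiser of J
```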
What is the error term for an output unit in backpropagation?
δ_k = (t_k − z_k) * f′(net_k), where t_k is the target and z_k is the output.
What is the error term for a hidden unit in backpropagation?
δ_j = f′(net_j) * Σ_k (w_kj * δ_k), where δ_k are the error terms of the output units and w_kj is the weight from hidden unit j to output unit k.
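Combining the last three cards, here is a minimal numpy sketch of one stochastic backprop step for a single-hidden-layer sigmoid network (my own illustration of the formulas; the function and variable names are mine, and bias terms are added for completeness):

```python
# One backprop step implementing the error terms from the cards:
#   delta_k = (t_k - z_k) * f'(net_k)               (output units)
#   delta_j = f'(net_j) * sum_k w_kj * delta_k      (hidden units)
# The updates use a plus sign because delta already carries the negative
# sign of dJ/dnet, so w += eta * delta * input is gradient descent on J.
import numpy as np

def f(net):                     # sigmoid activation
    return 1.0 / (1.0 + np.exp(-net))

def f_prime(net):               # its derivative: f(net) * (1 - f(net))
    s = f(net)
    return s * (1.0 - s)

def backprop_step(x, t, W1, b1, W2, b2, eta=0.5):
    # Forward pass.
    net_j = W1 @ x + b1         # hidden pre-activations
    y = f(net_j)                # hidden outputs
    net_k = W2 @ y + b2         # output pre-activations
    z = f(net_k)                # network outputs

    # Backward pass: error terms per the two formulas above.
    delta_k = (t - z) * f_prime(net_k)
    delta_j = f_prime(net_j) * (W2.T @ delta_k)

    # Gradient-descent updates (outer products give the per-weight gradients).
    W2 += eta * np.outer(delta_k, y)
    b2 += eta * delta_k
    W1 += eta * np.outer(delta_j, x)
    b1 += eta * delta_j
    return W1, b1, W2, b2
```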
What is the main purpose of using stochastic (online) backpropagation?
To update weights after each training example, which often leads to faster convergence and better generalisation than batch updates.
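A sketch of the online regime in code (reusing the hypothetical f and backprop_step from the previous sketch; the data, architecture, and epoch count are arbitrary choices of mine):

```python
# Online (stochastic) updates: the weights move after EVERY example,
# rather than once per full pass as in batch training.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])            # XOR targets again
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)     # 2 inputs -> 4 hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # 4 hidden -> 1 output

for epoch in range(2000):
    for i in rng.permutation(len(X)):             # shuffled, one example at a time
        W1, b1, W2, b2 = backprop_step(X[i], T[i], W1, b1, W2, b2)

z = f(W2 @ f(W1 @ X.T + b1[:, None]) + b2[:, None])
print(z.round(2))   # typically close to [[0. 1. 1. 0.]], depending on the init
```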
What problem does early stopping help prevent during training?
It helps prevent overfitting by stopping training when validation error starts to increase, even if training error is still decreasing.
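The usual implementation keeps the best-so-far weights and stops once validation error has failed to improve for a set number of epochs; a skeleton of my own (train_one_epoch and validation_error are hypothetical placeholders for your own routines):

```python
import copy

def train_with_early_stopping(model, patience=10, max_epochs=1000):
    best_err, best_model, waited = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model)            # hypothetical: one pass over training data
        err = validation_error(model)     # hypothetical: error on held-out data
        if err < best_err:
            best_err = err
            best_model = copy.deepcopy(model)   # remember the best weights
            waited = 0
        else:
            waited += 1
            if waited >= patience:        # validation error stopped improving
                break                     # stop before overfitting gets worse
    return best_model                     # weights from the best validation epoch
```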
How is a Radial Basis Function (RBF) network different from a Multilayer Perceptron (MLP)?
An RBF network's hidden units respond according to the distance between the input and a centre (a radial activation), and the network typically has a single hidden layer; an MLP's units compute weighted sums of their inputs passed through nonlinear activations, layer by layer.
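The usual choice of radial function is a Gaussian centred at c_j (notation mine):

```latex
% Gaussian basis function: the response of hidden unit j depends only on
% the distance between the input x and the unit's centre c_j.
\[
\phi_j(x) = \exp\!\left( -\, \frac{\lVert x - c_j \rVert^{2}}{2 \sigma_j^{2}} \right),
\qquad
y(x) = \sum_{j=1}^{M} w_j \, \phi_j(x) .
\]
```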
What are the two main training phases of an RBF network?
First, determine the centres (and widths) of the basis functions in an unsupervised phase, typically by clustering the inputs; then learn the output weights in a supervised phase, typically by linear least squares, since the output is linear in those weights.
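A minimal sketch of the two phases (my own illustration, assuming scikit-learn's KMeans for the clustering phase and the Gaussian basis above; the names and defaults are mine):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_rbf(X, t, n_centres=10, sigma=1.0):
    # Phase 1 (unsupervised): place the basis-function centres by clustering.
    centres = KMeans(n_clusters=n_centres, n_init=10).fit(X).cluster_centers_

    # Phase 2 (supervised): the output is linear in the weights, so solve
    # a linear least-squares problem on the basis-function outputs.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))   # design matrix, one phi_j per column
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return centres, w
```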