Deep Neural Network Flashcards
(24 cards)
What defines a deep neural network (DNN)?
A neural network with more than one hidden layer.
What is the main difference between shallow and deep networks?
Deep networks stack multiple nonlinear layers to build complex representations.
What is the key operation performed by each layer in a DNN?
A linear transformation followed by a nonlinear activation.
What is the formula for the pre-activation in a layer?
z_k = W_k h_{k-1} + b_k, where h_{k-1} is the previous layer's activation (with h_0 = x).
What does the activation function do in each layer?
Applies nonlinearity to the pre-activation output.
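A minimal NumPy sketch of the single-layer step described in the three cards above (a linear transformation followed by a nonlinearity); the function names and shapes are illustrative, not from any particular library:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)      # elementwise nonlinearity

def layer(h_prev, W, b):
    z = W @ h_prev + b             # pre-activation: z_k = W_k h_{k-1} + b_k
    return relu(z)                 # activation: h_k = ReLU(z_k)

rng = np.random.default_rng(0)
x = rng.normal(size=3)             # input (h_0 = x)
W, b = rng.normal(size=(4, 3)), np.zeros(4)
h1 = layer(x, W, b)                # representation produced by layer 1
```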
What is the final output of a deep network?
The result after all layers have transformed the input.
How can you mathematically express a DNN’s output?
ŷ = f_L(f_{L-1}(… f_2(f_1(x)) …))
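The composition above translates directly into a loop. A sketch, assuming ReLU hidden layers and a linear output layer (a common but not universal choice); the depth and widths in `sizes` are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, params):
    """Compose L layers: ŷ = f_L(...f_2(f_1(x))...)."""
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)        # hidden layers: linear + nonlinearity
    W, b = params[-1]
    return W @ h + b               # output layer left linear here

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 1]               # input width, two hidden widths, output
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y_hat = forward(rng.normal(size=3), params)
```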
Why do we stack layers in a DNN?
To learn hierarchical and compositional features.
What do early layers typically learn in deep networks?
Low-level features (e.g., edges or color blobs).
What do deeper layers typically learn?
High-level patterns (e.g., object parts or abstract concepts).
What is the geometric intuition of depth in DNNs?
Each layer folds space, allowing complex reshaping of input regions.
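One way to see the folding concretely, as a toy sketch: a pair of ReLU units can compute |x|, which folds the real line at 0 so that x and -x land on the same point and become indistinguishable to later layers.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# relu(x) + relu(-x) == |x|: the layer folds the line at 0.
xs = np.linspace(-2, 2, 5)
folded = relu(xs) + relu(-xs)
print(folded)   # [2. 1. 0. 1. 2.]
```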
What is the benefit of composing multiple shallow functions?
It enables the model to build complex mappings from simple components.
What happens to input space with each ReLU in a deep network?
It gets partitioned into more piecewise linear regions.
What is the effect of depth on model expressiveness?
It can grow the number of linear regions exponentially with depth, whereas widening a shallow network grows them only polynomially.
What is an activation region in a DNN?
A region of input space over which a fixed subset of ReLU units is active; the network computes a single linear function on each such region.
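A rough sketch of how activation regions can be probed empirically: sample inputs, record which ReLUs fire (the activation pattern), and count distinct patterns. The two-layer architecture and sampling range here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 8)), rng.normal(size=8)

def pattern(x):
    h1 = np.maximum(0.0, W1 @ x + b1)
    h2 = np.maximum(0.0, W2 @ h1 + b2)
    # one boolean per ReLU: which units are active at this input
    return tuple(np.concatenate([h1 > 0, h2 > 0]))

xs = rng.uniform(-3, 3, size=(10000, 2))   # dense 2-D sample
patterns = {pattern(x) for x in xs}
print(len(patterns))   # distinct activation regions hit by the sample
```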
How does parameter count compare between deep and shallow networks for similar region complexity?
Deep networks may use fewer parameters to achieve the same complexity.
What is one advantage of deep networks over shallow ones?
They can learn more abstract and reusable features with fewer units.
What is the typical architecture of a DNN layer?
Linear transformation → nonlinear activation → output, which becomes the next layer's input.
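In PyTorch this layer pattern is typically written as a `Sequential` stack; the layer sizes below are illustrative:

```python
import torch.nn as nn

# Linear → ReLU repeated, ending in a linear output layer.
model = nn.Sequential(
    nn.Linear(3, 8),   # layer 1: linear transformation
    nn.ReLU(),         # layer 1: nonlinear activation
    nn.Linear(8, 8),   # layer 2
    nn.ReLU(),
    nn.Linear(8, 1),   # output layer
)
```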
What hyperparameter controls the number of layers in a DNN?
The depth (number of hidden layers).
What hyperparameter controls the size of each layer?
The number of hidden units (width).
What activation function is most commonly used in DNNs?
ReLU (Rectified Linear Unit).
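For reference, the standard definition, applied elementwise to the pre-activation:

```latex
\mathrm{ReLU}(z) = \max(0, z)
```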
What is the universal approximation theorem?
A network with a single hidden layer can approximate any continuous function on a compact set to arbitrary accuracy, given enough hidden units.
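A common informal statement (in the style of Cybenko and Hornik), for a continuous target f on a compact set K and a suitable nonconstant activation σ:

```latex
\forall \varepsilon > 0 \;\; \exists N,\, \{v_i, w_i, b_i\}_{i=1}^{N} :\quad
\sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} v_i \,\sigma\!\big(w_i^{\top} x + b_i\big) \Big| < \varepsilon
```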
Why do we still prefer deep networks despite the universal approximation theorem?
Deep networks can approximate functions more efficiently and with better generalisation.
What does each layer in a DNN build upon?
The representation produced by the previous layer.