Fully-connected Neural Networks Flashcards

(14 cards)

1
Q

What is a feedforward (fully-connected) neural network?

A

A generalization of a single neuron: a sequence of layers where each node in layer L takes inputs from all nodes in layer L−1.

2
Q

Why are non-linear activation functions necessary in neural networks?

A

Without non-linearity, stacked layers collapse into one linear transformation, so depth would add no representational power.
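
A minimal sketch of this collapse (the layer sizes here are arbitrary): two stacked nn.Linear layers with no activation between them reduce to a single affine map.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    f1 = nn.Linear(4, 8)
    f2 = nn.Linear(8, 3)
    x = torch.randn(2, 4)

    # Composing two affine maps is itself affine: W = W2 @ W1, b = W2 @ b1 + b2
    W = f2.weight @ f1.weight
    b = f2.weight @ f1.bias + f2.bias

    stacked = f2(f1(x))              # two linear "layers", no activation in between
    collapsed = x @ W.T + b          # one equivalent affine map
    print(torch.allclose(stacked, collapsed, atol=1e-6))   # True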

3
Q

How is layer computation expressed in matrix form for two layers?

A

Z¹ = B¹ + W¹ · Z⁰; Z² = B² + W² · Z¹, showing that each layer applies an affine transform to the previous layer's outputs.
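
A short numeric sketch of these two affine transforms (the layer sizes are arbitrary):

    import torch

    torch.manual_seed(0)
    Z0 = torch.randn(5)                 # layer-0 activations (the network inputs)
    W1, B1 = torch.randn(3, 5), torch.randn(3)
    W2, B2 = torch.randn(2, 3), torch.randn(2)

    Z1 = B1 + W1 @ Z0                   # Z¹ = B¹ + W¹ · Z⁰
    Z2 = B2 + W2 @ Z1                   # Z² = B² + W² · Z¹
    print(Z1.shape, Z2.shape)           # torch.Size([3]) torch.Size([2])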

4
Q

How do you create a linear layer in PyTorch and with what parameter initialization?

A

Use torch.nn.Linear(in_features, out_features); by default, weights and biases are initialized from a uniform distribution whose range depends on in_features.
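
A quick way to create the layer and inspect its default parameters, using the 784→10 sizes from the MNIST cards below:

    import torch.nn as nn

    layer = nn.Linear(in_features=784, out_features=10)
    print(layer.weight.shape)   # torch.Size([10, 784])
    print(layer.bias.shape)     # torch.Size([10])
    # the default init draws values from a uniform range that shrinks with in_features
    print(layer.weight.min().item(), layer.weight.max().item())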

5
Q

How do you flatten MNIST images of shape (1,28,28) for a batch of size 32?

A

Apply tensor.flatten(start_dim=1) to get a tensor shape of (32, 784) before feeding into a linear layer.
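
A minimal sketch with random data standing in for an MNIST batch:

    import torch

    batch = torch.randn(32, 1, 28, 28)      # a batch of 32 MNIST-shaped images
    flat = batch.flatten(start_dim=1)       # keep the batch dimension, flatten the rest
    print(flat.shape)                       # torch.Size([32, 784])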

6
Q

How is a single-layer network defined for 10-digit classification?

A

nn.Linear(in_features=784, out_features=10), as MNIST has 784 inputs and 10 output classes.

7
Q

What are the shapes of weight and bias parameters in a linear layer?

A

Weights: (out_features, in_features); biases: (out_features,).

8
Q

Provide the formulas for the sigmoid and tanh activation functions.

A

σ(z) = 1/(1 + e^{−z}); tanh(z) = (e^z − e^{−z})/(e^z + e^{−z}).
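
A small check that these formulas match PyTorch's built-in activations:

    import torch

    z = torch.linspace(-3, 3, 5)
    sigmoid = 1 / (1 + torch.exp(-z))
    tanh = (torch.exp(z) - torch.exp(-z)) / (torch.exp(z) + torch.exp(-z))

    print(torch.allclose(sigmoid, torch.sigmoid(z)))    # True
    print(torch.allclose(tanh, torch.tanh(z)))          # True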

9
Q

Why are sigmoid and tanh less used in hidden layers?

A

They saturate quickly (derivatives near zero), leading to vanishing gradients in deep networks.
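
A small demonstration of the saturation (printed values are approximate):

    import torch

    z = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
    torch.sigmoid(z).sum().backward()
    print(z.grad)    # ~[4.5e-05, 0.25, 4.5e-05]: the gradient vanishes once |z| is large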

10
Q

What are the ReLU and Leaky ReLU activation formulas?

A

ReLU(x)=max(0,x); LReLU(x)=x if x≥0 else αx (α≈0.01).
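
A quick check with PyTorch's built-in versions (α is passed as negative_slope):

    import torch
    import torch.nn.functional as F

    x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
    print(F.relu(x))                              # tensor([0.0000, 0.0000, 0.0000, 1.5000])
    print(F.leaky_relu(x, negative_slope=0.01))   # negative inputs scaled by α = 0.01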

11
Q

How do you define a custom PyTorch Module composed of submodules?

A

Subclass nn.Module, initialize submodule layers in __init__, and define the forward pass chaining them.
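
A minimal sketch; the class name is hypothetical and the layer sizes follow the MNIST cards above:

    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(784, 128)    # submodules are registered in __init__
            self.act = nn.ReLU()
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            # the forward pass chains the submodules
            return self.fc2(self.act(self.fc1(x)))

    model = MLP()
    print(model(torch.randn(32, 784)).shape)      # torch.Size([32, 10])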

12
Q

What is the softmax function formula for multi-class classification?

A

P(i) = e^{ŷ_i}/Σ_{c=1}^C e^{ŷ_c}, converting the logits ŷ into a probability distribution over the C classes.
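
A small check that the formula matches torch.softmax (the logits are chosen arbitrarily):

    import torch

    logits = torch.tensor([2.0, 1.0, 0.1])                # ŷ for C = 3 classes
    probs = torch.exp(logits) / torch.exp(logits).sum()   # e^{ŷ_i} / Σ e^{ŷ_c}
    print(probs.sum())                                    # tensor(1.) — a valid distribution
    print(torch.allclose(probs, torch.softmax(logits, dim=0)))   # True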

13
Q

What is the cross-entropy loss formula shown in the notebook?

A

L = -log(e^{ŷ_j}/Σ_{c=1}^C e^{ŷ_c}), where j is the true class index.
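
A small check that the formula matches torch.nn.functional.cross_entropy for a single example (logits and class index chosen arbitrarily):

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, 1.0, 0.1]])   # one example, C = 3 classes
    target = torch.tensor([0])                 # j, the index of the true class

    manual = -torch.log(torch.exp(logits[0, 0]) / torch.exp(logits[0]).sum())
    print(torch.allclose(manual, F.cross_entropy(logits, target)))   # True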

14
Q

How do you transfer model and data to a GPU in PyTorch?

A

Use tensor.to(device) or model.to(device) with device = torch.device('cuda') to move them to GPU memory.
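
A minimal sketch; the CPU fallback is an addition so the snippet also runs without a GPU:

    import torch
    import torch.nn as nn

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    model = nn.Linear(784, 10).to(device)   # moves the parameters to device memory
    x = torch.randn(32, 784).to(device)     # data must live on the same device as the model
    print(model(x).device)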
