Theory Flashcards
(103 cards)
What is the main difference between deep learning and machine learning?
The main difference between deep learning and machine learning lies in their structure and complexity:
- Machine Learning (ML): It involves algorithms that can learn from data, identify patterns, and make decisions with minimal human intervention. Traditional ML methods, such as decision trees, support vector machines, and linear regression, typically require human expertise to extract relevant features from data.
- Deep Learning (DL): A subset of ML, deep learning uses neural networks with many layers (hence “deep”) to automatically learn features and representations from large amounts of data. It requires less human intervention for feature extraction because it can automatically discover patterns from raw data. Deep learning excels at complex tasks like image recognition, natural language processing, and speech recognition.
In short, machine learning often requires manual feature extraction, while deep learning automates this through multiple layers of abstraction in neural networks.
How were biological neurons mapped into artificial neurons?
The concept of artificial neurons in deep learning was inspired by the way biological neurons work, though the mapping is simplified for computational purposes. Here’s a basic comparison of the two:
Biological neuron:
- Structure: A biological neuron consists of dendrites (inputs), a cell body, and an axon (output).
- Process: Neurons receive electrical signals from other neurons through dendrites. If the cumulative input reaches a certain threshold, the neuron “fires,” sending an electrical signal through the axon to the next neuron.
- Communication: Neurons communicate via synapses, where chemical signals are transferred from one neuron to another. Learning occurs by adjusting the strength of these connections (synaptic weights).
Artificial neuron:
- Structure: An artificial neuron is designed to simulate this process with three components: inputs (like dendrites), weights (like synapses), and an output.
- Process: Inputs (numerical values) are multiplied by corresponding weights and summed up. This weighted sum is passed through an activation function (similar to the biological “firing” threshold), which determines if the neuron should “activate” (fire) or not.
- Learning: The strength of connections (weights) between artificial neurons is adjusted during training using algorithms like backpropagation to minimize the error between predicted and actual outputs.
Mapping between the two:
- Dendrites (inputs) → Multiple input features (data points).
- Synapses (connections) → Weights that control the importance of each input.
- Neuron firing → Activation function that decides if the neuron produces an output.
- Learning through synaptic changes → Adjusting weights based on error feedback during training.
While biological neurons are much more complex, the simplified model of artificial neurons allows for practical use in computations, forming the basis of neural networks in deep learning.
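The weighted-sum-then-activate process described in this answer can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the example weights, and the choice of a step activation that fires at `z >= 0` are assumptions, not part of the original card.

```python
def step(z):
    """Threshold activation: 1 if z >= 0, else 0 (this convention is an assumption)."""
    return 1 if z >= 0 else 0


def neuron(inputs, weights, bias, activation=step):
    """Multiply each input by its weight, sum, add the bias, then activate."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical weights and bias, chosen only for illustration.
fires = neuron([1.0, 0.0], weights=[0.6, 0.4], bias=-0.5)   # z is about 0.1
silent = neuron([0.0, 1.0], weights=[0.6, 0.4], bias=-0.5)  # z is about -0.1
print(fires, silent)
```

The weights play the role of synapses, and the activation plays the role of the firing threshold.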
What happened to the original formula h_j(Σ_i w_i x_i − b)?
In artificial neural networks, the bias is typically transformed into a weight-times-input representation (i.e., a constant value treated as an additional input with its own weight) for the following reasons:
The bias in a neuron allows the model to shift the activation function. Mathematically, a neuron computes the following:
\[
z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
\]
where \( w_i \) are the weights, \( x_i \) are the inputs, and \( b \) is the bias. By treating the bias as a weight connected to a constant input (usually set to 1), the equation becomes:
\[
z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + w_0 \cdot 1
\]
Here, the bias is represented by \( w_0 \), a weight multiplied by a constant input of 1. This form unifies the mathematical structure, making it easier to handle during computation.
In practice, neural network computations are expressed as matrix multiplications. Transforming the bias into a weight with a constant input of 1 lets a whole layer be computed as a single matrix operation, with no separate handling of the bias term.
Without this transformation, we would need two separate operations: a matrix multiplication of the inputs and weights, and an addition of the bias term. Combining them simplifies the notation and the implementation.
Conceptually, the bias can be viewed as controlling the threshold for activation. It shifts the decision boundary (for classification tasks) or the output (for regression tasks). By representing the bias as a weight with a fixed input, it allows the network to learn both the importance (via weights) and the threshold (via the bias) in a consistent manner.
The bias provides flexibility to the model by allowing neurons to activate even when all inputs are zero. Without a bias term, the activation would be strictly dependent on the inputs and their corresponding weights, limiting the network’s ability to learn optimal patterns.
In summary, transforming the bias into a weight-times-input form allows for computational efficiency, mathematical consistency, and flexibility in training, making it easier to implement neural networks in practice.
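A small sketch can make the equivalence concrete: adding the bias explicitly and folding it into the weight vector as `w_0` with a constant input of 1 give the same weighted sum. The function names and numeric values are hypothetical and chosen only for illustration.

```python
def weighted_sum_with_bias(x, w, b):
    """Explicit form: z = w_1 x_1 + ... + w_n x_n + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def weighted_sum_augmented(x, w, b):
    """Bias folded in: prepend a constant input 1 and use the bias as weight w_0."""
    x_aug = [1.0] + list(x)
    w_aug = [b] + list(w)
    return sum(wi * xi for wi, xi in zip(w_aug, x_aug))


# Hypothetical example values.
x, w, b = [2.0, -1.0], [0.5, 0.25], 0.5
print(weighted_sum_with_bias(x, w, b), weighted_sum_augmented(x, w, b))
```

Both calls compute the same value, which is why the bias can be counted and trained like any other weight.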
In counting weights do we consider the bias?
Yes
What is the difference between the step and the sign functions?
Step: 0 for negative, 1 for positive
Sign: -1 for negative, 1 for positive
How do we build an OR gate using a neural network and the sign function?
Building an OR gate using a neural network involves designing a single-layer perceptron (a type of artificial neuron) with appropriate weights and bias. Here’s a step-by-step guide:
The OR gate has two inputs, \(x_1\) and \(x_2\), and one output. The output is 1 if either or both of the inputs are 1; otherwise, it is 0. The truth table for the OR gate is given at the end of this answer.
A perceptron makes decisions based on the weighted sum of its inputs and a bias, which is passed through an activation function. For an OR gate, the output is 1 when the weighted sum exceeds a certain threshold (which is handled by the activation function).
The equation for a perceptron with two inputs is:
\[
z = w_1 x_1 + w_2 x_2 + b
\]
where:
- \( w_1 \) and \( w_2 \) are the weights for the inputs \( x_1 \) and \( x_2 \),
- \( b \) is the bias,
- \( z \) is the weighted sum, which will be passed through the activation function.
For a perceptron, a step function (Heaviside function) is typically used as the activation function. The step function outputs 1 if the input \(z\) is greater than or equal to 0, and outputs 0 otherwise:
\[
\text{Activation}(z) =
\begin{cases}
1 & \text{if } z \geq 0 \\
0 & \text{if } z < 0
\end{cases}
\]
To implement an OR gate, we need to choose weights and a bias that make the perceptron behave like the OR logic. We want the perceptron to output 1 if either \(x_1\) or \(x_2\) is 1, and 0 if both inputs are 0.
By trial and error or simple logic, we can determine that:
- Set \(w_1 = 1\),
- Set \(w_2 = 1\),
- Set \(b = -0.5\).
This means the perceptron will fire (output 1) when the sum \(x_1 + x_2\) is greater than or equal to 0.5, which matches the truth table of the OR gate.
Now, let’s verify the behavior of the perceptron for the four possible input combinations:
- Case 1: \(x_1 = 0\), \(x_2 = 0\)
\[
z = (1 \cdot 0) + (1 \cdot 0) + (-0.5) = -0.5
\]
Since \(z < 0\), the output is 0, which matches the OR gate output.
- Case 2: \(x_1 = 0\), \(x_2 = 1\)
\[
z = (1 \cdot 0) + (1 \cdot 1) + (-0.5) = 0.5
\]
Since \(z \geq 0\), the output is 1, which matches the OR gate output.
- Case 3: \(x_1 = 1\), \(x_2 = 0\)
\[
z = (1 \cdot 1) + (1 \cdot 0) + (-0.5) = 0.5
\]
Since \(z \geq 0\), the output is 1, which matches the OR gate output.
- Case 4: \(x_1 = 1\), \(x_2 = 1\)
\[
z = (1 \cdot 1) + (1 \cdot 1) + (-0.5) = 1.5
\]
Since \(z \geq 0\), the output is 1, which matches the OR gate output.
- Weights: \(w_1 = 1\), \(w_2 = 1\),
- Bias: \(b = -0.5\),
- Activation function: Step function.
This configuration successfully implements the OR gate using a neural network (specifically, a single-layer perceptron).
Truth table for the OR gate:
| \(x_1\) | \(x_2\) | OR output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |
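The four cases can also be checked mechanically. The sketch below uses the weights and bias from this answer (w_1 = 1, w_2 = 1, b = -0.5); the function names are hypothetical. It also includes a sign-function variant, under the assumption that the sign function returns +1 for z >= 0 and -1 otherwise, to show that the same weights give an OR gate with -1/+1 outputs.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def sign(z):
    """Sign activation: +1 if z >= 0, else -1 (handling of z == 0 is an assumption)."""
    return 1 if z >= 0 else -1


def or_gate(x1, x2, activation=step, w1=1.0, w2=1.0, b=-0.5):
    """Single perceptron with the weights and bias chosen in the answer."""
    return activation(w1 * x1 + w2 * x2 + b)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, or_gate(x1, x2), or_gate(x1, x2, activation=sign))
```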
What does Hebbian learning say? What are its rules? Describe the formulas.
“The strength of a synapse increases according to the simultaneous activation of the relative input and the desired target”
Start from a random initialization. Fix the weights one sample at a time (online), and only if the sample is not correctly predicted.
Implement a Hebbian learning perceptron starting from w = [0, 0, 0] and η = 0.5 to get an OR gate.
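One possible sketch of this exercise follows. It assumes the commonly stated form of the rule, w_i ← w_i + η · x_i · t, applied online and only to misclassified samples. Further assumptions, none of which come from the card: inputs are coded as 0/1 with a constant bias input x_0 = 1, targets are coded as -1/+1, the sign function returns +1 at z = 0, and samples are presented in truth-table order. The names `train_hebbian` and `OR_SAMPLES` are hypothetical.

```python
def sign(z):
    """Sign activation: +1 if z >= 0, else -1 (handling of z == 0 is an assumption)."""
    return 1 if z >= 0 else -1


def train_hebbian(samples, w, eta, max_epochs=100):
    """Online Hebbian updates: w_i <- w_i + eta * x_i * t, only on wrong predictions."""
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        updated = False
        for x, t in samples:
            z = sum(wi * xi for wi, xi in zip(w, x))
            if sign(z) != t:
                w = [wi + eta * xi * t for wi, xi in zip(w, x)]
                updated = True
        if not updated:  # a full pass with no corrections: stop
            return w, epoch
    return w, max_epochs


# Each sample is ((x_0 = 1, x_1, x_2), target), with targets in {-1, +1}.
OR_SAMPLES = [((1, 0, 0), -1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]

weights, epochs = train_hebbian(OR_SAMPLES, w=[0, 0, 0], eta=0.5)
print(weights, epochs)
```

Under these conventions the procedure stops once a full epoch produces no corrections. A different sample order or tie-breaking rule at z = 0 may lead to different final weights.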
What is an epoch?
One pass through the data
What is the difference between a batch and an epoch?
An epoch is one full pass through the data; a batch is the subset of the data used for a single weight update.
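The relation between the two can be sketched as follows: one epoch visits every sample once, split into batches, and each batch triggers one weight update. The helper names and the dataset size are hypothetical.

```python
import math


def iterate_batches(data, batch_size):
    """Yield consecutive slices of the data; each slice is one batch."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one epoch (the last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


data = list(range(10))  # hypothetical dataset of 10 samples
for batch in iterate_batches(data, batch_size=4):
    print(batch)  # one weight update would happen here
print(updates_per_epoch(len(data), 4))
```

With a batch size of 1 this reduces to online learning, with one update per sample.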
If you change the order of the input data, does the result change?
Yes
If you start from different initial weights, will you get the same final weights?
Not necessarily
Does the procedure always converge?
Yes, if a solution exists (i.e., the classes are linearly separable)
What is the math behind the perceptron process? Why can't a single perceptron build an XOR gate? How could we solve that?
A perceptron can only separate classes with a linear boundary, and for XOR there is no single line separating the two classes of outputs (1 and -1). The solution is to add another layer of perceptrons.
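A minimal sketch of the two-layer solution: XOR can be written as AND(OR(x1, x2), NAND(x1, x2)), where each of the three gates is a single perceptron. The specific weights below are one hand-picked choice, not the only one, and the 0/1 step coding is an assumption made for readability.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(x1, x2, w1, w2, b):
    """Single perceptron with two inputs."""
    return step(w1 * x1 + w2 * x2 + b)


def xor_gate(x1, x2):
    """Two-layer network: hidden OR and NAND units feed an AND output unit."""
    h_or = perceptron(x1, x2, 1.0, 1.0, -0.5)
    h_nand = perceptron(x1, x2, -1.0, -1.0, 1.5)
    return perceptron(h_or, h_nand, 1.0, 1.0, -1.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_gate(x1, x2))
```

Each hidden unit draws one line in the input plane, and the output unit keeps the region between the two lines.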
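What is the relation between the math behind a perceptron process, the topology of the network and the type of decision region?
A single perceptron defines a line (a hyperplane in higher dimensions), w_0 + w_1 x_1 + w_2 x_2 = 0, that separates the positive points from the negative ones, so its decision region is a half-plane. More complex decision regions require more layers of perceptrons.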
Why can’t we apply straightforward Hebbian learning to multilayer perceptron networks?
What is the difference between perceptrons and feed forward neural networks?
Feed forward neural networks use continuous, differentiable activation functions instead of a threshold function. Together with the guarantee that the signal travels in only one direction, this makes backpropagation possible.
What are the conditions to apply backpropagation?
The signal travels in only one direction, and the nodes use continuous, differentiable functions.
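The differentiability condition can be illustrated on a single sigmoid neuron: because the activation has a derivative, the chain rule gives the gradient of the loss with respect to a weight, which a threshold function would not allow. This is only a sketch. The squared-error loss, the function names, and the numeric values are assumptions, and the finite-difference check is included only to confirm the analytic gradient.

```python
import math


def sigmoid(z):
    """Continuous, differentiable activation."""
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, t):
    """Squared error of a single sigmoid neuron (an assumed loss)."""
    y = sigmoid(w * x + b)
    return 0.5 * (y - t) ** 2


def grad_w(w, b, x, t):
    """Chain rule: dL/dw = (y - t) * y * (1 - y) * x."""
    y = sigmoid(w * x + b)
    return (y - t) * y * (1.0 - y) * x


def numeric_grad_w(w, b, x, t, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (loss(w + eps, b, x, t) - loss(w - eps, b, x, t)) / (2 * eps)


# Hypothetical values for illustration.
print(grad_w(0.3, -0.1, 2.0, 1.0), numeric_grad_w(0.3, -0.1, 2.0, 1.0))
```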
What are FFNNs? What does the input layer do in an FFNN?
A non-linear model characterized by the number of neurons, the activation functions, and the values of the weights.
In a Feed Forward Neural Network (FFNN), the input layer is the first layer of the network and serves as the interface between the raw data and the rest of the network. Its primary role is to:
1. Receive Input Data:
• The input layer accepts the raw features of the data (e.g., pixel values of an image, numerical features, or text embeddings). Each feature corresponds to one node (neuron) in the input layer.
• The number of nodes in the input layer is equal to the number of features in the input data.
2. Pass Data to the Next Layer:
• The input layer doesn’t perform any computations or transformations on the data. It simply passes the input values to the subsequent layer (usually the first hidden layer) for further processing.
• Each value is passed to the next layer through weighted connections.
3. Structure the Input for the Network:
• The input layer ensures that the data is in the correct format and dimension for the neural network to process. For example, in a network designed for images, the input layer might flatten a 2D image into a 1D array.
Key Point:
• The input layer does not perform any activation function or transformation—its purpose is simply to provide a conduit for the raw data to enter the neural network.
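A minimal forward pass shows where the input layer sits: the input values are handed unchanged to the first hidden layer, and computation starts only at the weighted connections. The layer sizes, the tanh hidden activation, the linear output, and all names here are assumptions made for the sketch.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: one weight row and one bias per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer passes x through untouched; hidden uses tanh, output is linear."""
    hidden = dense(x, hidden_w, hidden_b, math.tanh)
    return dense(hidden, out_w, out_b, lambda z: z)


# Hypothetical network: 2 input features, 2 hidden neurons, 1 output neuron.
x = [1.0, 2.0]
y = forward(
    x,
    hidden_w=[[0.5, -0.5], [0.25, 0.25]],
    hidden_b=[0.0, 0.0],
    out_w=[[1.0, 1.0]],
    out_b=[0.5],
)
print(y)
```

The number of entries in `x` fixes the size of the input layer, matching the one-node-per-feature rule above.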
What is the size of the output layer?
The size of the output layer in a Feed Forward Neural Network (FFNN) depends on the nature of the task and the type of data being predicted. Specifically:
- Single Value Prediction (Regression Tasks)
• Output Size: 1 neuron.
• Example: Predicting house prices, stock values, or other continuous numeric values.
• Reason: A single output neuron represents the continuous predicted value.
- Binary Classification
• Output Size: 1 neuron.
• Example: Classifying whether an email is spam or not spam.
• Reason: The single neuron outputs a probability (between 0 and 1) after applying a sigmoid activation function.
- Multi-Class Classification
• Output Size: Equal to the number of classes (n_classes).
• Example: If classifying images of digits (0–9), the output layer will have 10 neurons.
• Reason: Each neuron corresponds to one class, and the outputs represent class probabilities (often processed by a softmax activation function).
- Multi-Label Classification
• Output Size: Equal to the number of labels (n_labels).
• Example: Predicting multiple attributes of an object, like weather conditions (e.g., sunny, windy, rainy).
• Reason: Each neuron represents whether a specific label is present (e.g., using sigmoid activations for probabilities of each label).
- Custom Outputs (e.g., Vector Outputs)
• Output Size: Depends on the task-specific requirements.
• Example: Predicting embeddings (e.g., in NLP or recommendation systems) or generating multiple outputs (e.g., multi-task learning).
Summary
The size of the output layer is determined by:
• The type of task (regression, classification, etc.).
• The number of values or classes that need to be predicted.
What are the values of the output layer according to the activation function used for the output neuron?
In Regression the output spans the whole ℜ domain:
• Use a Linear activation function for the output neuron
In Classification with two classes, choose according to their coding:
• Two classes Ω0 = −1, Ω1 = +1 then use Tanh output activation
• Two classes Ω0 = 0, Ω1 = 1 then use Sigmoid output activation
(it can be interpreted as class posterior probability)
When dealing with multiple classes (K), use as many neurons as classes
• Classes are coded as Ω0 = [0 0 1], Ω1 = [0 1 0], Ω2 = [1 0 0]
• Output neurons use a softmax unit
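The output ranges listed above can be checked with a short sketch of the four activations. The function names are hypothetical. The softmax subtracts the maximum before exponentiating, a standard numerical-stability step that does not change the result.

```python
import math


def linear(z):
    """Regression output: any real value."""
    return z


def sigmoid(z):
    """Two classes coded 0/1: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """K classes: non-negative outputs that sum to 1."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# math.tanh covers two classes coded -1/+1: output in (-1, 1).
print(linear(-3.2), math.tanh(0.8), sigmoid(0.8))
probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

The largest softmax output marks the predicted class, which is compared against the one-hot class coding during training.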
When can we say a computer program can learn?
“A computer program is said to learn from experience E with respect to some class of task T and a performance measure P, if its performance at tasks in T, as measured by P, improves because of experience E.”
What are the machine learning paradigms?
Imagine you have a certain experience D, i.e., data, and let’s name it
D = x_1, x_2, x_3, …, x_N
• Supervised learning: given the desired outputs t_1, t_2, t_3, …, t_N, learn to produce the correct output for a new set of inputs
• Unsupervised learning: exploit regularities in D to build a representation to be used for reasoning or prediction
• Reinforcement learning: producing actions a_1, a_2, a_3, …, a_N that affect the environment, and receiving rewards r_1, r_2, r_3, …, r_N, learn to act in order to maximize rewards in the long term
What conceptual difference sets Deep Learning apart from being just another paradigm of Machine Learning, like supervised learning, unsupervised learning, reinforcement learning, etc.?
The key conceptual difference that makes Deep Learning stand out from traditional Machine Learning paradigms (like supervised learning, unsupervised learning, or reinforcement learning) lies in its focus on representation learning and its ability to automatically discover features from raw data.
Here are the key points of differentiation:
- Feature Engineering
• Traditional Machine Learning: Relies heavily on manual feature engineering, where domain experts identify and extract relevant features from the data to feed into the algorithm.
• Deep Learning: Learns hierarchical feature representations directly from raw data using deep neural networks. For example, in image recognition, it learns low-level features (edges, textures) in early layers and high-level abstractions (shapes, objects) in deeper layers.
- Scalability with Data
• Traditional Machine Learning: Performance often plateaus as data increases. The quality of manually crafted features and simpler models limits scalability.
• Deep Learning: Excels in scenarios with large amounts of labeled data, as it can leverage its depth to improve performance with increased data.
- End-to-End Learning
• Traditional Machine Learning: Often requires multiple stages, such as preprocessing, feature extraction, and applying a predictive model.
• Deep Learning: Provides end-to-end learning, where raw inputs are mapped directly to outputs without intermediate manual steps.
- Representation of Complex Data
• Traditional Machine Learning: Struggles with unstructured data like images, audio, and text, often requiring domain-specific techniques to process such data.
• Deep Learning: Can model unstructured data effectively due to its ability to learn abstract representations, making it especially powerful in fields like natural language processing, computer vision, and speech recognition.
- Architectural Depth
• Traditional Machine Learning: Models (e.g., linear regression, decision trees, support vector machines) generally have a shallower architecture, limiting their ability to capture complex patterns.
• Deep Learning: Utilizes deep neural networks with many layers, enabling it to model highly non-linear and intricate relationships in data.
- Generalization Through Pretraining and Transfer Learning
• Deep learning allows for pretrained models that generalize across tasks (e.g., models like GPT or ResNet), a capability not common in traditional machine learning paradigms.
Thus, Deep Learning is not just another paradigm but a significant leap due to its ability to automatically learn hierarchical, data-driven representations, scale with data, and excel in domains requiring complex pattern recognition.