Machine Learning & AI Flashcards

Question 1

Q

What is the difference between supervised and unsupervised learning?

Answer

A

Supervised Learning: Uses labeled data to learn an input-output mapping. Examples: classification, regression. Algorithms: decision trees, support vector machines (SVMs), neural networks.
Unsupervised Learning: Uses unlabeled data to find patterns. Examples: clustering, dimensionality reduction. Algorithms: K-means, hierarchical clustering, PCA.

Question 2

Q

Explain neural networks and deep learning.

Answer

A

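Neural networks: Layers of interconnected neurons that transform inputs into outputs. Deep learning: Neural networks with multiple hidden layers.
Forward propagation: Data flows from the input layer through the hidden layers to the output.
Backpropagation: The output error is propagated backward to compute gradients, which are used to update the weights.
Activation functions: ReLU, sigmoid, tanh.
Applications: Image recognition, NLP, speech recognition.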

Question 3

Q

What is overfitting and how do you prevent it?

Answer

A

Overfitting: The model learns noise in the training data and therefore generalizes poorly to new data.
Prevention: Cross-validation, regularization (L1, L2), early stopping, dropout, data augmentation, ensemble methods.

Question 4

Q

What is the difference between supervised and unsupervised learning?

Answer

A

Supervised Learning:
- The model learns from labeled data (i.e., data with known outcomes).
- Goal: To learn a mapping function that can predict the output for new, unseen data.
- Examples: Classification (predicting a category, e.g., spam vs. not spam) and Regression (predicting a continuous value, e.g., house price).

Unsupervised Learning:
- The model learns from unlabeled data, trying to find hidden patterns or intrinsic structures.
- Goal: To explore the data and find meaningful insights.
- Examples: Clustering (grouping similar data points, e.g., customer segmentation) and Dimensionality Reduction (reducing the number of variables, e.g., PCA).

Question 5

Q

What is reinforcement learning?

Answer

A

A type of machine learning where an ‘agent’ learns to make decisions by performing actions in an ‘environment’ to maximize a cumulative ‘reward’.
- The agent learns through trial and error.
- It is not given explicit instructions but receives feedback (rewards or penalties) for its actions.
- Example: Training a model to play a game like chess, where the reward is winning.
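
The trial-and-error loop can be illustrated with something much smaller than chess. The sketch below is an assumption-laden toy, not a full reinforcement learning setup: the "environment" is a hypothetical three-armed bandit with made-up reward probabilities, there is no state, and the agent uses an epsilon-greedy rule (explore occasionally, otherwise take the action with the best estimated reward).

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms      # how often each action was tried
    values = [0.0] * n_arms    # running estimate of each action's mean reward
    total_reward = 0.0
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: values[a])
        # The environment answers with a reward of 1 or 0.
        # The agent never sees reward_probs directly, only the feedback.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the mean reward for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward

# Made-up reward probabilities for three actions.
values, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated values:", [round(v, 2) for v in values])
print("action the agent values most:", best_action)
```

The agent is never told which action is best; it ends up preferring the highest-paying one only because of the rewards it observed.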

Question 6

Q

Explain neural networks and deep learning.

Answer

A

Neural Network (NN):
- A computational model inspired by the structure and function of the human brain.
- Composed of interconnected nodes (neurons) organized in layers: an input layer, one or more hidden layers, and an output layer.
- Each connection has a weight that is adjusted during training.

Deep Learning (DL):
- A subfield of machine learning based on neural networks with many hidden layers (hence ‘deep’).
- Capable of learning very complex patterns from large amounts of data.
- The ‘deep’ architecture allows for learning a hierarchy of features automatically.

Question 7

Q

What is an activation function and why is it important?

Answer

A

Definition: A function applied to the output of a neuron that determines if it should be activated (i.e., what signal it passes on).
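Importance: Activation functions introduce non-linearity into the network. Without them, every layer is a linear transformation, and a stack of linear transformations collapses into a single linear transformation. The network, no matter how many layers it has, could then represent only linear functions of its input (like a linear regression model) and would be unable to learn complex patterns.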
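
A small numeric check makes the point about non-linearity concrete. This is a minimal sketch with one-dimensional "layers" and arbitrary weights chosen for illustration: two stacked linear layers give exactly the same outputs as one linear layer with collapsed parameters, while inserting a ReLU between them produces a function that no single linear layer can match.

```python
def linear_layer(w, b, x):
    return w * x + b

def relu(z):
    return max(0.0, z)

# Arbitrary illustration weights for two layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    # No activation between the layers.
    return linear_layer(w2, b2, linear_layer(w1, b1, x))

# Algebraically: w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
w_collapsed = w2 * w1
b_collapsed = w2 * b1 + b2

def one_linear_layer(x):
    return linear_layer(w_collapsed, b_collapsed, x)

def two_layers_with_relu(x):
    # Same weights, but with a ReLU applied to the first layer's output.
    return linear_layer(w2, b2, relu(linear_layer(w1, b1, x)))

inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
for x in inputs:
    print(x, two_linear_layers(x), one_linear_layer(x), two_layers_with_relu(x))
```

For negative inputs the ReLU version flattens out while the purely linear stack keeps following a straight line, which is the extra expressive power the activation provides.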

Question 8

Q

What is backpropagation?

Answer

A

The primary algorithm for training neural networks. It works in two passes:
1. Forward Pass: Input data is fed through the network to produce an output, and the error (difference between predicted and actual output) is calculated.
2. Backward Pass: The error is propagated backward through the network, from the output layer to the input layer. Using the chain rule, the algorithm calculates the gradient of the error with respect to each weight, and an optimizer such as gradient descent then adjusts the weights in the direction that reduces the error.
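
The two passes can be sketched by hand for a tiny network. The example below assumes a hypothetical network with one input, one tanh hidden neuron, and one linear output, trained on a single made-up example with squared error and plain gradient descent. Real frameworks compute these gradients automatically; the starting weights and learning rate here are arbitrary.

```python
import math

def forward(params, x):
    """Forward pass: returns the hidden activation and the output."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)  # hidden neuron
    y = w2 * h + b2             # output neuron (linear)
    return h, y

def loss_fn(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2

def backward(params, x, target):
    """Backward pass: chain rule from the loss back to every parameter."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target              # dLoss/dy
    dw2 = dy * h                 # dLoss/dw2
    db2 = dy                     # dLoss/db2
    dh = dy * w2                 # error sent back to the hidden neuron
    dz = dh * (1.0 - h * h)      # tanh'(z) = 1 - tanh(z)^2
    dw1 = dz * x                 # dLoss/dw1
    db1 = dz                     # dLoss/db1
    return [dw1, db1, dw2, db2]

params = [0.5, 0.0, -0.3, 0.1]   # arbitrary starting weights
x, target = 1.5, 1.0             # a single made-up training example
learning_rate = 0.1

losses = []
for step in range(50):
    losses.append(loss_fn(params, x, target))
    grads = backward(params, x, target)
    # Gradient descent: move each parameter against its gradient.
    params = [p - learning_rate * g for p, g in zip(params, grads)]

print("first loss:", losses[0])
print("last loss:", losses[-1])
```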

Question 9

Q

What is overfitting and how do you prevent it?

Answer

A

Overfitting: A modeling error where a model learns the training data too well, including its noise and random fluctuations. As a result, the model performs poorly on new, unseen data.
Prevention Techniques:
1. Get More Data: A larger, more diverse dataset helps the model generalize better.
2. Regularization: Adds a penalty term to the loss function to discourage overly complex models (e.g., L1 and L2 regularization).
3. Dropout: During training, randomly sets a fraction of neuron activations to zero in each iteration, forcing the network to be less reliant on any single neuron.
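4. Cross-Validation: Splits the data into multiple folds, training on some folds and validating on the held-out fold in turn. It does not reduce overfitting by itself, but it gives a more reliable estimate of how the model generalizes, which helps detect overfitting and choose less overfit settings.
5. Early Stopping: Monitors the model’s performance on a validation set and stops training when validation performance stops improving, keeping the best weights seen so far.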
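
As an illustration of the early-stopping rule, here is a minimal sketch that works on a list of per-epoch validation losses. The function name, the `patience` parameter, and the loss values are assumptions made for this example; in practice the losses would come from evaluating the model after each training epoch.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; the weights from best_epoch would be kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    stopped_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch, stopped_epoch

# Made-up validation losses: improving at first, then rising as the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, stopped_epoch = early_stopping(val_losses)
print("keep weights from epoch", best_epoch, "and stop at epoch", stopped_epoch)
```

With these numbers the loss bottoms out at epoch 3 (counting from 0), and training halts at epoch 5 after two epochs without improvement.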

Question 10

Q

Explain the bias-variance tradeoff.

Answer

A

A fundamental concept in machine learning that describes the tradeoff between two sources of error that prevent models from generalizing perfectly.
Bias: The error from incorrect assumptions in the learning algorithm. High bias can cause a model to miss relevant relations between features and outputs (underfitting).
Variance: The error from sensitivity to small fluctuations in the training set. High variance can cause a model to capture noise from the training data (overfitting).
Tradeoff: Increasing a model’s complexity typically decreases bias but increases variance. The goal is to find a balance that minimizes the total error.
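
For squared-error loss the tradeoff can be written as an explicit decomposition. The notation below is an assumption introduced for this card: the data follow y = f(x) + ε with zero-mean noise of variance σ², and the expectation is over training sets (and noise) for a learned predictor f̂.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, so tuning complexity trades one against the other while the noise term sets a floor on the achievable error.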

Question 11

Q

What are Precision and Recall, and when would you use one over the other?

Answer

A

Both are metrics for classification tasks.
Precision: Of all the positive predictions made, how many were actually correct? (Precision = TP / (TP + FP)). Use when the cost of a false positive is high (e.g., a spam filter marking an important email as spam).
Recall (Sensitivity): Of all the actual positive cases, how many did the model correctly identify? (Recall = TP / (TP + FN)). Use when the cost of a false negative is high (e.g., a medical test failing to detect a disease).
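
Both formulas can be computed directly from predictions. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative), uses made-up data, and returns 0.0 when a denominator is zero, which is one common convention rather than a universal rule.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when there are no positive predictions or cases.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up example: 4 actual positives, 3 positive predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print("precision:", precision)  # TP=2, FP=1
print("recall:", recall)        # TP=2, FN=2
```

Here the model is right about two of its three positive predictions but finds only two of the four actual positives, so precision is higher than recall.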

Question 12

Q

Briefly explain how a Decision Tree works.

Answer

A

A supervised learning algorithm that works like a flowchart.
- It splits the data into smaller and smaller subsets based on the values of its features.
- At each node, it selects the feature that best splits the data (e.g., using Gini impurity or information gain).
- The process continues until it reaches ‘leaf’ nodes, which represent the final classification or regression value.
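
The split-selection step can be sketched for a single numeric feature. This is a simplified illustration, not a full tree builder: it assumes one feature, tries midpoints between distinct values as thresholds, and scores each split by weighted Gini impurity. The function names and the toy data are made up for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Find the threshold on one feature with the lowest weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Made-up data: one feature ("size") and a class label.
sizes = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["small", "small", "small", "large", "large", "large"]

threshold, score = best_split(sizes, labels)
print("split at size <=", threshold, "with weighted Gini", score)
```

A full decision tree would repeat this search over every feature at each node and recurse on the resulting subsets until the leaves are pure or a stopping rule applies.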

Question 13

Q

What is K-Means Clustering?

Answer

A

An unsupervised learning algorithm used to partition data into ‘K’ distinct, non-overlapping clusters.
How it works:
1. Initialize K centroids randomly.
2. Assign each data point to the nearest centroid.
3. Recalculate the centroids as the mean of all data points assigned to them.
4. Repeat steps 2-3 until the centroids no longer move significantly.
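
The four steps above can be sketched in plain Python. This is a minimal illustration under a few assumptions: points are tuples of numbers, the initial centroids are sampled from the data points (one common way to do step 1), distance is Euclidean, and an empty cluster simply keeps its previous centroid. The data and parameter names are made up for the example.

```python
import math
import random

def k_means(points, k, max_iters=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize K centroids by picking K distinct data points at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Step 4: stop once the centroids no longer move significantly.
        shift = max(math.dist(old, new) for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, assignments

# Made-up 2D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print("centroids:", sorted(centroids))
print("assignments:", assignments)
```

Because the result depends on the random initialization, practical implementations often run the algorithm several times and keep the clustering with the lowest within-cluster distance.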

(13 cards)