4 - Introduction to probability distributions Flashcards

(11 cards)

1
Q

What is the difference between supervised and unsupervised learning?

A

Supervised learning involves training a model on labeled data (input-output pairs), while unsupervised learning uses data without labels to find structure or patterns.
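The contrast can be sketched in a few lines of Python. Everything here (the data, the nearest-mean classifier, the midpoint split) is an illustrative toy, not a standard method:

```python
from collections import defaultdict

# Hypothetical labeled data: (input, label) pairs for supervised learning.
labeled = [(1.0, "low"), (1.2, "low"), (8.9, "high"), (9.3, "high")]
# The same kind of inputs, but without labels, for unsupervised learning.
unlabeled = [1.1, 9.0, 1.3, 8.7]

# Supervised: use the labels to learn a decision rule (here, class means).
by_class = defaultdict(list)
for x, y in labeled:
    by_class[y].append(x)
means = {y: sum(xs) / len(xs) for y, xs in by_class.items()}

def predict(x):
    return min(means, key=lambda y: abs(x - means[y]))

# Unsupervised: no labels, so we can only look for structure, e.g.
# split the inputs at the midpoint of their range into two clusters.
lo, hi = min(unlabeled), max(unlabeled)
clusters = [0 if x < (lo + hi) / 2 else 1 for x in unlabeled]

print(predict(9.1))  # a label learned from the labeled data: "high"
print(clusters)      # cluster ids with no label semantics: [0, 1, 0, 1]
```

Note the asymmetry: the supervised model outputs a meaningful label, while the unsupervised split only says which points group together.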

2
Q

Define the bias-variance tradeoff in machine learning.

A

The bias-variance tradeoff is the balance between two sources of error: bias, error from overly simple assumptions that cause the model to miss real patterns, and variance, error from sensitivity to fluctuations in the training data. High bias leads to underfitting, high variance to overfitting; reducing one typically increases the other.

3
Q

What is overfitting?

A

Overfitting happens when a model fits the training data too closely, learning its noise rather than the underlying pattern, so it performs well on training data but poorly on unseen data.
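A tiny numeric illustration (all numbers made up): the data are roughly y = 2x plus noise, and a "model" that memorizes the training set beats a simple trend model on training data but loses on unseen data:

```python
# Hypothetical noisy samples of y ~ 2*x (values are illustrative).
train = [(0, 0.5), (1, 1.6), (2, 4.4), (3, 6.3)]
test = [(0, -0.4), (1, 2.5), (2, 3.6), (3, 6.2)]

# Overfit "model": memorize every training point, noise included.
memorized = dict(train)
def overfit(x):
    return memorized[x]

# Simpler model that captures only the trend and ignores the noise.
def simple(x):
    return 2 * x

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(overfit, train))                    # 0.0: perfect on training data
print(mse(overfit, test) > mse(simple, test)) # True: worse on unseen data
```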

4
Q

What are decision trees?

A

Decision trees are predictive models that split data into branches to predict outcomes, using rules learned from features.

5
Q

Define entropy in the context of decision trees.

A

Entropy measures the impurity or uncertainty in a dataset. In decision trees, it’s used to decide how to split the data.
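Entropy is easy to compute directly; a minimal stdlib sketch of the Shannon entropy H = -Σ p_i log2(p_i) over the class proportions:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["yes"] * 5))        # 0.0: pure set, no uncertainty
print(entropy(["yes", "no"] * 5))  # 1.0: evenly mixed, maximal two-class impurity
```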

6
Q

What does a decision tree algorithm aim to do at each node?

A

It selects the attribute that provides the highest information gain to split the data.
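Information gain is the parent node's entropy minus the size-weighted entropy of the children a split produces. A self-contained sketch, with a hypothetical toy dataset (the attribute and value names are illustrative):

```python
from math import log2
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy reduction from splitting `rows` (dicts) on `attr`."""
    parent = entropy([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - weighted

# Toy rows: "outlook" perfectly separates the target "play".
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain",  "play": "yes"},
    {"outlook": "rain",  "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0: split removes all uncertainty
```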

7
Q

What is the purpose of pruning in decision trees?

A

Pruning reduces the size of the decision tree by removing nodes that provide little predictive power, helping to prevent overfitting.

8
Q

What are the two main types of pruning?

A

Pre-pruning (early stopping): Stop tree growth early based on criteria like max depth or minimum samples per node.
Post-pruning (reduced error pruning): Grow the full tree first, then remove nodes that do not improve performance on validation data.
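Pre-pruning is just a stopping check (e.g. a depth limit) inside the tree-building recursion; post-pruning can be shown concretely. Below is a sketch of reduced-error pruning on a toy tree encoding; the node format, attribute names, and data are illustrative assumptions, not a standard API:

```python
# A node is either a class label (leaf) or a dict with keys
# "attr", "children" (value -> subtree), and "majority" (fallback label).

def predict(node, row):
    while isinstance(node, dict):
        node = node["children"].get(row[node["attr"]], node["majority"])
    return node

def accuracy(node, rows, target):
    return sum(predict(node, r) == r[target] for r in rows) / len(rows)

def post_prune(node, val_rows, target):
    # Bottom-up: prune the children first, then try collapsing this node
    # to its majority label; keep the leaf if validation accuracy does
    # not get worse.
    if not isinstance(node, dict):
        return node
    node["children"] = {v: post_prune(c, val_rows, target)
                        for v, c in node["children"].items()}
    if accuracy(node["majority"], val_rows, target) >= accuracy(node, val_rows, target):
        return node["majority"]
    return node

# A split on "id" that memorized training noise; validation says both
# rows are really "yes", so collapsing the subtree cannot hurt.
tree = {"attr": "id", "children": {"a": "yes", "b": "no"}, "majority": "yes"}
val = [{"id": "a", "label": "yes"}, {"id": "b", "label": "yes"}]
print(post_prune(tree, val, "label"))  # "yes": whole subtree collapsed to a leaf
```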

9
Q

What is the ID3 algorithm?

A

ID3 (Iterative Dichotomiser 3) builds decision trees by selecting attributes that maximize information gain at each node.
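The core of ID3 fits in a short recursive function: stop on a pure node (or when attributes run out), otherwise split on the attribute with the highest information gain. A minimal sketch on a hypothetical weather-style dataset (rows and attribute names are made up for illustration):

```python
from math import log2
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    parent = entropy([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    return parent - sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:      # pure node: stop with a leaf
        return labels[0]
    if not attrs:                  # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    tree = {"attr": best, "children": {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree["children"][value] = id3(subset, [a for a in attrs if a != best], target)
    return tree

rows = [
    {"outlook": "sunny", "windy": "no",  "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
]
tree = id3(rows, ["outlook", "windy"], "play")
print(tree)  # splits on "outlook" first, then "windy" under "rain"
```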

10
Q

How does ID3 handle numeric attributes?

A

ID3 requires numeric attributes to be discretized, that is, converted into categorical bins or thresholds, before they can be used for splits.
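A minimal sketch of binary discretization; the temperatures and the threshold choice here are illustrative (in practice a threshold might be chosen to maximize information gain):

```python
def discretize(value, threshold):
    """Map a numeric value to a categorical label via a single threshold."""
    return "high" if value > threshold else "low"

temps = [64, 69, 72, 75, 81]
threshold = 71.5  # hypothetical midpoint between adjacent sorted values
print([discretize(t, threshold) for t in temps])
# ['low', 'low', 'high', 'high', 'high']
```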

11
Q

What’s a major limitation of ID3?

A

It tends to overfit noisy data, and it does not natively handle continuous attributes or pruning.
