4 - Introduction to probability distributions Flashcards
(11 cards)
What is the difference between supervised and unsupervised learning?
Supervised learning involves training a model on labeled data (input-output pairs), while unsupervised learning uses data without labels to find structure or patterns.
Define the bias-variance tradeoff in machine learning.
The bias-variance tradeoff is the balance between bias (error from overly simple assumptions about the data) and variance (sensitivity to the particular training sample). High bias leads to underfitting, high variance to overfitting, and reducing one typically increases the other.
What is overfitting?
Overfitting happens when a model learns noise in the training data and performs poorly on unseen data.
What are decision trees?
Decision trees are predictive models that split data into branches to predict outcomes, using rules learned from features.
Define entropy in the context of decision trees.
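Entropy measures the impurity or uncertainty in a dataset: it is 0 when all examples share one class and highest when the classes are evenly mixed. In decision trees, it’s used to choose splits that make the resulting subsets as pure as possible.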
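As a rough illustration of the definition above, entropy over class labels can be computed as the sum of p * log2(1/p) over the class proportions p. The function name `entropy` and the example labels are made up for this sketch.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over the proportion p of each class
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))    # evenly mixed: 1.0
print(entropy(["yes", "yes", "yes", "yes"]))  # pure: 0.0
```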
What does a decision tree algorithm aim to do at each node?
It selects the attribute whose split best separates the classes. In entropy-based algorithms such as ID3, this is the attribute that provides the highest information gain.
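A minimal sketch of that selection step, assuming categorical attributes and rows stored as dictionaries. Information gain is taken here as the parent's entropy minus the size-weighted entropy of the subsets produced by the split. The helper names and the four-row toy dataset are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Parent entropy minus the weighted entropy after splitting on `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Toy data (made up): "outlook" fully determines "play", "windy" tells us nothing.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
print(best)  # outlook
```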
What is the purpose of pruning in decision trees?
Pruning reduces the size of the decision tree by removing nodes that provide little predictive power, helping to prevent overfitting.
What are the two main types of pruning?
Pre-pruning (early stopping): Stop tree growth early based on criteria like max depth or minimum samples per node.
Post-pruning (e.g., reduced error pruning): Grow the full tree first, then remove nodes that do not improve performance on validation data.
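The two strategies can be sketched roughly as follows. This is an illustrative outline, not a reference implementation: the tree layout (nested dictionaries with `attr`, `children`, and `majority` keys), the function names, the default limits, and the validation rows are all assumptions made for the example.

```python
def should_stop(labels, depth, max_depth=3, min_samples=2):
    """Pre-pruning: stop growing when the node is pure, too deep, or too small."""
    return depth >= max_depth or len(labels) < min_samples or len(set(labels)) <= 1

def predict(tree, row):
    # Internal nodes are dicts; leaves are plain class labels.
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["majority"])
    return tree

def errors(tree, rows, target):
    return sum(predict(tree, r) != r[target] for r in rows)

def prune(tree, val_rows, target):
    """Post-pruning (reduced error style): bottom-up, replace a subtree with its
    majority label whenever that does not increase validation errors."""
    if not isinstance(tree, dict):
        return tree
    children = {
        value: prune(child, [r for r in val_rows if r[tree["attr"]] == value], target)
        for value, child in tree["children"].items()
    }
    candidate = {**tree, "children": children}
    leaf = tree["majority"]
    if errors(leaf, val_rows, target) <= errors(candidate, val_rows, target):
        return leaf
    return candidate

# Hypothetical full tree: the "windy" split under "rainy" fits noise.
tree = {"attr": "outlook", "majority": "yes", "children": {
    "sunny": "no",
    "rainy": {"attr": "windy", "majority": "yes",
              "children": {"yes": "no", "no": "yes"}},
}}
validation = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
pruned = prune(tree, validation, "play")
print(pruned["children"]["rainy"])  # yes (the "windy" subtree was collapsed)
```

Note that in this sketch a subtree reached by no validation rows is also collapsed, since both options then score zero errors.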
What is the ID3 algorithm?
ID3 (Iterative Dichotomiser 3) builds decision trees by selecting attributes that maximize information gain at each node.
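A compact sketch of the recursion, assuming categorical attributes and no pruning. It stops when a node is pure or no attributes remain, and otherwise splits on the attribute with the highest information gain. The nested-dictionary tree format, the function names, and the six-row dataset are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Leaf: all labels agree, or nothing is left to split on (use the majority).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[best]].append(row)
    return {best: {value: id3(subset, remaining, target)
                   for value, subset in partitions.items()}}

# Toy data (made up for this example).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
tree = id3(rows, ["outlook", "windy"], "play")
print(tree)
```

Here the root splits on `outlook`, and only the `sunny` branch needs a further split on `windy`.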
How does ID3 handle numeric attributes?
ID3 does not handle numeric attributes directly. They must first be discretized, that is, converted into categorical bins defined by thresholds.
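One simple way to do this, shown as a sketch: map each numeric value to a named bin using fixed thresholds. The thresholds (15 and 25), the bin names, and the function name `discretize` are arbitrary choices for the example. In practice the cut points would come from domain knowledge or from the data.

```python
from bisect import bisect_right

def discretize(value, thresholds, bin_names):
    """Map a number to a category; `thresholds` must be sorted ascending and
    `bin_names` must have exactly one more entry than `thresholds`."""
    return bin_names[bisect_right(thresholds, value)]

thresholds = [15, 25]                 # hypothetical cut points
bin_names = ["cool", "mild", "hot"]

temperatures = [10, 15, 22, 30]
print([discretize(t, thresholds, bin_names) for t in temperatures])
# ['cool', 'mild', 'mild', 'hot']
```

A value equal to a threshold falls into the upper bin, so 15 is labeled "mild" here.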
What’s a major limitation of ID3?
It tends to overfit noisy data, and it does not natively handle continuous attributes or support pruning.