L8 Flashcards
(26 cards)
What is a random variable?
A variable representing an uncertain outcome (e.g., will it rain tomorrow?).
Has a domain of possible values (e.g., yes/no, 1-6 for a die, etc.).
Examples include yes/no questions or numerical outcomes like rolling a die.
What is the range of basic probability notation?
0 ≤ P(A) ≤ 1 → always between 0 and 1.
P(true)=1, P(false)=0
What is the formula for the union of two events?
P(A∪B) = P(A) + P(B) − P(A∩B)
This is based on set theory.
Define conditional probability.
Probability of one event given another: P(A|B) = P(A∩B) / P(B)
Example: p(slept in movie) = 0.5
p(slept in movie | liked movie) = 1/4
p(didn't sleep in movie | liked movie) = 3/4
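The movie example above can be reproduced from raw counts; the counts below are hypothetical, chosen only so the stated probabilities come out:

```python
# Hypothetical counts matching the card's numbers: 8 viewers,
# 4 slept overall, 4 liked the movie, and 1 of those 4 also slept.
total = 8
slept = 4
liked = 4
slept_and_liked = 1

p_slept = slept / total                        # P(slept) = 0.5
p_slept_given_liked = slept_and_liked / liked  # P(A|B) = P(A∩B) / P(B) = 0.25
p_awake_given_liked = 1 - p_slept_given_liked  # 0.75
```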
What is joint probability?
Probability that two things happen at the same time
Example: P(Slept,Liked movie)
What is the Chain Rule of Probability?
Links joint and conditional probabilities: p(x,y) = p(x|y) p(y) = p(y|x) p(x)
Rearranging gives Bayes' rule: p(y|x) = p(x|y) p(y) / p(x)
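A quick numeric check of the chain rule and Bayes' rule, with made-up probabilities:

```python
# Assumed numbers (purely illustrative).
p_y = 0.5            # prior p(y)
p_x_given_y = 0.25   # likelihood p(x|y)
p_x = 0.4            # evidence p(x)

# Bayes' rule: p(y|x) = p(x|y) p(y) / p(x)
p_y_given_x = p_x_given_y * p_y / p_x

# Chain rule consistency: p(x,y) computed both ways must agree.
joint_a = p_x_given_y * p_y   # p(x|y) p(y)
joint_b = p_y_given_x * p_x   # p(y|x) p(x)
```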
What is Bayes’ Rule used for?
Helps classify by updating beliefs based on new evidence
It is fundamental in machine learning.
What does a Naive Bayes Classifier assume?
All features are independent given the class
This is why it is called ‘naive’.
How does the Naive Bayes Classifier work?
- Compute prior probabilities of classes: P(class)
- Compute likelihoods for each feature: P(feature∣class)
- Use Bayes Rule to find P(class∣features)
- Choose class with highest probability.
Example: P(Larry) = 28/80
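The four steps above can be sketched in pure Python; the dataset, feature name, and class labels here are all hypothetical (and no smoothing is applied, so unseen feature values get probability zero):

```python
from collections import Counter

# Toy dataset (hypothetical): each row is (feature dict, class label).
data = [
    ({"long": 1}, "cat"),
    ({"long": 1}, "cat"),
    ({"long": 0}, "cat"),
    ({"long": 0}, "dog"),
    ({"long": 0}, "dog"),
]

# 1) Prior probabilities P(class)
class_counts = Counter(label for _, label in data)
n = len(data)
priors = {c: class_counts[c] / n for c in class_counts}

# 2) Likelihoods P(feature = value | class)
def likelihood(feature, value, cls):
    rows = [f for f, label in data if label == cls]
    return sum(1 for f in rows if f[feature] == value) / len(rows)

# 3) Unnormalized posterior via Bayes' rule; 4) choose the argmax class.
def predict(features):
    scores = {}
    for c in priors:
        score = priors[c]
        for feat, val in features.items():
            score *= likelihood(feat, val, c)
        scores[c] = score
    return max(scores, key=scores.get)
```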
What type of data does Gaussian Naive Bayes handle?
Continuous data
- Assumes values follow a normal distribution.
Examples include height and weight.
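Gaussian Naive Bayes replaces counting with a normal-density likelihood per feature and class; a minimal sketch, with made-up height data for one class:

```python
import math

# Normal density used as the likelihood for a continuous feature.
def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical "height in cm" samples for one class.
heights = [170.0, 175.0, 180.0]
mean = sum(heights) / len(heights)
var = sum((h - mean) ** 2 for h in heights) / len(heights)

# The density peaks at the class mean, so values near the mean
# are the most plausible under this class.
density_at_mean = gaussian_pdf(175.0, mean, var)
```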
What is Multinomial Naive Bayes used for?
Count data
Example: Frequency of word appearances.
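For word counts, the likelihood of a word under a class is its smoothed relative frequency; a sketch with a hypothetical two-word vocabulary and Laplace smoothing:

```python
# Hypothetical word counts per class.
spam_counts = {"free": 4, "hello": 1}
ham_counts = {"free": 0, "hello": 5}
vocab = {"free", "hello"}

def word_prob(word, counts, alpha=1):
    # Laplace smoothing avoids zero probabilities for unseen words.
    total = sum(counts.values())
    return (counts.get(word, 0) + alpha) / (total + alpha * len(vocab))

p_free_spam = word_prob("free", spam_counts)  # (4+1)/(5+2) = 5/7
p_free_ham = word_prob("free", ham_counts)    # (0+1)/(5+2) = 1/7
```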
What is Bernoulli Naive Bayes used for?
Binary features
- Used when features are true/false type.
List the pros of Naive Bayes.
Simple, fast, works with small data
Works well for high-dimensional data (e.g., text)
List the cons of Naive Bayes.
Assumes features are independent
Can give bad estimates if some classes are underrepresented
What does a decision tree model do?
Splits data based on feature conditions
Each internal node = feature test
Each leaf = predicted class (output)
Example: Is age > 25?
What does each internal node in a decision tree represent?
A feature test
Each leaf represents a predicted class.
What is the process for building a decision tree?
Recursively split data using the best feature (based on some criteria).
Continue until data is “pure” or other stopping conditions are met.
Decision Tree grows with each level of questions
Each node (box) of the decision tree tests a condition on a feature. “Questions” are thresholds on single features.
All decision boundaries are perpendicular to the feature axes, because at each node a decision is made about a single feature
What are the three splitting criteria in decision trees?
- Gini Index → measures impurity; lower = purer
- Entropy → measures the level of uncertainty
- Information Gain → IG = H(Y) − H(Y|X)
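The three criteria above can be computed directly from label lists; a minimal sketch on a toy split:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    # IG = H(Y) - H(Y|X); H(Y|X) is the size-weighted average
    # entropy of the child nodes produced by the split.
    n = len(parent)
    h_cond = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - h_cond

parent = ["a", "a", "b", "b"]
split = [["a", "a"], ["b", "b"]]  # a perfect split: children are pure
```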
What does conditional entropy measure?
The remaining uncertainty of one random variable given another: H(X|Y) = Σ_y p(y) H(X | Y=y)
Chain rule for entropy: H(X, Y) = H(X|Y) + H(Y)
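The chain rule H(X, Y) = H(X|Y) + H(Y) can be verified numerically on a small, made-up joint distribution:

```python
import math

# Hypothetical joint distribution over (X, Y).
joint = {("x0", "y0"): 0.25, ("x1", "y0"): 0.25, ("x0", "y1"): 0.5}

def H(dist):
    # Shannon entropy in bits of any probability table.
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal p(y).
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

# Conditional entropy H(X|Y) = sum_y p(y) H(X | Y=y).
h_x_given_y = 0.0
for y, py in p_y.items():
    cond = {x: p / py for (x, yy), p in joint.items() if yy == y}
    h_x_given_y += py * H(cond)

chain_rule_holds = abs(H(joint) - (h_x_given_y + H(p_y))) < 1e-12
```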
How does model complexity in decision trees relate to depth?
The complexity of the model induced by a decision tree is determined by the depth of the tree.
Increasing the depth increases the number of decision boundaries and may lead to overfitting.
Remedies: pre-pruning and post-pruning.
Limit tree size with parameters such as max_depth, max_leaf_nodes, or min_samples_split (and more).
What is pre-pruning in decision trees?
Limit depth or number of leaves during training
Can use parameters like max_depth or min_samples_split.
What is post-pruning?
Cutting back after the tree is grown
Helps to avoid overfitting.
What is the criterion for regression with decision trees?
Mean Squared Error (MSE) at each node
Used for predicting numerical values.
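A regression tree scores a candidate split by the size-weighted MSE of the resulting children; a sketch with hypothetical target values:

```python
# MSE at a node: variance of the targets around the node's mean prediction.
def node_mse(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

targets = [1.0, 2.0, 3.0, 10.0]

# A split that isolates the outlier sharply reduces the weighted MSE.
left, right = [1.0, 2.0, 3.0], [10.0]
n = len(targets)
weighted_mse = (len(left) / n) * node_mse(left) + (len(right) / n) * node_mse(right)
```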
List the advantages of decision trees.
Easy to interpret
Handles mixed data types
Fast to train
They provide clear decision paths.