L8 Flashcards

(26 cards)

1
Q

What is a random variable?

A

A variable representing an uncertain outcome (e.g., will it rain tomorrow?).
Has a domain of possible values (e.g., yes/no, 1-6 for a die, etc.).

Examples include yes/no questions or numerical outcomes like rolling a die.

2
Q

What is the range of basic probability notation?

A

0 ≤ P(A) ≤ 1 → always between 0 and 1.
P(true) = 1, P(false) = 0
Union of two events:
P(A∪B) = P(A) + P(B) − P(A∩B) → set theory

3
Q

What is the formula for the union of two events?

A

P(A∪B) = P(A) + P(B) − P(A∩B)

This is based on set theory.
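
A quick numeric check of the formula in Python, with made-up probabilities:

```python
# Hypothetical probabilities, chosen only to exercise the formula.
p_a, p_b, p_a_and_b = 0.5, 0.25, 0.125

p_a_or_b = p_a + p_b - p_a_and_b  # P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
print(p_a_or_b)  # 0.625
```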

4
Q

Define conditional probability.

A

Probability of one event given another: P(A|B) = P(A∩B) / P(B)

Example:
p(slept in movie) = 0.5
p(slept in movie | liked movie) = 1/4
p(didn’t sleep in movie | liked movie) = 3/4
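
A minimal sketch of the definition using hypothetical counts (picked so the result matches the 1/4 above):

```python
# Out of 100 people, 40 liked the movie and 10 both liked it and slept
# through it (made-up counts).
n_total = 100
n_liked = 40
n_slept_and_liked = 10

p_liked = n_liked / n_total                      # P(B)
p_slept_and_liked = n_slept_and_liked / n_total  # P(A ∩ B)

p_slept_given_liked = p_slept_and_liked / p_liked  # P(A|B) = P(A ∩ B) / P(B)
print(p_slept_given_liked)  # 0.25
```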

5
Q

What is joint probability?

A

Probability that two things happen at the same time.

Example: P(Slept, Liked movie)

6
Q

What is the Chain Rule of Probability?

A

Links joint and conditional probabilities: p(x,y) = p(x|y) p(y) = p(y|x) p(x)

Rearranging gives Bayes’ rule: p(y|x) = p(x|y) p(y) / p(x)
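
A small numpy sketch that checks both identities on a hypothetical 2×2 joint distribution:

```python
import numpy as np

# Made-up joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)              # marginal p(x)
p_y = p_xy.sum(axis=0)              # marginal p(y)
p_x_given_y = p_xy / p_y            # p(x|y): divide each column by p(y)
p_y_given_x = p_xy / p_x[:, None]   # p(y|x): divide each row by p(x)

# Chain rule: p(x,y) = p(x|y) p(y) = p(y|x) p(x)
assert np.allclose(p_x_given_y * p_y, p_xy)
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)

# Bayes' rule follows: p(y|x) = p(x|y) p(y) / p(x)
assert np.allclose(p_y_given_x, p_x_given_y * p_y / p_x[:, None])
```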

7
Q

What is Bayes’ Rule used for?

A

Helps classify by updating beliefs based on new evidence

It is fundamental in machine learning.

8
Q

What does a Naive Bayes Classifier assume?

A

All features are independent given the class

This is why it is called ‘naive’.

9
Q

How does the Naive Bayes Classifier work?

A
  1. Compute prior probabilities of classes: P(class)
  2. Compute likelihoods for each feature: P(feature|class)
  3. Use Bayes’ Rule to find P(class|features)
  4. Choose the class with the highest probability.

Example: P(Larry) = 28/80
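
A hand-rolled sketch of the four steps with made-up class names, priors, and likelihoods (the P(Larry) = 28/80 figure comes from the lecture's own example and is not reproduced here; absent features are ignored for brevity):

```python
import math

# Step 1: prior probabilities P(class)  (hypothetical numbers)
priors = {"spam": 0.4, "ham": 0.6}

# Step 2: likelihoods P(feature | class) for two binary features (hypothetical)
likelihoods = {
    "spam": {"money": 0.7, "meeting": 0.1},
    "ham":  {"money": 0.1, "meeting": 0.5},
}

observed = ["money"]  # features present in the example to classify

# Step 3: Bayes' rule, keeping only the numerator P(class) * prod P(f | class);
# the evidence P(features) is the same for every class, so it can be dropped.
scores = {c: priors[c] * math.prod(likelihoods[c][f] for f in observed)
          for c in priors}

# Step 4: choose the class with the highest (unnormalised) posterior.
print(max(scores, key=scores.get))  # "spam" with these numbers
```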

10
Q

What type of data does Gaussian Naive Bayes handle?

A

Continuous data

  • Assumes values follow a normal distribution.

Examples include height and weight.
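
A minimal sketch assuming scikit-learn is available; the heights, weights, and labels below are made up:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Continuous features: [height in cm, weight in kg] (hypothetical data).
X = np.array([[170, 65], [180, 80], [160, 55], [175, 75], [155, 50], [185, 90]])
y = np.array([0, 1, 0, 1, 0, 1])

# GaussianNB fits a normal distribution per feature and class.
model = GaussianNB().fit(X, y)
print(model.predict([[172, 70]]))
```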

11
Q

What is Multinomial Naive Bayes used for?

A

Count data

Example: Frequency of word appearances.
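
A minimal sketch assuming scikit-learn; the tiny corpus and spam labels are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win money now", "meeting at noon", "cheap money win", "project meeting notes"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (hypothetical)

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)   # word-frequency (count) features

model = MultinomialNB().fit(counts, labels)
print(model.predict(vectorizer.transform(["free money"])))
```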

12
Q

What is Bernoulli Naive Bayes used for?

A

Binary features

  • Used when features are true/false type.

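A minimal sketch assuming scikit-learn; the binary feature matrix and labels are made up:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Binary features, e.g. columns could be "contains link?" and "from known sender?".
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 0, 1, 0])

model = BernoulliNB().fit(X, y)
print(model.predict([[1, 0]]))
```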

13
Q

List the pros of Naive Bayes.

A

Simple, fast, works with small data
Works well for high-dimensional data (e.g., text)


14
Q

List the cons of Naive Bayes.

A

Assumes features are independent
Can give bad estimates if some classes are underrepresented


15
Q

What does a decision tree model do?

A

Splits data based on feature conditions

Each internal node = feature test
Each leaf = predicted class (output)

Example: Is age > 25?

16
Q

What does each internal node in a decision tree represent?

A

A feature test

Each leaf represents a predicted class.

17
Q

What is the process for building a decision tree?

A

Recursively split the data using the best feature (based on some splitting criterion).

Continue until the data is “pure” or other stopping conditions are met.

The tree grows with each level of questions: each node (box) tests a condition on a feature, and “questions” are thresholds on single features.

All decision boundaries are perpendicular to the feature axes, because at each node a decision is made about a single feature.
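
A minimal sketch assuming scikit-learn, using its bundled iris data; export_text prints the learned feature tests and leaf predictions:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Each internal node tests a threshold on a single feature, so the
# resulting decision boundaries are axis-aligned.
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(iris.data, iris.target)

print(export_text(tree, feature_names=iris.feature_names))
```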

18
Q

What are the three splitting criteria in decision trees?

A
  • Gini Index - measures impurity → lower = purer
  • Entropy - measures the level of uncertainty
  • Information Gain - reduction in entropy from a split

Information Gain: IG = H(Y) - H(Y|X)
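
A minimal numpy sketch of the three criteria; the labels and the binary split below are made up:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (lower = purer)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy H(Y) = -sum_k p_k log2 p_k (uncertainty, in bits)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """IG = H(Y) - H(Y|X) for the binary split given by split_mask."""
    left, right = labels[split_mask], labels[~split_mask]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

y = np.array([0, 0, 0, 1, 1, 1, 1, 1])                 # hypothetical labels
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)  # hypothetical split
print(gini(y), entropy(y), information_gain(y, mask))
```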

19
Q

What does conditional entropy measure?

A

The uncertainty of a random variable given another variable

H(X, Y) = H(X|Y) + H(Y)
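
A quick numeric check of the identity on a hypothetical joint table:

```python
import numpy as np

def H(p):
    """Entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Made-up joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
p_y = p_xy.sum(axis=0)

# Conditional entropy H(X|Y) = sum_y p(y) H(X | Y=y).
h_x_given_y = sum(p_y[j] * H(p_xy[:, j] / p_y[j]) for j in range(p_xy.shape[1]))

# H(X, Y) = H(X|Y) + H(Y)
print(np.isclose(H(p_xy.ravel()), h_x_given_y + H(p_y)))  # True
```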

20
Q

How does model complexity in decision trees relate to depth?

A

The complexity of the model induced by a decision tree is determined by the depth of the tree.

Increasing the depth of the tree increases the number of decision boundaries and may lead to overfitting.

Remedies: pre-pruning and post-pruning.

Limit tree size (pick one): max_depth / max_leaf_nodes / min_samples_split (and more).

21
Q

What is pre-pruning in decision trees?

A

Limit depth or number of leaves during training

Can use parameters like max_depth or min_samples_split.
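
A minimal pre-pruning sketch assuming scikit-learn; the specific limits are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Constrain the tree while it grows instead of cutting it back afterwards.
tree = DecisionTreeClassifier(
    max_depth=3,           # cap the depth
    max_leaf_nodes=8,      # cap the number of leaves
    min_samples_split=10,  # don't split very small nodes
    random_state=0,
).fit(X, y)

print(tree.get_depth(), tree.get_n_leaves())
```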

22
Q

What is post-pruning?

A

Cutting back after the tree is grown

Helps to avoid overfitting.
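
A minimal post-pruning sketch using scikit-learn's cost-complexity pruning; the alpha chosen here is arbitrary and would normally be selected by validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow the full tree first, then compute the pruning path.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
path = full.cost_complexity_pruning_path(X_tr, y_tr)

# Refit with a non-zero alpha to cut the tree back.
pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2],
                                random_state=0).fit(X_tr, y_tr)
print(full.get_n_leaves(), pruned.get_n_leaves(), pruned.score(X_te, y_te))
```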

23
Q

What is the criterion for regression with decision trees?

A

Mean Squared Error (MSE) at each node

Used for predicting numerical values.
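
A minimal sketch assuming scikit-learn, where the MSE criterion is named "squared_error" in recent versions; the data is synthetic:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

# Splits are chosen to minimise the mean squared error within each node.
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=4).fit(X, y)
print(reg.predict([[5.0]]))
```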

24
Q

List the advantages of decision trees.

A

Easy to interpret
Handles mixed data types
Fast to train

They provide clear decision paths.

25
Q

List the disadvantages of decision trees.

A

Can overfit easily
Weak as standalone model
May need pruning/tuning

They may require tuning or pruning to improve accuracy.
26
Q

How does the decision tree algorithm work?

A

Choose an attribute on which to descend at each level
Condition on earlier (higher) choices
Generally, restrict only one dimension at a time
Declare an output value when you get to the bottom