L8 Flashcards

(26 cards)

1
Q

What is a random variable?

A

A variable representing an uncertain outcome (e.g., will it rain tomorrow?).
Has a domain of possible values (e.g., yes/no, 1-6 for a die, etc.).

Examples include yes/no questions or numerical outcomes like rolling a die.

2
Q

What is the range of basic probability notation?

A

0 ≤ P(A) ≤ 1 → always between 0 and 1.
P(true) = 1, P(false) = 0
Union of two events:
P(A∪B) = P(A) + P(B) − P(A∩B) → set theory

3
Q

What is the formula for the union of two events?

A

P(A∪B) = P(A) + P(B) − P(A∩B)

This is based on set theory.
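
A quick numeric check of the formula in Python, with made-up probabilities:

```python
# Hypothetical probabilities, chosen only to exercise the formula.
p_a, p_b, p_a_and_b = 0.5, 0.25, 0.125

p_a_or_b = p_a + p_b - p_a_and_b  # P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
print(p_a_or_b)  # 0.625
```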

4
Q

Define conditional probability.

A

Probability of one event given another: P(A|B) = P(A∩B) / P(B)

Example:
p(slept in movie) = 0.5
p(slept in movie | liked movie) = 1/4
p(didn’t sleep in movie | liked movie) = 3/4
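
A minimal sketch of the definition using hypothetical counts (picked so the result matches the 1/4 above):

```python
# Out of 100 people, 40 liked the movie and 10 both liked it and slept
# through it (made-up counts).
n_total = 100
n_liked = 40
n_slept_and_liked = 10

p_liked = n_liked / n_total                      # P(B)
p_slept_and_liked = n_slept_and_liked / n_total  # P(A ∩ B)

p_slept_given_liked = p_slept_and_liked / p_liked  # P(A|B) = P(A ∩ B) / P(B)
print(p_slept_given_liked)  # 0.25
```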

5
Q

What is joint probability?

A

Probability that two things happen at the same time.

Example: P(Slept, Liked movie)

6
Q

What is the Chain Rule of Probability?

A

Links joint and conditional probabilities: p(x,y) = p(x|y) p(y) = p(y|x) p(x)

Rearranging gives Bayes’ rule: p(y|x) = p(x|y) p(y) / p(x)
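
A small numpy sketch that checks both identities on a hypothetical 2×2 joint distribution:

```python
import numpy as np

# Made-up joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

p_x = p_xy.sum(axis=1)              # marginal p(x)
p_y = p_xy.sum(axis=0)              # marginal p(y)
p_x_given_y = p_xy / p_y            # p(x|y): divide each column by p(y)
p_y_given_x = p_xy / p_x[:, None]   # p(y|x): divide each row by p(x)

# Chain rule: p(x,y) = p(x|y) p(y) = p(y|x) p(x)
assert np.allclose(p_x_given_y * p_y, p_xy)
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)

# Bayes' rule follows: p(y|x) = p(x|y) p(y) / p(x)
assert np.allclose(p_y_given_x, p_x_given_y * p_y / p_x[:, None])
```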

7
Q

What is Bayes’ Rule used for?

A

Helps classify by updating beliefs based on new evidence

It is fundamental in machine learning.

8
Q

What does a Naive Bayes Classifier assume?

A

All features are independent given the class

This is why it is called ‘naive’.

9
Q

How does the Naive Bayes Classifier work?

A
  1. Compute prior probabilities of classes: P(class)
  2. Compute likelihoods for each feature: P(feature|class)
  3. Use Bayes’ Rule to find P(class|features)
  4. Choose the class with the highest probability.

Example: P(Larry) = 28/80
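
A hand-rolled sketch of the four steps with made-up class names, priors, and likelihoods (the P(Larry) = 28/80 figure comes from the lecture's own example and is not reproduced here; absent features are ignored for brevity):

```python
import math

# Step 1: prior probabilities P(class)  (hypothetical numbers)
priors = {"spam": 0.4, "ham": 0.6}

# Step 2: likelihoods P(feature | class) for two binary features (hypothetical)
likelihoods = {
    "spam": {"money": 0.7, "meeting": 0.1},
    "ham":  {"money": 0.1, "meeting": 0.5},
}

observed = ["money"]  # features present in the example to classify

# Step 3: Bayes' rule, keeping only the numerator P(class) * prod P(f | class);
# the evidence P(features) is the same for every class, so it can be dropped.
scores = {c: priors[c] * math.prod(likelihoods[c][f] for f in observed)
          for c in priors}

# Step 4: choose the class with the highest (unnormalised) posterior.
print(max(scores, key=scores.get))  # "spam" with these numbers
```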

10
Q

What type of data does Gaussian Naive Bayes handle?

A

Continuous data

  • Assumes values follow a normal distribution.

Examples include height and weight.
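
A minimal sketch assuming scikit-learn is available; the heights, weights, and labels below are made up:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Continuous features: [height in cm, weight in kg] (hypothetical data).
X = np.array([[170, 65], [180, 80], [160, 55], [175, 75], [155, 50], [185, 90]])
y = np.array([0, 1, 0, 1, 0, 1])

# GaussianNB fits a normal distribution per feature and class.
model = GaussianNB().fit(X, y)
print(model.predict([[172, 70]]))
```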

11
Q

What is Multinomial Naive Bayes used for?

A

Count data

Example: Frequency of word appearances.
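
A minimal sketch assuming scikit-learn; the tiny corpus and spam labels are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win money now", "meeting at noon", "cheap money win", "project meeting notes"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (hypothetical)

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)   # word-frequency (count) features

model = MultinomialNB().fit(counts, labels)
print(model.predict(vectorizer.transform(["free money"])))
```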

12
Q

What is Bernoulli Naive Bayes used for?

A

Binary features

  • Used when features are true/false type.

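A minimal sketch assuming scikit-learn; the binary feature matrix and labels are made up:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Binary features, e.g. columns could be "contains link?" and "from known sender?".
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 0, 1, 0])

model = BernoulliNB().fit(X, y)
print(model.predict([[1, 0]]))
```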

13
Q

List the pros of Naive Bayes.

A

Simple, fast, works with small data
Works well for high-dimensional data (e.g., text)


14
Q

List the cons of Naive Bayes.

A

Assumes features are independent
Can give bad estimates if some classes are underrepresented


15
Q

What does a decision tree model do?

A

Splits data based on feature conditions

Each internal node = feature test
Each leaf = predicted class (output)

Example: Is age > 25?

16
Q

What does each internal node in a decision tree represent?

A

A feature test

Each leaf represents a predicted class.

17
Q

What is the process for building a decision tree?

A

Recursively split the data using the best feature (based on some splitting criterion).

Continue until the data is “pure” or other stopping conditions are met.

The tree grows with each level of questions: each node (box) tests a condition on a feature, and “questions” are thresholds on single features.

All decision boundaries are perpendicular to the feature axes, because at each node a decision is made about a single feature.
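
A minimal sketch assuming scikit-learn, using its bundled iris data; export_text prints the learned feature tests and leaf predictions:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# Each internal node tests a threshold on a single feature, so the
# resulting decision boundaries are axis-aligned.
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(iris.data, iris.target)

print(export_text(tree, feature_names=iris.feature_names))
```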

18
Q

What are the three splitting criteria in decision trees?

A
  • Gini Index - measures impurity → lower = purer
  • Entropy - measures the level of uncertainty
  • Information Gain - reduction in entropy from a split

Information Gain: IG = H(Y) - H(Y|X)
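
A minimal numpy sketch of the three criteria; the labels and the binary split below are made up:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (lower = purer)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy H(Y) = -sum_k p_k log2 p_k (uncertainty, in bits)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, split_mask):
    """IG = H(Y) - H(Y|X) for the binary split given by split_mask."""
    left, right = labels[split_mask], labels[~split_mask]
    w_left, w_right = len(left) / len(labels), len(right) / len(labels)
    return entropy(labels) - (w_left * entropy(left) + w_right * entropy(right))

y = np.array([0, 0, 0, 1, 1, 1, 1, 1])                 # hypothetical labels
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)  # hypothetical split
print(gini(y), entropy(y), information_gain(y, mask))
```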

19
Q

What does conditional entropy measure?

A

The uncertainty of a random variable given another variable

H(X, Y) = H(X|Y) + H(Y)
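
A quick numeric check of the identity on a hypothetical joint table:

```python
import numpy as np

def H(p):
    """Entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Made-up joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])
p_y = p_xy.sum(axis=0)

# Conditional entropy H(X|Y) = sum_y p(y) H(X | Y=y).
h_x_given_y = sum(p_y[j] * H(p_xy[:, j] / p_y[j]) for j in range(p_xy.shape[1]))

# H(X, Y) = H(X|Y) + H(Y)
print(np.isclose(H(p_xy.ravel()), h_x_given_y + H(p_y)))  # True
```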

20
Q

How does model complexity in decision trees relate to depth?

A

The complexity of the model induced by a decision tree is determined by the depth of the tree.

Increasing the depth of the tree increases the number of decision boundaries and may lead to overfitting.

Remedies: pre-pruning and post-pruning.

Limit tree size (pick one): max_depth / max_leaf_nodes / min_samples_split (and more).

21
Q

What is pre-pruning in decision trees?

A

Limit depth or number of leaves during training

Can use parameters like max_depth or min_samples_split.
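
A minimal pre-pruning sketch assuming scikit-learn; the specific limits are arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Constrain the tree while it grows instead of cutting it back afterwards.
tree = DecisionTreeClassifier(
    max_depth=3,           # cap the depth
    max_leaf_nodes=8,      # cap the number of leaves
    min_samples_split=10,  # don't split very small nodes
    random_state=0,
).fit(X, y)

print(tree.get_depth(), tree.get_n_leaves())
```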

22
Q

What is post-pruning?

A

Cutting back after the tree is grown

Helps to avoid overfitting.
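
A minimal post-pruning sketch using scikit-learn's cost-complexity pruning; the alpha chosen here is arbitrary and would normally be selected by validation:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow the full tree first, then compute the pruning path.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
path = full.cost_complexity_pruning_path(X_tr, y_tr)

# Refit with a non-zero alpha to cut the tree back.
pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2],
                                random_state=0).fit(X_tr, y_tr)
print(full.get_n_leaves(), pruned.get_n_leaves(), pruned.score(X_te, y_te))
```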

23
Q

What is the criterion for regression with decision trees?

A

Mean Squared Error (MSE) at each node

Used for predicting numerical values.
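
A minimal sketch assuming scikit-learn, where the MSE criterion is named "squared_error" in recent versions; the data is synthetic:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

# Splits are chosen to minimise the mean squared error within each node.
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=4).fit(X, y)
print(reg.predict([[5.0]]))
```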

24
Q

List the advantages of decision trees.

A

Easy to interpret
Handles mixed data types
Fast to train

They provide clear decision paths.

25
Q

List the disadvantages of decision trees.

A

Can overfit easily
Weak as standalone model
May need pruning/tuning

They may require tuning or pruning to improve accuracy.
26
Q

How does the decision tree algorithm work?

A

Choose an attribute on which to descend at each level
Condition on earlier (higher) choices
Generally, restrict only one dimension at a time
Declare an output value when you get to the bottom