# Hypergeom_to_categ Flashcards

1
Q

What is the Hypergeometric distribution, and how is it commonly used in data analysis and machine learning?

A

The Hypergeometric distribution describes the number of successes in a sample drawn without replacement from a finite population. It applies whenever drawn items are not returned to the population, so each draw changes the odds of the next. In data analysis and machine learning, it’s useful for tasks like assessing sample quality or validating model performance on limited data.

2
Q

Could you explain the Probability Mass Function (PMF) of the Hypergeometric distribution and its formula?

A

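The Hypergeometric PMF gives the probability of getting exactly $k$ successes in a sample of size $n$. Formula: $P(X = k) = \dfrac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$, where $N$ is the population size and $K$ is the number of successes in the population.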
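A minimal Python sketch of this PMF, assuming $N$ is the population size, $K$ the number of successes in the population, and $n$ the sample size. The function name `hypergeom_pmf` is illustrative and does not come from any library.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): exactly k successes in n draws without replacement."""
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Ways to pick k successes and n - k failures, over all ways to pick n items.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)
```

Integer binomial coefficients keep the numerator and denominator exact; only the final division produces a float.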

3
Q

How does the Cumulative Distribution Function (CDF) of the Hypergeometric distribution differ from the PMF, and what does it represent?

A

The Hypergeometric CDF gives the probability of getting $k$ or fewer successes in a sample of size $n$. Formula: $P(X \le k) = \sum_{i=0}^{k} P(X = i)$
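The sum can be sketched in Python as follows. The name `hypergeom_cdf` and the argument order are assumptions made for illustration, with $N$ the population size and $K$ the number of successes in the population.

```python
from math import comb

def hypergeom_cdf(k: int, N: int, K: int, n: int) -> float:
    """P(X <= k): sum of PMF terms for i = 0..k."""
    # i cannot exceed the sample size n or the number of successes K.
    upper = min(k, n, K)
    # math.comb returns 0 when asked for more items than are available,
    # so impossible terms drop out of the sum on their own.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favourable / comb(N, n)
```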

4
Q

Can you provide an example that illustrates the Hypergeometric distribution?

A

Imagine you have a batch of 1000 items, with 150 defective ones. If you randomly select 20 items, what’s the probability of getting exactly 3 defective items?

5
Q

How do you calculate the probability of getting 3 defective items in the given example using the Hypergeometric distribution?

A

Using the Hypergeometric PMF formula and calculating binomial coefficients: $P(X = 3) = \dfrac{\binom{150}{3}\binom{1000-150}{20-3}}{\binom{1000}{20}}$
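The binomial coefficients here are too large to evaluate comfortably by hand, so a short script helps. This sketch plugs the batch example's numbers straight into the PMF; the variable names are chosen for illustration.

```python
from math import comb

N = 1000  # items in the batch
K = 150   # defective items in the batch
n = 20    # items sampled
k = 3     # defective items we want in the sample

# Exact integer arithmetic until the final division.
p_exact = comb(K, k) * comb(N - K, n - k) / comb(N, n)
print(f"P(X = 3) = {p_exact:.4f}")
```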

6
Q

How can understanding the Hypergeometric distribution be valuable in data engineering or machine learning?

A

In data engineering, it’s useful for assessing sample quality or validating data subsets. In machine learning, it helps estimate the likelihood of observing specific outcomes in model evaluations, especially with limited data.

7
Q

What is the Negative Binomial distribution, and how does it differ from the Geometric distribution?

A

The Negative Binomial distribution models the number of independent Bernoulli trials needed to obtain r successes. The Geometric distribution is the special case r = 1: it counts the trials up to the first success only.

8
Q

Could you explain the Probability Mass Function (PMF) of the Negative Binomial distribution and its formula?

A

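The Negative Binomial PMF gives the probability that the $r$-th success occurs on the $k$-th trial, where $p$ is the success probability of each trial. Formula: $P(X = k) = \binom{k-1}{r-1} \cdot p^{r} \cdot (1-p)^{k-r}$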
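A minimal Python sketch of this PMF, under the trial-counting convention used on this card ($k$ is the trial on which the $r$-th success lands). The name `negbinom_pmf` is illustrative.

```python
from math import comb

def negbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): the r-th success occurs on trial k."""
    # At least r trials are needed to collect r successes.
    if k < r:
        return 0.0
    # The first k - 1 trials hold r - 1 successes; trial k is a success.
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)
```

With `r = 1` the expression reduces to `p * (1 - p) ** (k - 1)`, the Geometric PMF. Some libraries count failures before the $r$-th success instead of total trials, so arguments may need shifting by $r$ when comparing results.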

9
Q

How does the Cumulative Distribution Function (CDF) of the Negative Binomial distribution differ from the PMF, and what does it represent?

A

The Negative Binomial CDF gives the probability that $r$ successes are reached in $k$ or fewer trials. Formula: $P(X \le k) = \sum_{i=r}^{k} P(X = i)$
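The sum starts at $i = r$ because fewer than $r$ trials cannot produce $r$ successes. A small sketch, with `negbinom_cdf` as an illustrative name:

```python
from math import comb

def negbinom_cdf(k: int, r: int, p: float) -> float:
    """P(X <= k): r successes are reached within the first k trials."""
    # The range is empty when k < r, which correctly gives 0.
    return sum(
        comb(i - 1, r - 1) * p**r * (1 - p) ** (i - r)
        for i in range(r, k + 1)
    )
```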

10
Q

Can you provide an example that illustrates the Negative Binomial distribution?

A

Imagine flipping a biased coin until you get 5 heads. Each flip has a 0.3 chance of heads. What’s the probability that the 5th head arrives on exactly the 10th flip?

11
Q

How do you calculate the probability of getting 5 heads on the 10th flip in the given example using the Negative Binomial distribution?

A

Using the Negative Binomial PMF: $P(X = 10) = \binom{10-1}{5-1} \cdot 0.3^{5} \cdot (1-0.3)^{10-5} = \binom{9}{4} \cdot 0.3^{5} \cdot 0.7^{5} \approx 0.0515$
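One way to sanity-check this value is to compare the formula against a simulation of the coin-flipping experiment. The sketch below is illustrative: the seed and trial count are arbitrary choices, and the helper name `flips_until_r_heads` is hypothetical.

```python
import random
from math import comb

r, p, k = 5, 0.3, 10

# Exact value from the PMF: C(9, 4) * 0.3^5 * 0.7^5
exact = comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

def flips_until_r_heads(rng: random.Random) -> int:
    """Flip the biased coin until r heads appear; return the flip count."""
    heads = flips = 0
    while heads < r:
        flips += 1
        if rng.random() < p:
            heads += 1
    return flips

rng = random.Random(0)  # fixed seed, chosen arbitrarily
trials = 20_000
hits = sum(flips_until_r_heads(rng) == k for _ in range(trials))
estimate = hits / trials

print(f"exact:     {exact:.4f}")
print(f"simulated: {estimate:.4f}")
```

The simulated share should land close to the exact value, with the gap shrinking as `trials` grows.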

12
Q

How can understanding the Negative Binomial distribution be valuable in data engineering or machine learning?

A

It models the number of attempts needed to reach a given success count, such as the contacts needed to get a target number of conversions. In ML, it might be used to estimate how many training iterations are needed to reach a performance level.

13
Q

What is the Discrete Uniform distribution, and when do we commonly encounter it?

A

The Discrete Uniform distribution is a probability distribution where all outcomes are equally likely within a finite set of values. It’s encountered in scenarios where each outcome has the same probability, without any bias towards specific values.

14
Q

Could you explain the Probability Mass Function (PMF) of the Discrete Uniform distribution and its formula?

A

The Discrete Uniform PMF assigns an equal probability to each possible outcome. Formula: $P(X = x) = \dfrac{1}{n}$, where $X$ is the outcome, $x$ is a specific value, and $n$ is the total number of outcomes.

15
Q

How does the Cumulative Distribution Function (CDF) of the Discrete Uniform distribution work?

A

The Discrete Uniform CDF gives the probability that the outcome is less than or equal to a specific value. It’s a step function increasing by $\frac{1}{n}$ at each outcome.
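Both the PMF and the step-function CDF can be sketched in a few lines of Python. This version assumes the outcomes are distinct, orderable values and uses exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def uniform_pmf(x, values):
    """P(X = x): 1/n for each of the n distinct values, 0 elsewhere."""
    return Fraction(1, len(values)) if x in values else Fraction(0)

def uniform_cdf(x, values):
    """P(X <= x): count of values at or below x, divided by n."""
    return Fraction(sum(1 for v in values if v <= x), len(values))

die = range(1, 7)  # faces of a fair six-sided die
print(uniform_pmf(3, die))  # 1/6
print(uniform_cdf(3, die))  # 1/2
```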

16
Q

Can you provide an example illustrating the Discrete Uniform distribution?

A

Consider rolling a fair six-sided die. What’s the probability of rolling a 3?

17
Q

How do you calculate the probability of rolling a 3 on a fair six-sided die using the Discrete Uniform distribution?

A

Using the Discrete Uniform PMF formula: $P(X = 3) = \frac{1}{6}$

18
Q

How can understanding the Discrete Uniform distribution be valuable in data engineering or machine learning?

A

In data engineering, it’s useful wherever outcomes should be equally likely, such as generating random test data. In machine learning, it might be used in simulations or to create synthetic datasets with uniform characteristics.

19
Q

What is the Categorical distribution, and in what situations is it commonly used?

A

The Categorical distribution represents probabilities of outcomes in a discrete set of categories. It’s used with categorical data, such as survey responses, where each category has an associated probability.

20
Q

Could you explain the Probability Mass Function (PMF) of the Categorical distribution and its formula?

A

The Categorical PMF gives the probability of each category in the set. Formula: $P(X = x_i) = p_i$, where $X$ is the categorical variable, $x_i$ is a specific category, and $p_i$ is the probability associated with category $x_i$.
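In code, the Categorical PMF is essentially a lookup table. The sketch below assumes the distribution is stored as a dictionary from category to probability; the name `categorical_pmf` and the colour categories are hypothetical, made up for illustration.

```python
import math

def categorical_pmf(category, probs: dict) -> float:
    """P(X = category) for a distribution given as {category: probability}."""
    # A valid categorical distribution has probabilities that sum to 1.
    if not math.isclose(sum(probs.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Categories outside the set have probability 0.
    return probs.get(category, 0.0)

colours = {"red": 0.5, "green": 0.2, "blue": 0.3}  # hypothetical example data
print(categorical_pmf("green", colours))
```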

21
Q

How does the Cumulative Distribution Function (CDF) of the Categorical distribution work?

A

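The Categorical CDF is only defined once the categories are given an order, since categories have no natural ordering of their own. Under a chosen order, it gives the cumulative probability that the outcome falls at or before a specific category. It’s a step function that accumulates the probabilities of each category up to the desired category.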
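Because categories have no inherent order, the sketch below assumes the caller supplies one by passing a list of `(category, probability)` pairs. The function name and the example categories are hypothetical.

```python
from itertools import accumulate

def categorical_cdf(ordered_probs):
    """Map each category to P(X <= category) under the given order."""
    categories = [c for c, _ in ordered_probs]
    # Running total of the probabilities, in the order supplied.
    cumulative = accumulate(p for _, p in ordered_probs)
    return dict(zip(categories, cumulative))

# Hypothetical ordered categories, e.g. survey ratings.
ratings = [("low", 0.2), ("mid", 0.5), ("high", 0.3)]
cdf = categorical_cdf(ratings)
print(cdf)
```

Reordering the input changes every intermediate value, which is why the order has to be stated alongside any Categorical CDF.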

22
Q

Can you provide an example illustrating the Categorical distribution?

A

Consider a survey where participants choose their favorite fruit: Apple, Banana, or Orange. Probabilities: Apple (0.4), Banana (0.3), Orange (0.3).

23
Q

How do you calculate the probability of a participant choosing Banana in the given example using the Categorical distribution?

A

Using the Categorical PMF formula: $P(X = \text{Banana}) = 0.3$
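The same answer can be checked empirically by sampling from the fruit survey distribution. This is a sketch using the standard library's `random.choices`; the seed and sample size are arbitrary choices.

```python
import random
from collections import Counter

probs = {"Apple": 0.4, "Banana": 0.3, "Orange": 0.3}

rng = random.Random(0)  # fixed seed, chosen arbitrarily
# Draw simulated survey responses in proportion to the probabilities.
draws = rng.choices(list(probs), weights=list(probs.values()), k=10_000)

counts = Counter(draws)
banana_share = counts["Banana"] / len(draws)
print(f"empirical P(Banana) = {banana_share:.3f}")
```

The empirical share should sit near 0.3 and tighten around it as the number of draws increases.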

24
Q

How can understanding the Categorical distribution be valuable in data engineering or machine learning?

A

In data engineering, it models and analyzes categorical data like survey responses. In machine learning, it’s applied in scenarios with discrete categories, such as sentiment analysis or document categorization.

25
Q

Hypergeometric Distribution

A

[Image: Hypergeometric Distribution]

26
Q

Negative Binomial Distribution

A

[Image: Negative Binomial Distribution]

27
Q

Discrete Uniform Distribution

A

[Image: Discrete Uniform Distribution]

28
Q

Categorical Distribution

A

[Image: Categorical Distribution]

29
Q

All PMFs and CDFs

A