Module 2: 3. Probability and Statistics Flashcards

(40 cards)

1
Q

E.g. Rolling a fair die
Is it an example of a continuous random variable or a discrete random variable?

A

discrete random variable

2
Q

E.g. Measuring the height of a randomly picked student
Is it an example of a continuous random variable or a discrete random variable?

A

continuous random variable

3
Q

What is the difference between a population and a sample? Explain with an example.

A

Suppose we need to calculate the average height of people in the world.
If we go by the population, we consider the heights of all 7 billion people and calculate the population mean using the formula below:
μ = (1/7B) Σ hi (sum over all 7B heights)
If we go by a sample, we consider only a subset of those heights (e.g. 1000 randomly chosen heights) and calculate the sample mean using the formula below:
x̄ = (1/1000) Σ hi (sum over the 1000 sampled heights)

As the sample size increases,
x̄ ≈ μ (the sample mean gets closer to the population mean)
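A small simulation makes the last point concrete. This is a sketch under stated assumptions. A hypothetical population of 1,000,000 synthetic heights (normal, mean 170 cm, sd 10 cm) stands in for the 7 billion people, and each sample is drawn at random without replacement. `sample_mean` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: 1,000,000 synthetic heights in cm (stand-in for 7 billion people)
population = rng.normal(loc=170.0, scale=10.0, size=1_000_000)
mu = population.mean()  # population mean μ


def sample_mean(n):
    """Sample mean x̄ of n heights drawn at random (without replacement) from the population."""
    sample = rng.choice(population, size=n, replace=False)
    return sample.mean()


for n in (10, 1_000, 100_000):
    # The gap |x̄ - μ| typically shrinks as n grows (roughly like σ/√n)
    print(f"n={n:>7}: |x̄ - μ| = {abs(sample_mean(n) - mu):.4f}")
```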

3
Q

Does the Gaussian distribution occur in the real world? If yes, give 2 examples.

A

YES (approximately)
- Sepal length (SL) and petal length (PL) of iris flowers.
- Heights and weights of people.

4
Q

If X follows a Gaussian distribution with mean μ and variance σ^2, how do you write this in mathematical notation?

A

X ~ N(μ, σ^2)

5
Q

What is the 68-95-99.7 rule?

A

For a Gaussian distribution:
  • About 68.2% of points lie in the range [μ - σ, μ + σ]
  • About 95% of points lie in the range [μ - 2σ, μ + 2σ]
  • About 99.7% of points lie in the range [μ - 3σ, μ + 3σ]

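The percentages follow from the Gaussian CDF and do not depend on μ or σ. Below is a minimal check using only Python's standard library. The identity P(|X - μ| <= kσ) = erf(k/√2) is standard, and `within_k_sigma` is an illustrative helper name.

```python
from math import erf, sqrt


def within_k_sigma(k):
    """P(μ - kσ <= X <= μ + kσ) for any Gaussian X; this equals erf(k / √2)."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"k={k}: {100 * within_k_sigma(k):.2f}%")
# k=1: 68.27%
# k=2: 95.45%
# k=3: 99.73%
```

The rule's 68/95/99.7 are rounded versions of these values.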
6
Q

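What is the mathematical formulation of the Gaussian distribution?

A

The PDF of X ~ N(μ, σ^2) is
f(x) = (1/(σ√(2π))) exp{-(x-μ)^2/(2σ^2)}
(For a continuous random variable P(X=x) = 0, so f(x) is a density, not a probability.)

Simplifying the above equation,
let μ=0, σ^2=1:
f(x) = (1/√(2π)) exp{-x^2/2}

After removing the constants (which only rescale the curve),
y = exp{-x^2}
As x moves away from μ, y decreases very quickly (like exp(-x^2), i.e. faster than a plain exponential decay).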
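The density formula can be evaluated directly to check two of its properties. First, its total area is 1. Second, its values fall off quickly away from the mean. This is a standard-library sketch. The values μ = 5 and σ = 2 used in the area check are arbitrary, and the Riemann sum is only a numerical check, not part of the definition.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of N(mu, sigma^2)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# Riemann-sum check that the density integrates to ~1 over mu ± 10 sigma
mu, sigma, steps = 5.0, 2.0, 40_000
dx = 20 * sigma / steps
area = sum(gaussian_pdf(mu - 10 * sigma + i * dx, mu, sigma) * dx for i in range(steps))
print(round(area, 4))  # 1.0

# The standard normal density drops off rapidly as x moves away from the mean 0
for x in (0, 1, 2, 3):
    print(x, round(gaussian_pdf(x), 4))
```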

7
Q

True or False:
The PDF of a Gaussian distribution is symmetric (about its mean μ).

A

True

8
Q

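What is kurtosis? What is the formula for excess kurtosis?

A

Kurtosis = E[(X - μ)^4] / σ^4, a measure of tailedness (how heavy the tails are), not of peakedness
Excess kurtosis = Kurtosis - 3
A Gaussian has kurtosis 3, so its excess kurtosis is 0.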
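The sketch below estimates excess kurtosis from a sample using the moment formula above. It plugs in the sample mean and the population-style (1/n) variance. The three simulated distributions are illustrative choices: the Laplace has heavier tails than the Gaussian, and the uniform has lighter tails. `scipy.stats.kurtosis` with its defaults (`fisher=True`, `bias=True`) should compute the same quantity.

```python
import numpy as np


def excess_kurtosis(x):
    """Sample excess kurtosis: mean((x - x̄)^4) / (mean((x - x̄)^2))^2 - 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0


rng = np.random.default_rng(1)
n = 1_000_000
print(round(excess_kurtosis(rng.normal(size=n)), 2))    # close to 0 (Gaussian)
print(round(excess_kurtosis(rng.laplace(size=n)), 2))   # close to 3 (heavier tails)
print(round(excess_kurtosis(rng.uniform(size=n)), 2))   # close to -1.2 (lighter tails)
```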

9
Q

What is the standard normal variate (Z)?

A

Z ~ N(0, 1), i.e. a Gaussian random variable with μ = 0 and σ^2 = 1

10
Q

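What is standardization?

A

Standardization converts a Gaussian random variable X ~ N(μ, σ^2), with finite mean μ and variance σ^2, into the standard normal variate:
x'i = (xi - μ)/σ → x'i ~ N(0, 1)
Now we can say that about 68.2% of the converted points lie between -1 and 1.
(The same formula gives mean 0 and variance 1 for any distribution, but the result is Gaussian only if X was Gaussian.)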
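When μ and σ are unknown, they are usually replaced by the sample mean and standard deviation. Below is a minimal NumPy sketch on synthetic data. The values 50 and 8 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical Gaussian data: X ~ N(μ=50, σ^2=8^2)
x = rng.normal(loc=50.0, scale=8.0, size=100_000)

# Standardize using the sample estimates of μ and σ
z = (x - x.mean()) / x.std()

print(round(z.mean(), 6), round(z.std(), 6))   # ≈ 0 and 1 by construction
frac = np.mean((z >= -1) & (z <= 1))           # fraction of converted points in [-1, 1]
print(frac)                                    # ≈ 0.682 for Gaussian data
```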

11
Q

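What is Kernel Density Estimation (KDE)?

A

KDE estimates a PDF from a sample. For each data point, a Gaussian kernel (a small bell curve with that point as its mean) is drawn. The kernels are added up and scaled so that the total area is 1, so regions with a higher density of points get a higher PDF. The width of the kernels (the bandwidth) controls how smooth the estimate is.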
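Below is a minimal NumPy sketch of a Gaussian KDE. The bandwidth (the standard deviation of each kernel) is a value chosen by hand here, and the bimodal data is simulated. Both are assumptions for illustration. `scipy.stats.gaussian_kde` chooses the bandwidth automatically (Scott's rule by default).

```python
import numpy as np


def kde(samples, grid, bandwidth):
    """Average of Gaussian kernels N(x_i, bandwidth^2), one per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)[:, None]   # shape (n, 1)
    grid = np.asarray(grid, dtype=float)[None, :]         # shape (1, g)
    z = (grid - samples) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))   # one bell curve per sample
    return kernels.mean(axis=0)                           # dividing by n keeps the total area at 1


rng = np.random.default_rng(2)
# Hypothetical bimodal data: 300 points around -2 and 700 points around 3
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
grid = np.linspace(-6, 8, 1401)
density = kde(data, grid, bandwidth=0.3)

area = density.sum() * (grid[1] - grid[0])   # Riemann sum of the estimated PDF
print(round(area, 3))                        # close to 1
```

A smaller bandwidth follows the data more closely but gives a bumpier curve. A larger one gives a smoother curve but can blur the two modes together.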

12
Q

What is a sampling distribution?

A

Let's say we take m random samples, each of size n (e.g. n = 30):
S1, S2, S3, …, Sm (m samples)
x̄1, x̄2, x̄3, …, x̄m are the means of the m samples.
Then the x̄i follow a distribution called the “sampling distribution of the sample means”.

13
Q

What is the Central Limit Theorem (CLT)?

A

If X has a finite mean (μ) and a finite variance (σ^2):
→ S1, S2, S3, …, Sm (m samples of size n)
→ x̄1, x̄2, x̄3, …, x̄m (sample means)
→ x̄i ~ N(μ, σ^2/n) as n → ∞

Here σ^2 is the variance of the original data (the population).

CLT is powerful because it works for data with any kind of distribution, as long as it has a finite mean (μ) and variance (σ^2).

Note: CLT does not apply to a Pareto distribution with infinite variance (shape α ≤ 2; for α ≤ 1 the mean is infinite too).

Also, in practice, once n >= 30 things start falling into place and the sampling distribution of the sample means is close to a Gaussian distribution (a rule of thumb; strongly skewed data may need a larger n).
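A quick simulation can check the statement above. The sketch builds the sampling distribution of the sample means from deliberately non-Gaussian data and compares it with N(μ, σ^2/n). The exponential distribution (μ = 1, σ^2 = 1), m = 10,000 and n = 30 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Original data: exponential with scale 1, so μ = 1 and σ^2 = 1 (right-skewed, not Gaussian)
mu, sigma2 = 1.0, 1.0
m, n = 10_000, 30                                   # m samples S1..Sm, each of size n

samples = rng.exponential(scale=1.0, size=(m, n))   # row i is sample S_i
sample_means = samples.mean(axis=1)                 # x̄1, ..., x̄m

se = np.sqrt(sigma2 / n)                            # standard deviation predicted by the CLT
print(sample_means.mean())                          # close to μ = 1
print(sample_means.var(), sigma2 / n)               # close to σ^2 / n ≈ 0.0333
print(np.mean(np.abs(sample_means - mu) <= se))     # close to 0.68 if x̄ is roughly Gaussian
```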

14
Q

What are Q-Q plots? How do you plot them?

A

Q-Q plots stand for Quantile-Quantile plots.
They compare two distributions (X and Y) by plotting their quantiles against each other, to find out whether they have the same type of distribution.

E.g. Given X: x1, x2, …, x500
Is X Gaussian distributed?

Steps:
1. Sort the xi's and compute percentiles
   → x1, x2, …, x500
   → Sort in ascending order
   → Calculate percentiles
   → x(1), x(2), …, x(100)
2. Take Y ~ N(0, 1)
   → y1, y2, …, y1000
   → Sort in ascending order
   → Calculate percentiles
   → y(1), y(2), …, y(100)
3. Plot the Q-Q plot using the pairs (x(1), y(1)), (x(2), y(2)), …, (x(100), y(100))

If all points lie (roughly) on a straight line, we can say X and Y have similar distributions, i.e. X is Gaussian.
But we can't conclude that X also has μ=0 and σ^2=1; the position and slope of the line depend on X's μ and σ.
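The steps above can be sketched with NumPy, following the card's approach of simulating Y. X is a hypothetical sample from N(50, 4^2), which is an assumption for illustration. Instead of drawing the plot, the sketch measures how straight the points are with a correlation and fits the line. With y(k) on the horizontal axis, the slope should be near σ = 4 and the intercept near μ = 50. `np.percentile` interpolates between sorted values, and other quantile conventions exist. `scipy.stats.probplot` compares against theoretical quantiles instead of a simulated Y.

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(loc=50.0, scale=4.0, size=500)   # hypothetical data X: x1..x500
y = rng.standard_normal(size=1000)              # reference Y ~ N(0, 1): y1..y1000

q = np.arange(1, 101)            # percentiles 1, 2, ..., 100
x_pct = np.percentile(x, q)      # x(1), ..., x(100)  (np.percentile sorts internally)
y_pct = np.percentile(y, q)      # y(1), ..., y(100)

# Points (y(k), x(k)) form the Q-Q plot; a correlation near 1 means they lie on a line
r = np.corrcoef(y_pct, x_pct)[0, 1]
slope, intercept = np.polyfit(y_pct, x_pct, deg=1)
print(round(r, 3), round(slope, 2), round(intercept, 2))   # slope ≈ σ of X, intercept ≈ μ of X

# To draw it (requires matplotlib): plt.scatter(y_pct, x_pct)
```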

15
Q

Task: Order t-shirts for all employees (100K)
a. How many XL t-shirts should you order?
Domain knowledge:
height >= 180cm → XL t-shirt
height in [160cm, 180cm) → L t-shirt

A

Collect the heights of 500 random employees.
Assume heights ~ N(μ, σ^2), with μ and σ estimated from these 500 heights.
Plot the CDF.
Suppose from the CDF we observe P(h >= 180cm) = 1 - P(h < 180cm) = 1%.
So we order 1000 XL t-shirts, i.e. 1% of 100K.
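Reading the CDF can be done numerically with Python's `statistics.NormalDist` (standard library, Python 3.8+). The 500 heights here are simulated (mean 165 cm, sd 6.5 cm), so the resulting percentages are illustrative only and are not the card's 1%.

```python
import random
from statistics import NormalDist

random.seed(5)
# Hypothetical sample: heights (cm) of 500 randomly chosen employees
heights = [random.gauss(165, 6.5) for _ in range(500)]

dist = NormalDist.from_samples(heights)   # Gaussian with μ and σ estimated from the sample
n_employees = 100_000

p_xl = 1 - dist.cdf(180)                  # P(h >= 180 cm), read off the fitted CDF
p_l = dist.cdf(180) - dist.cdf(160)       # P(160 cm <= h < 180 cm)
print(f"XL: {p_xl:.2%} of employees -> order {round(p_xl * n_employees)}")
print(f"L:  {p_l:.2%} of employees -> order {round(p_l * n_employees)}")
```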

16
Q

Task: Salaries
If X ~ N(μ, σ^2),
a. How many employees make a salary >= $100K?
b. How many employees have a salary in [$50K, $70K]?

A

a. From the CDF F, the fraction is P(X >= $100K) = 1 - F($100K); multiply it by the number of employees.
b. From the CDF, read the percentages at $70K and $50K; the fraction is their difference, F($70K) - F($50K).

17
Q

Suppose I don't know the distribution, but I know that μ is finite and the variance is non-zero and finite.
Task: Salaries
μ=$40K and σ=$10K
a. What % of individuals have a salary in the range [$20K, $60K]?

A

Chebyshev's inequality:
P(|X - μ| >= kσ) <= 1/k^2 → P(μ - kσ < X < μ + kσ) >= 1 - 1/k^2

$20K = μ - 2σ
$40K = μ
$60K = μ + 2σ
So k = 2.

P($20K < X < $60K) >= 1 - (1/2^2)
P($20K < X < $60K) >= 0.75
At least 75% of individuals have a salary in the range [$20K, $60K].
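The bound is easy to compute, and a simulation shows that it is guaranteed but can be loose. The gamma salary distribution below has the same μ and σ as the card. It is a hypothetical choice of a skewed distribution, not something the card specifies.

```python
import numpy as np


def chebyshev_lower_bound(k):
    """Chebyshev: P(μ - kσ < X < μ + kσ) >= 1 - 1/k^2 for any X with finite μ and σ (k > 1)."""
    return 1.0 - 1.0 / k ** 2


mu, sigma = 40_000, 10_000
low, high = 20_000, 60_000
k = (high - mu) / sigma          # symmetric interval: k = 2
print(chebyshev_lower_bound(k))  # 0.75

# Hypothetical non-Gaussian salaries with the same μ and σ:
# gamma with shape (μ/σ)^2 = 16 and scale σ^2/μ = 2,500
rng = np.random.default_rng(6)
salaries = rng.gamma(shape=(mu / sigma) ** 2, scale=sigma ** 2 / mu, size=1_000_000)
frac = np.mean((salaries > low) & (salaries < high))
print(frac)                      # well above 0.75 here: the bound holds but is not tight
```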

18
Q

Explain the Bernoulli distribution.

A

Bernoulli distribution:
E.g. X → r.v. for getting heads in a single coin toss (X = 1 for heads, X = 0 for tails)
- Discrete distribution with exactly 2 outcomes
- Probabilities → P(X=1) = p and P(X=0) = 1 - p
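Below is a small sketch of the Bernoulli PMF and a simulation of coin tosses. A fair coin (p = 0.5) is assumed. The sample mean and variance of the 0/1 outcomes should approach p and p(1 - p).

```python
import numpy as np


def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p): p if x == 1 (heads), 1 - p if x == 0 (tails)."""
    return p ** x * (1 - p) ** (1 - x)


p = 0.5                                          # fair coin
rng = np.random.default_rng(8)
tosses = (rng.random(100_000) < p).astype(int)   # 1 = heads, 0 = tails

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # 0.5 0.5
print(tosses.mean(), tosses.var())               # close to p = 0.5 and p(1 - p) = 0.25
```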

19
Q

Explain the binomial distribution.

A

Binomial distribution:
E.g. a coin is tossed n times (n=10)
- Y → number of heads obtained
- Y ∈ {0, 1, 2, …, 10}
Y ~ Binomial(n, p) → n = number of trials & p = probability of getting heads
P(Y = k) = C(n, k) p^k (1 - p)^(n-k)
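The binomial PMF can be computed exactly with `math.comb` (Python 3.8+). Below is a sketch for the card's case of n = 10 tosses of a fair coin. It also simulates Y as the sum of n Bernoulli tosses.

```python
from math import comb
import random


def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]   # Y ∈ {0, 1, ..., 10}
print(pmf[5])     # 0.24609375 = 252 / 1024
print(sum(pmf))   # 1.0

# Y is the sum of n independent Bernoulli(p) tosses
random.seed(9)
y = sum(random.random() < p for _ in range(n))
print(y)          # one simulated count of heads in 10 tosses
```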

20
Q

What is the log-normal distribution?

A

X ~ log-normal(μ, σ^2)
if log(X) ~ N(μ, σ^2), i.e. log(X) is normally distributed (μ and σ^2 are the mean and variance of log(X), not of X).

Note: As σ^2 increases, the PDF becomes more skewed.
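A sketch with NumPy illustrates both statements. The values of log(X) should have standard deviation ≈ σ, and the skewness of X should grow with σ. The σ values and sample size are arbitrary choices. Note that NumPy's `lognormal(mean, sigma)` is parameterized by the mean and standard deviation of the underlying normal.

```python
import numpy as np


def skewness(a):
    """Sample skewness: mean((a - ā)^3) / std(a)^3."""
    d = a - a.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


rng = np.random.default_rng(10)
mu = 0.0
log_stds, skews = {}, {}
for sigma in (0.25, 0.5, 1.0):
    # numpy's lognormal(mean, sigma) takes μ and σ of the underlying normal, log(X)
    x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
    log_stds[sigma] = np.log(x).std()   # ≈ σ: log(X) is N(μ, σ^2)
    skews[sigma] = skewness(x)          # grows with σ: the PDF of X gets more skewed
    print(sigma, round(log_stds[sigma], 3), round(skews[sigma], 2))
```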

21
Q

What are the applications of the log-normal distribution?

A

  1. Length of comments posted in internet discussions.
  2. Users' dwell time on online articles.
  3. Salaries of people.

In general, many quantities driven by human behaviour tend to follow a log-normal distribution.

22
Q

How do you find out whether X ~ log-normal(μ, σ^2)?

A

x1, x2, …, xn → log(x1), log(x2), …, log(xn) → yi's

Now we can use a Q-Q plot to determine whether the yi's follow a normal/Gaussian distribution.
If they do, then X is log-normally distributed.
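The check can be sketched without drawing a plot. Compare the sorted values with theoretical N(0, 1) quantiles and measure how straight the Q-Q plot is with a correlation. The data below is simulated log-normal, and `qq_correlation` is an illustrative helper. The plotting positions (k - 0.5)/n are one common convention among several.

```python
import numpy as np
from statistics import NormalDist


def qq_correlation(values):
    """Correlation between sorted values and N(0, 1) quantiles (straightness of a normal Q-Q plot)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    probs = (np.arange(1, n + 1) - 0.5) / n                    # plotting positions in (0, 1)
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])  # theoretical N(0, 1) quantiles
    return np.corrcoef(theo, v)[0, 1]


rng = np.random.default_rng(11)
x = rng.lognormal(mean=1.0, sigma=0.8, size=2_000)   # hypothetical data, log-normal by construction
y = np.log(x)                                        # yi = log(xi)

print(round(qq_correlation(x), 3))   # noticeably below 1: x itself is skewed
print(round(qq_correlation(y), 3))   # very close to 1: log(x) looks Gaussian
```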

23
Q

What is the power-law distribution (a.k.a. Pareto distribution)? Give some examples.

A
  • Follows the 80-20 rule
  • About 80% of the points lie in 20% of the range, and the remaining 20% are spread over the other 80% (a long tail)
  • Can have infinite variance (shape α ≤ 2) and even infinite mean (α ≤ 1)

E.g.
1. File size distribution in internet traffic (many small files & few large files).
2. Hard disk drive error rates.
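A simulation sketch illustrates the 80-20 behaviour. For a Pareto with shape α = log(5)/log(4) ≈ 1.16, the theoretical share of the total held by the largest 20% of values is exactly 80%. This α is chosen for the demonstration and is not from the card. Because the variance is infinite at this α, both the share and the running mean computed from a finite sample are noisy.

```python
import numpy as np

rng = np.random.default_rng(12)

# Classical Pareto with x_m = 1: numpy's pareto() draws the Lomax (Pareto II) form, so add 1.
# Shape α = log(5)/log(4) ≈ 1.16 makes the theoretical top-20% share exactly 80%.
alpha = np.log(5) / np.log(4)
x = rng.pareto(alpha, size=1_000_000) + 1.0

x_sorted = np.sort(x)[::-1]                       # largest first
top20 = x_sorted[: len(x) // 5].sum() / x.sum()   # share of the total held by the top 20% of points
print(round(top20, 2))                            # roughly 0.8, but noisy: the variance is infinite

# With α <= 2 the running mean keeps jumping whenever a rare huge value arrives
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)
print(running_mean[[999, 99_999, 999_999]])
```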

24
Q

How can you check whether a distribution is a Pareto distribution?

A

1. Q-Q plots
2. log-log plots

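For the log-log check, one common approach is to plot the empirical survival function P(X > x) against x on log-log axes. For a Pareto distribution S(x) = (x_m/x)^α, so the points fall on a straight line with slope -α. The NumPy sketch below uses simulated data, with α = 2.5 and x_m = 1 as arbitrary choices, and fits the line instead of drawing it.

```python
import numpy as np

rng = np.random.default_rng(13)
alpha = 2.5
x = np.sort(rng.pareto(alpha, size=100_000) + 1.0)   # hypothetical Pareto(α=2.5, x_m=1) data

# Empirical survival function: fraction of points >= each sorted value (never 0)
n = x.size
survival = (n - np.arange(n)) / n

# On log-log axes a power law is a straight line with slope -α
slope, intercept = np.polyfit(np.log(x), np.log(survival), deg=1)
print(round(slope, 2))   # close to -2.5
```

With matplotlib, `plt.loglog(x, survival)` would draw the same line.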
25
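Q

What is the Box-Cox transform?

A

Power-law/Pareto-like data ----(Box-Cox transform)----> approximately Gaussian/Normal data (requires xi > 0)
1. box-cox(X) → λ (the λ that makes the transformed data most Gaussian)
2. To calculate the yi's:
   - if λ != 0: yi = (xi^λ - 1)/λ
   - else (λ = 0): yi = log(xi)
If λ = 0, the xi's follow a log-normal distribution (their logs are Gaussian).
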
26
Q

Suppose X: heights, Y: weights.
Which three measures can be used to quantify the following types of relationships?
- As X increases, Y increases
- As X increases, Y decreases

A

1. Covariance
2. Pearson correlation coefficient
3. Spearman rank correlation coefficient

27
Q

What is covariance? What are its drawbacks/limitations?

A

Covariance(X,Y) = (1/n) Σ (xi - μx) * (yi - μy)
If Covariance(X,Y) is +ve → as X increases, Y increases
If Covariance(X,Y) is -ve → as X increases, Y decreases

Drawbacks/Limitations:
1. If X = height in cm, Y = weight in kg, X' = height in ft, Y' = weight in lbs, then Covariance(X,Y) != Covariance(X',Y').
Covariance depends on the units (scale) of the variables, so its magnitude is hard to interpret; only its sign is meaningful.

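Below is a small sketch of the unit problem. The height/weight data is simulated (the linear relationship and noise level are assumptions), and the same data is expressed in two unit systems. The covariance changes by exactly the product of the two conversion factors even though the relationship is the same.

```python
import numpy as np


def covariance(x, y):
    """Covariance(X, Y) = (1/n) Σ (xi - μx)(yi - μy)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))


rng = np.random.default_rng(15)
height_cm = rng.normal(170, 10, size=1_000)                      # hypothetical heights
weight_kg = 0.9 * height_cm - 90 + rng.normal(0, 8, size=1_000)  # hypothetical weights

height_ft = height_cm / 30.48        # 1 ft = 30.48 cm
weight_lb = weight_kg * 2.20462      # 1 kg ≈ 2.20462 lb

print(covariance(height_cm, weight_kg))   # positive: taller people tend to weigh more
print(covariance(height_ft, weight_lb))   # same relationship, different number
```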
28
Q

What is the Pearson correlation coefficient? What are its drawbacks/limitations?

A

ρx,y = Covariance(X,Y)/(σx σy), where σx = √variance(X) and σy = √variance(Y)
ρx,y always lies in [-1, 1] and does not change when the units of X or Y change.
If ρx,y is +ve → as X increases, Y increases
If ρx,y is -ve → as X increases, Y decreases

Drawbacks/Limitations:
1. ρx,y = +1 only if a linear relationship exists between X & Y. So if y = x^2 (x > 0), ρ < 1 even though y increases monotonically with x.
2. The slope of the straight line doesn't affect ρx,y: any increasing line gives +1, so ρ says nothing about how steep the relationship is.
3. Complex (non-monotonic) relationships are not captured. E.g. a sine wave.
Fix for monotonic but non-linear relationships → Spearman rank correlation coefficient

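The three drawbacks can be reproduced in a few lines. The sketch uses the 1/n convention in both the covariance and the standard deviations, so the factors cancel. The x grid and the example functions are illustrative choices.

```python
import numpy as np


def pearson(x, y):
    """ρ(X, Y) = Covariance(X, Y) / (σx σy), using 1/n in both the covariance and the std."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())


x = np.linspace(0.1, 10, 200)
print(pearson(x, 2 * x + 1))    # 1.0 (up to rounding): linear
print(pearson(x, 10 * x - 3))   # also 1.0: the slope does not matter
print(pearson(x, x ** 2))       # below 1 although y increases monotonically (x > 0)
print(pearson(x, np.sin(x)))    # near 0: the sine pattern is not captured
```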
29
Q

What is the Spearman rank correlation coefficient?

A

      X (height)  Y (weight)  rx  ry
s1    160         52          4   3
s2    150         66          2   4
s3    170         68          5   5
s4    140         46          1   1
s5    158         51          3   2

Here we sort X and Y and give them ranks in ascending order (rx, ry).
We saw that ρx,y measures a linear relationship.
r = ρ(rx, ry)
This means the Spearman rank correlation between two variables is equal to the Pearson correlation between the rank values of those two variables. For the table above, r = 0.7.
- If as X increases, Y increases (linear or not doesn't matter) → r = 1
- If as X increases, Y decreases (linear or not doesn't matter) → r = -1
Also, the Spearman rank correlation is more robust to outliers than the Pearson correlation.

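Below is a sketch of Spearman's coefficient as "Pearson on the ranks", applied to the five students in the table. s2's weight is taken as 66, the value consistent with its rank of 4. The simple `ranks` helper assumes there are no ties; `scipy.stats.rankdata` handles ties by averaging.

```python
import numpy as np


def ranks(a):
    """Ranks 1..n in ascending order (assumes no ties)."""
    a = np.asarray(a, dtype=float)
    r = np.empty(a.size)
    r[np.argsort(a)] = np.arange(1, a.size + 1)
    return r


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


heights = [160, 150, 170, 140, 158]   # X for s1..s5
weights = [52, 66, 68, 46, 51]        # Y for s1..s5

print(ranks(heights), ranks(weights))         # [4. 2. 5. 1. 3.] [3. 4. 5. 1. 2.]
print(round(spearman(heights, weights), 3))   # 0.7
```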
30
Q

Explain correlation vs causation.

A

- "Correlation" does not imply "causation".
- Just because two random variables are correlated (e.g. as X increases, Y increases) doesn't mean that X causes Y or vice versa.
E.g. the graph of Nobel laureates vs chocolate consumption across countries.

31
Q

Give 4 examples of how correlations can be used.

A

1. Is salary correlated with the square footage of your home?
2. Is the number of years of education correlated with income?
3. E-commerce (Amazon):
   - Time spent in 24 hrs vs money spent in 24 hrs
   - # unique visitors in a day vs $ sales in a day
4. Medicine:
   - Dosage of a drug vs reduction in blood sugar

32
Q

Explain confidence interval with an example.

A

- A confidence interval (C.I.) is a range of values, computed from a sample. It is built so that, in a stated proportion of repeated samples (the confidence level, e.g. 95%), the range contains the population parameter.
E.g. X ~ any distribution
X → heights of people → {x1, x2, …, x10} → random sample of size 10
Estimate the population mean, i.e. μ of X.
μ ≈ x̄ → where x̄ is the sample mean → this is a point estimate. Not bad, but we can do better.
If we say μ ∈ [162.1, 174.9] with 95% confidence:
- It is an interval with a confidence value
- It is richer in information than the point estimate
Note: If we repeat the sampling multiple times, each time we get a different value of x̄. In 95% of the sampling experiments, μ will lie between the endpoints of the C.I. calculated from x̄, but in 5% of cases it will not. The C.I. does not mean that μ lies in one particular computed interval with 95% probability.

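The note about repeated sampling can be checked by simulation. The sketch repeats the "sample 10 heights, build a 95% C.I." experiment many times and counts how often the interval contains μ. It assumes Gaussian heights with μ = 168 cm and σ = 5 cm, treats σ as known, and uses the standard interval x̄ ± 1.96σ/√n. All of these are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(16)

# Hypothetical population: heights ~ N(μ=168, σ=5); σ is treated as known here
mu, sigma, n = 168.0, 5.0, 10
experiments = 10_000
z = 1.96                                     # N(0, 1) quantile for a 95% two-sided interval

covered = 0
for _ in range(experiments):
    sample = rng.normal(mu, sigma, size=n)   # a fresh random sample of size 10
    x_bar = sample.mean()
    half_width = z * sigma / np.sqrt(n)
    lo, hi = x_bar - half_width, x_bar + half_width   # this experiment's C.I.
    covered += lo <= mu <= hi

coverage = covered / experiments
print(coverage)   # close to 0.95: the procedure captures μ about 95% of the time
```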
33
Q

How do you compute a confidence interval in the case of a Gaussian distribution?

A

Say X ~ N(μ, σ^2)
Let μ = 168cm and σ = 5cm
We know from the Gaussian distribution that (μ-2σ, μ+2σ) contains about 95% of the observations.
So we can say that about 95% of people's heights lie in [158, 178] cm (a randomly picked height falls there with 95% probability).
Similarly, other levels like 90%, 80%, etc. can be found using normal distribution tables.
E.g. suppose C = 90%
(1 - C)/2 = (1 - 0.9)/2 = 5% in each tail
- Heights lie in [x', x"] with 90% probability
- x', x" can be found from the normal distribution tables: x' = μ - 1.645σ, x" = μ + 1.645σ. All this data is tabulated.

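The table lookup can be replaced by the inverse CDF. The sketch below uses Python's `statistics.NormalDist` (standard library, Python 3.8+) with μ = 168 cm and σ = 5 cm from the card. `central_interval` is an illustrative helper name.

```python
from statistics import NormalDist

heights = NormalDist(mu=168, sigma=5)   # X ~ N(168, 5^2) from the card


def central_interval(dist, c):
    """Interval [x', x''] holding a fraction c of a normal distribution, with (1 - c)/2 in each tail."""
    tail = (1 - c) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


for c in (0.80, 0.90, 0.95):
    lo, hi = central_interval(heights, c)
    print(f"{c:.0%}: [{lo:.1f}, {hi:.1f}] cm")
# 80%: [161.6, 174.4] cm
# 90%: [159.8, 176.2] cm
# 95%: [158.2, 177.8] cm
```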
34
Q

How do you compute a confidence interval if we don't know the distribution but know that it has a finite mean and variance?

A

Case 1: X ~ some distribution with finite μ and σ^2, and σ is known
Q. What is the 95% C.I. of μ?
Let σ = 5cm
{x1, x2, …, x10} → sample of size n = 10
x̄ = sample mean = (1/10) Σ xi
As we learnt earlier from the CLT, x̄ ~ N(μ, σ^2/n) (approximately)
Hence we can say that μ ∈ [x̄ - 2σ/√n, x̄ + 2σ/√n] with 95% confidence.
Case 2: If we don't know σ
Use Student's t-distribution: replace σ with the sample standard deviation s; then (x̄ - μ)/(s/√n) ~ t(n-1), and μ ∈ [x̄ - t·s/√n, x̄ + t·s/√n], where t is the 97.5th percentile of t(n-1).

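Below is a sketch of both cases for one sample of n = 10 heights. The sample is simulated. The t critical value 2.262 is the standard table value for the 97.5th percentile with 9 degrees of freedom (`scipy.stats.t.ppf(0.975, 9)` gives the same), and it is hard-coded here to keep the example dependency-light.

```python
import numpy as np

rng = np.random.default_rng(17)
x = rng.normal(168, 5, size=10)   # hypothetical sample {x1, ..., x10} of heights (cm)
n = x.size
x_bar = x.mean()

# Case 1: σ known (σ = 5 cm as in the card); 2 approximates the N(0, 1) 97.5% point 1.96
sigma = 5.0
case1 = (x_bar - 2 * sigma / np.sqrt(n), x_bar + 2 * sigma / np.sqrt(n))

# Case 2: σ unknown; use the sample std s (ddof=1) and Student's t with n - 1 = 9 degrees of freedom
s = x.std(ddof=1)
t_975 = 2.262   # 97.5th percentile of t(9), from a t-table
case2 = (x_bar - t_975 * s / np.sqrt(n), x_bar + t_975 * s / np.sqrt(n))

print(round(x_bar, 2), case1, case2)
```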
35
Q

Explain the confidence interval using bootstrapping, with an example.

A

Task: Estimate the 95% C.I. for the median of X using only the given sample of X
S = {x1, x2, …, xn} → resample using sampling with replacement, picking indices with U(1, n), i.e. a uniform random variable between 1 and n
Let k = 1000 and m <= n (typically m = n)
- S1: x1', x2', …, xm' → m1 → median of sample 1
- S2: x1', x2', …, xm' → m2 → median of sample 2
  :
- Sk: x1', x2', …, xm' → mk → median of sample k
→ m1, m2, …, m1000
→ sort → m1' <= m2' <= m3' <= … <= m1000' (increasing order)
→ keep the middle 950 of the 1000 sorted medians (950/1000 = 95%), leaving 25 in each tail
Therefore the 95% C.I. is [m25', m975']

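Below is a NumPy sketch of this percentile bootstrap. For k = 1000 and 95% it returns the 25th and 975th sorted medians. The exponential sample is simulated, and `bootstrap_median_ci` is an illustrative helper name.

```python
import numpy as np


def bootstrap_median_ci(sample, k=1000, m=None, level=0.95, rng=None):
    """Percentile-bootstrap C.I. for the median: resample with replacement k times, take medians, sort."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    m = n if m is None else m                            # resample size m <= n (m = n is the usual choice)
    idx = rng.integers(0, n, size=(k, m))                # uniform random indices = sampling with replacement
    medians = np.sort(np.median(sample[idx], axis=1))    # m1' <= m2' <= ... <= mk'
    tail = (1 - level) / 2
    lo_idx = max(int(round(tail * k)) - 1, 0)            # k=1000, 95% -> 0-based index 24, i.e. m25'
    hi_idx = int(round((1 - tail) * k)) - 1              # k=1000, 95% -> 0-based index 974, i.e. m975'
    return medians[lo_idx], medians[hi_idx]


rng = np.random.default_rng(17)
data = rng.exponential(scale=10.0, size=200)   # hypothetical skewed sample of X
lo, hi = bootstrap_median_ci(data, rng=rng)
print(np.median(data), (lo, hi))
```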
36
Q

Explain hypothesis testing with an example.

A

Task: Given a coin, determine whether the coin is biased towards heads or not
- Test statistic: flip the coin 5 times and count the number of heads = X
- Perform the experiment → H H H H H → X = 5 → this is our observation
Let H0 = the coin is not biased towards heads
P(observation | H0) = P(X=5 | coin is not biased towards heads) = 1/(2^5) = 1/32 ≈ 0.03 = 3%
P(observation | H0) is called the p-value. (In general, the p-value is the probability, under H0, of an observation at least as extreme as the one seen; here X=5 is already the most extreme value, so P(X >= 5 | H0) = P(X=5 | H0).)
Typically, a p-value < 5% is considered small.
Here P(X=5 | H0) ≈ 3%.
So there is only a 3% chance of getting 5 heads in 5 flips if the coin is not biased.
3% → quite low
The observation was actually made, so it is the ground truth. Hence our assumption, i.e. H0, is likely incorrect.
So we reject our null hypothesis (H0) → we reject the idea that the coin is not biased
H0: the coin is not biased → null hypothesis
H1: the coin is biased towards heads → alternative hypothesis
Rejecting H0 means accepting H1.
(If the p-value had been large, we would fail to reject H0; that does not prove H0 is true.)
So we accept that the coin is biased towards heads.

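The p-value for this example can be computed exactly from the binomial distribution and cross-checked by simulation. Below is a standard-library sketch. `p_value_heads` is an illustrative helper name, and the one-sided "at least this many heads" form matches H1 (biased towards heads).

```python
from math import comb
import random


def p_value_heads(observed_heads, flips, p_fair=0.5):
    """P(X >= observed | H0) with X ~ Binomial(flips, p_fair): at least this many heads from a fair coin."""
    return sum(comb(flips, k) * p_fair ** k * (1 - p_fair) ** (flips - k)
               for k in range(observed_heads, flips + 1))


p_value = p_value_heads(5, 5)
print(p_value)          # 0.03125 = 1/32
print(p_value < 0.05)   # True -> reject H0 at the 5% level

# Same number by simulation: how often does a fair coin give 5 heads in 5 flips?
random.seed(18)
trials = 100_000
hits = sum(all(random.random() < 0.5 for _ in range(5)) for _ in range(trials))
print(hits / trials)    # close to 0.031
```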
37
Q

Explain resampling and the permutation test with an example.

A

Task: Determine whether the population mean height in two cities is the same or not
Experiment: Measure the heights of 50 random people in each city.
Let μ1 and μ2 be the sample means of the two cities, say 162cm and 167cm.
Test statistic: μ2 - μ1 = 167 - 162 = 5cm (X)
Null hypothesis (H0): there is no difference between the population means of the two cities
Computing P(X >= 5cm | H0):
1. Take all the heights from both cities and put them together in a new set (S).
2. Randomly select 50 points from S into S1 and put the remaining 50 into S2. This is resampling. Since S1 and S2 come randomly from the same set (S), this simulates 2 cities with the same population mean, i.e. it simulates the null hypothesis (H0). Calculate the means μ1 and μ2 of S1 and S2, and their difference μ2 - μ1 = δ.
3. Repeat step 2 k times.
4. Sort the δi's in increasing order: δ1 <= δ2 <= … <= δk, and see where the observed difference (5cm) falls.
Say k = 1000 and our observed difference falls at δ801. So about 20% of the simulated differences are greater than the observed difference.
P(diff >= 5cm | H0) = 0.2 → not small (well above 5%) → the difference is not statistically significant → we fail to reject (i.e. accept) H0

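Below is a NumPy sketch of steps 1 to 4. Instead of locating the observation in the sorted list, it computes the fraction of simulated differences that are at least as large as the observed one, which is equivalent. The city heights are simulated, so the printed p-value is illustrative and is not the card's 0.2.

```python
import numpy as np


def permutation_test(city1, city2, k=1000, rng=None):
    """One-sided permutation test of H0: both cities have the same mean height.

    Returns the observed difference mean(city2) - mean(city1) and the fraction of the
    k resampled differences that are >= it, an estimate of P(diff >= observed | H0).
    """
    rng = np.random.default_rng() if rng is None else rng
    city1, city2 = np.asarray(city1, float), np.asarray(city2, float)
    observed = city2.mean() - city1.mean()
    pooled = np.concatenate([city1, city2])   # step 1: put all heights into one set S
    n1 = city1.size
    deltas = np.empty(k)
    for i in range(k):                        # step 3: repeat k times
        shuffled = rng.permutation(pooled)    # step 2: random split into S1 and S2
        deltas[i] = shuffled[n1:].mean() - shuffled[:n1].mean()
    p_value = np.mean(deltas >= observed)     # step 4: where does the observation fall?
    return observed, p_value


rng = np.random.default_rng(19)
city1 = rng.normal(162, 7, size=50)   # hypothetical heights, city 1
city2 = rng.normal(167, 7, size=50)   # hypothetical heights, city 2
observed, p_value = permutation_test(city1, city2, rng=rng)
print(round(observed, 2), p_value)
```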
38
Q

Explain how to perform the K-S test for the similarity of two distributions.

A

Let X1 and X2 be two samples of size m and n respectively, with empirical CDFs F1,m and F2,n.
Test statistic: Dm,n = sup_x |F1,m(x) - F2,n(x)| → the maximum difference between their CDFs
Null hypothesis (H0): X1 and X2 have the same distribution.
If Dm,n > c(α) √((m+n)/(mn)),
then we reject the null hypothesis (H0) and conclude that X1 and X2 have different distributions;
else we fail to reject (accept) H0.
Note: α is the significance level we choose, and c(α) is taken from a table (or computed as c(α) = √(-ln(α/2)/2)).
E.g. if we choose α = 0.05, then c(α) ≈ 1.358.

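Below is a NumPy sketch of the two-sample test exactly as stated above. The ECDF difference is evaluated at every observed point, because that is where the maximum is attained. The samples are simulated. `scipy.stats.ks_2samp` implements the same test and also reports p-values.

```python
import numpy as np


def ks_statistic(a, b):
    """D_{m,n} = sup_x |F1,m(x) - F2,n(x)|, evaluated at every observed point (where the ECDFs jump)."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    points = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, points, side="right") / a.size   # empirical CDF of sample 1
    cdf_b = np.searchsorted(b, points, side="right") / b.size   # empirical CDF of sample 2
    return np.max(np.abs(cdf_a - cdf_b))


def ks_reject(a, b, alpha=0.05):
    """Reject H0 (same distribution) if D > c(α) √((m + n)/(m n)), with c(α) = √(-ln(α/2) / 2)."""
    m, n = len(a), len(b)
    c_alpha = np.sqrt(-np.log(alpha / 2) / 2)   # ≈ 1.358 for α = 0.05
    return ks_statistic(a, b) > c_alpha * np.sqrt((m + n) / (m * n))


rng = np.random.default_rng(20)
x1 = rng.normal(0, 1, size=300)
x2 = rng.normal(0, 1, size=400)      # same distribution as x1
x3 = rng.exponential(1, size=400)    # different distribution
print(ks_statistic(x1, x2), ks_reject(x1, x2))   # small D; usually not rejected
print(ks_statistic(x1, x3), ks_reject(x1, x3))   # large D; rejected
```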
39
Q

Explain proportional sampling with an example.

A

d = [2.0, 6.0, 1.2, 5.8, 20.0]
Task: Pick one of the n elements such that the probability of picking an element is proportional to its di
Step 1:
a. s = Σ di = 35
b. di' = di/s
   → d1' = 0.0571
   → d2' = 0.1714
   → d3' = 0.0343
   → d4' = 0.1657
   → d5' = 0.5714
   Here Σ di' = 1
c. Cumulative normalized sums
   → d1" = d1' = 0.0571
   → d2" = d1" + d2' = 0.2286
   → d3" = d2" + d3' = 0.2629
   → d4" = d3" + d4' = 0.4286
   → d5" = d4" + d5' = 1
Step 2: Sample one value r ~ Unif(0.0, 1.0)
   r = numpy.random.uniform(0.0, 1.0)
   let r = 0.6
Step 3: Proportional sampling
   if r <= d1": return 1
   elif r <= d2": return 2
   elif r <= d3": return 3
   elif r <= d4": return 4
   else: return 5
   With r = 0.6: d4" = 0.4286 < 0.6 <= d5" = 1, so element 5 (20.0) is picked.
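The if/elif chain is a search over the cumulative sums, so it can be written with `bisect` from the standard library. The sketch compares r·s with the raw running sums, which is equivalent to comparing r with the normalized sums but avoids round-off in the normalization. It returns a 0-based index, and `proportional_pick` is an illustrative helper name.

```python
import bisect
import itertools
import random


def proportional_pick(d, r):
    """Return the 0-based index chosen for a uniform draw r in [0, 1), with P(index i) = d[i] / sum(d)."""
    cumulative = list(itertools.accumulate(d))    # running sums: 2.0, 8.0, 9.2, 15.0, 35.0
    target = r * cumulative[-1]                   # same as comparing r with the normalized sums
    idx = bisect.bisect_left(cumulative, target)  # first i with target <= cumulative[i]
    return min(idx, len(d) - 1)                   # guard against float round-off at the top end


d = [2.0, 6.0, 1.2, 5.8, 20.0]
print(proportional_pick(d, 0.6))   # 4, i.e. the 5th element (20.0), as in the card

# Check the empirical frequencies against d_i / s
random.seed(21)
counts = [0] * len(d)
for _ in range(100_000):
    counts[proportional_pick(d, random.random())] += 1
print([c / 100_000 for c in counts])   # close to [0.0571, 0.1714, 0.0343, 0.1657, 0.5714]
```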