Linear regression-2 Flashcards

P-values, linear regression

1
Q

What is the problem with R squared?

A

R squared by itself does not account for the amount of data. When we see a pattern in a small dataset, a high R squared can easily arise by random chance, so we cannot be confident that the pattern is real.

2
Q

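What is a p-value?

A

A p-value gives us a measure of confidence in the results of a statistical analysis. More precisely, it is the probability of seeing a result at least as extreme as the one observed if there were no real effect.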

3
Q

Build the intuition for a p-value with an example.

A

Consider two drugs, Drug A and Drug B. A p-value helps us determine whether the two drugs are equally effective or one is more effective than the other. Only once the p-value lets us establish a difference do we ask whether A is better or worse than B.

4
Q

How do we interpret a p-value when A and B look different?

A

We ran one experiment giving drugs A and B to people. A cured 37% and B cured 31%. On the face of it, A worked better than B. But how confidently can we say that A is better than B? This is where p-values help. p-values are numbers between 0 and 1, and they quantify how CONFIDENT WE SHOULD BE THAT DRUG A IS DIFFERENT FROM DRUG B.

5
Q

What does a p-value close to 0 denote?

A

The closer a p-value is to 0, the more confidence we have that drugs A and B are different.

6
Q

The closer a p-value is to 0, the more confidence we have that drugs A and B are different. How close to 0 should a p-value be?

A

In practice, a commonly used threshold is 0.05.

7
Q

What does p-value threshold of 0.05 mean?

A

It means that if there is no difference between drugs A and B, and we repeated the exact same experiment many times, then ONLY 5% OF THOSE EXPERIMENTS WOULD RESULT IN THE WRONG DECISION, that is, concluding the drugs are different when they are not.

8
Q

How do we calculate a p-value?

A

Using a statistical test, such as Fisher’s exact test.
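As a rough illustration, the sketch below computes a two-sided Fisher's exact test by hand for a 2x2 table of cured / not cured counts. The group sizes (100 people per drug) are an assumption chosen only so that the counts match the 37% and 31% cure rates from the earlier card, and the function name fisher_exact_two_sided is made up for this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two drugs; columns are cured / not cured.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that x of the col1 cured people
        # fall in the first row, with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table that is at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Assumed counts: 100 people per drug, 37 cured by A and 31 cured by B.
p_value = fisher_exact_two_sided(37, 63, 31, 69)
print(f"p-value = {p_value:.3f}")
```

With these assumed group sizes the p-value comes out well above 0.05, so a 37% vs 31% split alone would not be convincing evidence of a difference. Larger groups with the same percentages would give a smaller p-value.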

9
Q

Explain what a p-value threshold of 0.05 means, with an example.

A

Example:
1. Give the same drug, drug A, to two groups of people. The first test gives a p-value of 0.09, so we fail to see a difference between the two groups.
2. Repeat the same experiment. Most of the time the p-value will be high and we will again fail to see a difference.
3. Once in a while, by random chance, people who are allergic to the drug might all end up in the same group. Drug A might then fail to work for that group, and we will get a small p-value, suggesting that the two groups received different drugs even though both received drug A. This is a FALSE POSITIVE.

A threshold of 0.05 means that 5% of the experiments in which the only differences come from random chance will generate a p-value smaller than 0.05.
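The example above can be sketched as a small simulation: the same drug is given to two groups many times, and we count how often the p-value falls below 0.05. The cure rate, group size, and number of experiments are assumptions picked for illustration, and the p-value here comes from a two-proportion z-test (a normal approximation) rather than Fisher's test, simply because it is short to write.

```python
import math
import random

def two_proportion_p_value(cured_1, n_1, cured_2, n_2):
    """Two-sided p-value from a two-proportion z-test (normal approximation)."""
    pooled = (cured_1 + cured_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        return 1.0  # both groups identical and degenerate: no evidence of a difference
    z = (cured_1 / n_1 - cured_2 / n_2) / se
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
# Assumed setup: both groups get the SAME drug, which cures 35% of people.
n_people, cure_rate, n_experiments = 200, 0.35, 2000

false_positives = 0
for _ in range(n_experiments):
    group_1 = sum(random.random() < cure_rate for _ in range(n_people))
    group_2 = sum(random.random() < cure_rate for _ in range(n_people))
    if two_proportion_p_value(group_1, n_people, group_2, n_people) < 0.05:
        false_positives += 1  # we "found" a difference that does not exist

false_positive_rate = false_positives / n_experiments
print(f"false positive rate = {false_positive_rate:.3f}")
```

The printed rate should land in the neighbourhood of 0.05, which is what the threshold promises when there is no real difference.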

10
Q

How do we set the threshold for a p-value?

A

For an extremely important test, like the effectiveness of a drug, we need high confidence when we state that the drugs are or are not different. In such cases we can use very small thresholds like 0.01 or 0.001 (with 0.001, about 1 in 1000 experiments with no real difference leads to a false positive).

Likewise, for less important tests, we can use a somewhat higher threshold like 0.2 (2 in 10 such experiments can lead to a false positive).

11
Q

p = 0.24 denotes that drugs …..

A

We are not confident that drugs A and B are different.

12
Q

p = 0.02 denotes that drugs ….

A

We are confident that the drugs A and B are different.

13
Q

What is Hypothesis testing?

A

Trying to determine whether the drugs are the same or not is called Hypothesis testing.

14
Q

What is Null Hypothesis?

A

The Null Hypothesis is that the drugs are the same, and the p-value helps us decide whether we should reject the Null Hypothesis.

15
Q

What DOESN’T a p-value tell us?

A

Although a p-value helps us decide whether drugs A and B are different, it doesn’t tell us HOW DIFFERENT they are.

16
Q

What is the MOST IMPORTANT property of p-value that you should keep in mind while viewing the p-value results?

A

A small p-value DOESN’T IMPLY that the effect size or difference between drugs A and B is large.

A large p-value DOESN’T IMPLY that the effect size or difference between drugs A and B is small.
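A quick numerical sketch of both statements, using a two-proportion z-test as a stand-in for whatever test is actually used. All counts below are made up for illustration.

```python
import math

def two_proportion_p_value(cured_1, n_1, cured_2, n_2):
    """Two-sided p-value from a two-proportion z-test (normal approximation)."""
    pooled = (cured_1 + cured_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (cured_1 / n_1 - cured_2 / n_2) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Tiny effect, huge sample: 50% vs 51% cured, 100,000 people per drug.
p_tiny_effect = two_proportion_p_value(50_000, 100_000, 51_000, 100_000)

# Large effect, tiny sample: 20% vs 60% cured, 5 people per drug.
p_large_effect = two_proportion_p_value(1, 5, 3, 5)

print(f"1 point difference, n = 100000 each: p = {p_tiny_effect:.2g}")
print(f"40 point difference, n = 5 each:     p = {p_large_effect:.2g}")
```

The one-point difference gets a very small p-value because of the sheer amount of data, while the forty-point difference gets a p-value above 0.05 because five people per group is too few to rule out chance.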

17
Q

How does Linear regression fit a line? What criterion decides the final line?

A

Linear regression fits a line to the data that minimizes the Sum of Squared Residuals.
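To make the Sum of Squared Residuals (SSR) concrete, here is a minimal sketch that scores two candidate lines on a small made-up dataset. The data and the function name are assumptions for illustration only.

```python
def sum_of_squared_residuals(xs, ys, intercept, slope):
    """SSR of the line y = intercept + slope * x on the data."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

# Candidate 1: a flat line at the mean of y.
ssr_flat = sum_of_squared_residuals(xs, ys, intercept=3.04, slope=0.0)
# Candidate 2: the line y = x.
ssr_sloped = sum_of_squared_residuals(xs, ys, intercept=0.0, slope=1.0)

print(f"SSR flat line:   {ssr_flat:.3f}")
print(f"SSR sloped line: {ssr_sloped:.3f}")
```

The sloped line has the much smaller SSR, so it is the better fit of the two. Linear regression searches over all intercepts and slopes for the pair with the smallest SSR.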

18
Q

How to quantify the accuracy or quality of the predictions in Linear regression?

A

Once we fit a line to the data, we can calculate R squared, which gives us a sense of how accurate our predictions will be.
Linear regression also provides a p-value for R squared, which gives us a sense of how confident we should be that the R squared is not just due to random chance.
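As a sketch of the first half of this answer, R squared can be computed by comparing the SSR around the mean of y with the SSR around the fitted line. The data is made up, and the helper names fit_line and r_squared are assumptions for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys):
    """Fraction of the variation around the mean that the fitted line explains."""
    intercept, slope = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ssr_mean = sum((y - mean_y) ** 2 for y in ys)
    ssr_fit = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return (ssr_mean - ssr_fit) / ssr_mean

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

r2 = r_squared(xs, ys)
print(f"R squared = {r2:.3f}")
```

An R squared near 1 means the fitted line removes almost all of the variation that was left around the mean.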

19
Q

How do we minimize the SSR in linear regression?

A

Choose the y-axis intercept and slope that together minimize the SSR.

20
Q

What is an analytical solution in general?

A

With an analytical solution, we end up with a formula that we can plug the data into, and the output is the optimal value.

21
Q

What is the analytical solution for finding the optimal y-intercept in linear regression?

A

1. Keeping the slope constant, see how the SSR changes for different y-intercept values. This results in a U-shaped curve.
2. Find the lowest point on the curve, that is, the lowest SSR.
3. To find the lowest point, calculate the derivative of the curve. Where the derivative is 0 is the bottom of the curve.
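The three steps above can be sketched in code. Setting the derivative of the SSR with respect to the intercept to 0 gives intercept = mean(y) - slope * mean(x) for a fixed slope. The data, the fixed slope of 1.0, and the function names are assumptions for illustration.

```python
def ssr(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def ssr_derivative_wrt_intercept(xs, ys, intercept, slope):
    """d(SSR)/d(intercept): each squared residual contributes -2 * residual."""
    return sum(-2 * (y - (intercept + slope * x)) for x, y in zip(xs, ys))

def optimal_intercept(xs, ys, slope):
    """Intercept where the derivative is 0, for a fixed slope."""
    n = len(xs)
    return sum(ys) / n - slope * sum(xs) / n

# Hypothetical data and a slope held constant at 1.0.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
slope = 1.0

best = optimal_intercept(xs, ys, slope)
print(f"optimal intercept: {best:.3f}")
print(f"derivative there:  {ssr_derivative_wrt_intercept(xs, ys, best, slope):.2e}")
# Moving the intercept in either direction climbs back up the U-shaped curve.
for candidate in (best - 0.5, best, best + 0.5):
    print(f"intercept {candidate:+.2f} -> SSR {ssr(xs, ys, candidate, slope):.3f}")
```

The same idea, applied to the slope and intercept together, gives the usual closed-form least-squares formulas.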
22
Q

What is the iterative approach in Linear regression?

A

Gradient Descent: a way to find the optimal slope and y-axis intercept by starting from a guess and improving it step by step.
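A minimal sketch of Gradient Descent for a straight line, assuming a fixed learning rate and a fixed number of steps. Both values, the starting guess of 0 for each parameter, and the data are choices made for this example, and the learning rate would need retuning for other data.

```python
def gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Minimize the SSR of y = intercept + slope * x by repeated small steps."""
    intercept, slope = 0.0, 0.0  # arbitrary starting guess
    for _ in range(steps):
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        # Derivatives of the SSR with respect to each parameter.
        d_intercept = sum(-2 * r for r in residuals)
        d_slope = sum(-2 * x * r for x, r in zip(xs, residuals))
        # Step downhill, against the derivative.
        intercept -= learning_rate * d_intercept
        slope -= learning_rate * d_slope
    return intercept, slope

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

intercept, slope = gradient_descent(xs, ys)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

On this data the loop settles on the same intercept and slope that the closed-form least-squares formulas give, but it needs thousands of small steps to get there.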

23
Q

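Which one is quicker, the analytical or the iterative approach?

A

Analytical, when one exists: a formula gives the optimal values in a single calculation, whereas an iterative method like Gradient Descent needs many steps. For very large problems the formula itself can become expensive to evaluate, and an iterative method may then be the practical choice.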

24
Q

Where is the iterative approach used?

A

In scenarios where there is no analytical solution, including Logistic regression, Neural networks, etc.

25
Q

What do R squared = 0.66 and p-value = 0.1 suggest?

A

A p-value of 0.1 means there is a 10% chance that random data (random data is what the p-value is derived from) could give us R squared >= 0.66. This is a relatively high p-value, so we might not have a lot of confidence in the predictions, even though the R squared itself looks reasonably high.
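One way to see the "random data" idea is a permutation test: shuffle the y values so that any real relationship with x is destroyed, and count how often the shuffled data gives an R squared at least as large as the observed one. This is a stand-in sketch, not the F-test that regression software normally reports, and the four-point dataset and function names are made up for illustration.

```python
import random

def r_squared(xs, ys):
    """R squared of a simple linear regression (squared correlation of x and y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

def permutation_p_value(xs, ys, n_shuffles=10_000, seed=0):
    """Share of shuffled datasets whose R squared is >= the observed R squared."""
    rng = random.Random(seed)
    observed = r_squared(xs, ys)
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # Small tolerance so that floating-point ties count as "at least as large".
        if r_squared(xs, shuffled) >= observed - 1e-12:
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles

# Hypothetical tiny dataset: only four points.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]

observed_r2, p_value = permutation_p_value(xs, ys)
print(f"R squared = {observed_r2:.2f}, permutation p-value = {p_value:.2f}")
```

With only four points, a fairly high R squared still comes with a large p-value, which is the same situation as on this card: the fit looks good, but random data could easily have produced it.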
