final Flashcards

(57 cards)

1
Q

Univariate (simple) regression: key equations

A

y = B0 + B1*x1 + error
df = n - 2
Null hypothesis: B1 = 0, i.e., no linear relationship

2
Q

Multivariate (multiple) regression: key equations

A

y = B0 + B1*x1 + … + Bp*xp + error
df = n - p - 1
Null hypothesis: B1 = … = Bp = 0 for the whole model, i.e., no linear relationship

But some of the x's, if individually compared to the response in their own single-predictor models, can still be != 0

3
Q

What are the assumptions for OLS?

A

L - Linearity: the model is linear in the parameters B0 and B1
I - Independent errors: the error terms should be independent of one another, so there is no correlation among the residuals
N - Normality: normally distributed errors with a mean of zero ("centered around zero")
E - Equal variance: the variability of the errors is the same across all predictor values

4
Q

How does the OLS matrix math work out? (One form can be used for both uni- and multivariate; X is n x (p+1).)

A

y = X*B + error
with
b = [(X'X)^-1]X'Y
where b = [b0; b1; …] is the vector of estimated coefficients
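As a minimal sketch of the normal-equation solution, assuming Python with numpy (the data below are simulated purely for illustration):

```python
import numpy as np

# Sketch of b = (X'X)^-1 X'Y on simulated data (illustrative only).
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1])   # n x (p+1) design matrix, intercept first
b = np.linalg.inv(X.T @ X) @ X.T @ y    # b = [b0, b1]
print(b)                                # roughly [2.0, 1.5]
```

In practice np.linalg.lstsq is preferred over forming the explicit inverse, for numerical stability.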

5
Q

Explain what residual deviance is and how to calculate it

A

Residual deviance is a measure of how far the model is from the saturated one (the model that perfectly fits all data points in the range)
The lower the deviance, the better the fit

D = -2(L1 - LS), where L1 is the log-likelihood for the current model and LS is the log-likelihood for the saturated model

6
Q

How do you find a critical t-value from a t-table for a regression slope hypothesis test?

A

Determine the degrees of freedom: df = n - p - 1 (or n - 2 for simple regression)

Choose the significance level, e.g., alpha = 0.05;
if the test is two-tailed, look for alpha/2 in the t-score table row for the corresponding df

t-score = (b1 - 0)/SE(b1), which is (estimate - null value)/SE

The table gives a critical t-value, which we compare to the computed t-score to see if it exceeds the critical value

Calculate the CI with bi ± t*(df) * SE(bi)
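A minimal sketch of the lookup and CI, assuming Python with scipy (b1, SE(b1), n, and alpha are made-up example values):

```python
from scipy import stats

b1, se_b1 = 1.5, 0.4                    # hypothetical estimate and its SE
n, p = 50, 1
df = n - p - 1                          # = n - 2 for simple regression

t_score = (b1 - 0) / se_b1              # (estimate - null value) / SE
t_crit = stats.t.ppf(1 - 0.05 / 2, df)  # two-tailed critical value at alpha = 0.05

print(abs(t_score) > t_crit)            # True -> reject H0
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)  # CI: bi +- t*(df) * SE(bi)
```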

7
Q

What is the hypothesis test for a single predictor’s coefficient in a multiple regression model?

A

Null hypothesis (H0): Bj = 0, so predictor xj has no effect
Alternative hypothesis (HA): Bj != 0
Test statistic: t = (bj - 0)/SE(bj)
If |t| > the two-tailed critical t-value, reject H0

8
Q

What are SSE, SST in linear regression?

A

SSE (Sum of Squared Errors): unexplained variation, measured relative to the regression line.
SSE = sum(yi - yi_hat)^2
SST (Total Sum of Squares): total variation in y, measured relative to the average.
SST = sum(yi - y_bar)^2
SST - SSE gives the "explained variation" (and R^2 = 1 - SSE/SST).

9
Q

How do we detect and address multicollinearity in multiple regression?

A

Detect via a correlation matrix or the Variance Inflation Factor (VIF).

VIF_i = 1/(1 - Ri^2), where Ri^2 comes from a regression with predictor i as the response and all the other predictors as explanatory variables

VIF = 1 (R^2 = 0): no correlation
VIF < 5 (R^2 < 0.8): moderate correlation
VIF > 5 (R^2 > 0.8): highly correlated
VIF > 10 (R^2 >= 0.9): significant, so it needs correcting, possibly by grouping two predictors together

Solutions: remove/merge correlated predictors, use regularization (Ridge), or collect more data.
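A minimal sketch of computing VIF by hand, assuming Python with numpy (the simulated data are made up, with x2 deliberately tied to x1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.5, size=n)   # strongly correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, i):
    # Regress predictor i on all the others; VIF_i = 1 / (1 - Ri^2).
    y = X[:, i]
    A = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1 - np.sum((y - A @ b) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(X, i), 2) for i in range(X.shape[1])])  # x1, x2 inflated; x3 near 1
```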

10
Q

What is forward selection in model building?

A

Start with no predictors.

Test each available predictor individually, add the one that gives the greatest improvement (e.g., in R^2).

Repeat until adding further predictors fails to significantly improve the model.
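A minimal sketch of the loop, assuming Python with numpy and made-up data; here the criterion is raw R^2 with an arbitrary 0.01 improvement cutoff, where adjusted R^2, AIC, or p-values would also work:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=n)

def r2(cols):
    # R^2 of an OLS fit on the given predictor columns (plus intercept).
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - np.sum((y - A @ b) ** 2) / np.sum((y - y.mean()) ** 2)

selected, remaining = [], list(range(X.shape[1]))
while remaining:
    best = max(remaining, key=lambda j: r2(selected + [j]))
    if r2(selected + [best]) - r2(selected) < 0.01:
        break                             # no meaningful improvement: stop
    selected.append(best)
    remaining.remove(best)
print(selected)                           # expect columns 0 and 2
```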

11
Q

What is backward elimination in model building?

A

Start with all predictors.

Remove the least significant predictor (highest p-value or minimal improvement in R^2).

Repeat until all remaining predictors are significant or further removal degrades the model.

12
Q

Why are forward selection and backward elimination important?

A

They’re stepwise approaches to reduce a large set of predictors to a more parsimonious model.
They prevent overfitting by removing variables that add little predictive power.

13
Q

What is the logistic regression model formula?

A

ln(p/(1-p)) = B0 + B1*x1 + … + Bp*xp, with p being the probability of the "success" class

We use this for 2-level (binary) categorical response variables

Use logit(pi) = ln(pi/(1-pi)) above, which we invert to find pi = e^B/(1 + e^B),
with B being B0 + B1*x1 + … + Bp*xp
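A minimal sketch of the inversion, assuming Python (the coefficients are made-up example values):

```python
import math

b0, b1 = -2.0, 0.8            # hypothetical fitted coefficients

def predict_prob(x1):
    eta = b0 + b1 * x1        # linear predictor B = B0 + B1*x1
    return math.exp(eta) / (1 + math.exp(eta))

print(predict_prob(0))        # ~0.12
print(predict_prob(5))        # ~0.88
```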

14
Q

How does multivariate inference work?

A

1) Start with the predictor coefficients B1…Bp and set up an H0 for each of them
2) Set up the p-value thresholds
3) Run the regression to see whether each predictor variable is important on its own in the presence of the other predictors

15
Q

What is multicollinearity?

A

A strong correspondence (correlation) between two or more predictor variables

16
Q

How is multicollinearity explored?

A

1) Explore the observations: as each predictor variable increases, what does the response variable do in turn?

2) See if there are equations that can link some of the PVs together (each PV on its own could have a low p-value, but when all are combined in one multiple regression they can be highly correlated with one another, resulting in a high degree of collinearity, so the individual interpretations cannot hold)

17
Q

How is categorical data regressed upon?

A

Logistic regression, which takes the form transformation(pi) = B0 + B1*x1 + … + Bp*xp,
where after the transformation (link function) is applied we can solve for pi, the probability of the "success" outcome occurring

18
Q

What do the link functions for logistic-type regression do?

A

They adapt the model to different categorical/count responses:
logit - gives the log-odds via ln(pi/(1-pi))

log - gives ln(lambda), which is used for the Poisson distribution, modeling the mean number of counts

19
Q

What is pi in logistic regression?

A

The probability of an event occurring, such that E[Yi] = pi given our predictors; i.e., the probability of success

20
Q

Why do we use logistic regression instead of linear regression for binary outcomes?

A

Binary data often violate linear regression assumptions.

Logistic regression constrains predictions between 0 and 1.

The log-odds transformation (logit) is compatible with a wide range of distributions and yields interpretable odds ratios.

21
Q

How are logistic regression parameters estimated?

A

By using maximum likelihood: we find the B's that maximize the likelihood of the observed data. (In the simplest, intercept-only case this reduces to the count of observations in the "success" class divided by the total observations.)

The likelihood itself doesn't produce the B's directly; it is the function being optimized. We then plug the optimized B's (the MLE estimates) into the logit to solve for the probability

22
Q

Interpret the slope Bj in logistic regression

A

Bj is the change in the log-odds of success for a 1-unit increase in xj, holding the other predictors constant

The odds ratio for a 1-unit increase in xj is exp(Bj) = e^Bj

23
Q

What is k-fold cross-validation, and why do we use it in regression?

A

In k-fold CV, the dataset is split into k folds. We then train on k-1 folds and validate on the left-out fold

We then average the performance metric (e.g., RMSE or R^2) across the folds

This approach provides a better estimate of model performance and helps to detect overfitting
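A minimal sketch, assuming Python with scikit-learn and made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=100)

# 5-fold CV: each fold is held out once while the other 4 train the model.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_root_mean_squared_error", cv=5)
print(-scores.mean())          # average RMSE across the 5 folds
```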

24
Q

How do residual plots help diagnose issues in linear regression?

A

A random scatter of points suggests that linear assumptions and homoscedasticity might hold.
A pattern (e.g., curved shape, funnels) indicates possible non-linearity or heteroskedasticity.
Systematic patterns can also suggest outliers or missing predictors.

25
Q

What is data permutation?

A

Shuffling the data such that one column is held fixed and the other is randomly rearranged; used for hypothesis testing
26
Q

What can be said if the CI surrounds/includes zero?

A

There is not enough evidence to reject the null hypothesis
27
Q

What is goodness of fit?

A

It can be described via the residual deviance, which is how much the model differs from a perfect (saturated) model that fits the data exactly
28
Q

What is the equation for the residual deviance?

A

D = -2(L1 - LS), where L1 is the maximized log-likelihood of the current model and LS is that of the saturated model (in which n = p, i.e., one observation per parameter B, including B0)
29
Q

What is the coefficient of determination (logistic analogue of r^2)?

A

An analogue of r^2 for logistic regression: the average expected (fitted) probability for the successes minus the average expected probability for the failures, (p1_bar - p0_bar)
30
Q

What is log-likelihood?

A

The log of the probability of the model observing the given data, given the model's parameters. The higher it is, the better the model is at producing the observed data
31
Q

What is the purpose of using CV?

A

To prevent overfitting, where the model fits the observations too closely, so any predictions outside the observed range would be invalid
32
Q

What is AIC?

A

A criterion that penalizes extra variables to balance under- vs. over-fitting; it is used to compare models with different numbers of parameters K
33
Q

AIC equation

A

AIC = -2*LL + 2K, where LL is the maximized log-likelihood and K is the number of parameters, K = p + 1 (predictors + intercept). As the AIC score decreases, model performance increases
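A minimal sketch of the computation for a Gaussian linear model, assuming Python with numpy and made-up data (note that some conventions also count sigma^2 in K):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma2 = np.mean(resid ** 2)                      # MLE of the error variance
ll = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)  # maximized Gaussian log-likelihood
K = X.shape[1]                                    # K = p + 1 (predictors + intercept)
print(-2 * ll + 2 * K)                            # AIC; lower is better
```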
34
Q

What are AIC_min, AIC_max, and delta-AIC?

A

AIC_min belongs to the best-performing model and AIC_max to the worst-performing model. delta_k = AIC(of model k) - AIC_min (best)
35
Q

What is the delta_k breakdown?

A

delta_k <= 2: comparable to the best-fit model
delta_k > 10: no support that the model fits
36
Q

What are the Akaike weights?

A

The probabilities of our models under AIC, which we can use to rank the models: w_k = exp(-delta_k/2) / [sum over models n of exp(-delta_n/2)]
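A minimal sketch, assuming Python with numpy (the AIC scores are made-up examples):

```python
import numpy as np

aic = np.array([100.2, 101.9, 112.5])   # hypothetical model scores
delta = aic - aic.min()                 # delta_k = AIC_k - AIC_min
w = np.exp(-delta / 2)
w /= w.sum()                            # weights sum to 1 across the model set
print(delta.round(2), w.round(3))       # the delta > 10 model gets ~0 weight
```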
37
Q

What is RMSE, and what is the perfect-prediction condition for RMSE?

A

RMSE, the Root Mean Squared Error, measures how close our model gets to perfect predictions (a perfect prediction gives RMSE = 0). RMSE = sqrt(SSE/n), where n is the number of observations in the new/testing dataset
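A minimal sketch, assuming Python with numpy (the test-set values are made up):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 9.0])   # test-set observations
y_pred = np.array([2.8, 5.4, 7.0, 9.3])   # model predictions

sse = np.sum((y_true - y_pred) ** 2)
rmse = np.sqrt(sse / len(y_true))          # sqrt(SSE / n)
print(rmse)                                # 0 would mean perfect predictions
```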
38
Q

What is maximum likelihood?

A

A way of estimating the parameters that make the observed data most likely under an assumed probability distribution; if we assume normality, ML = OLS
39
Q

How do we find the maximum likelihood for a binomial function?

A

The likelihood of the observed data (k successes) over the number of trials (n) for a given p_hat is L(p) = C(n, k) * p^k * (1-p)^(n-k). It focuses on the single parameter p, with the MLE at p_hat = k/n
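A minimal sketch that scans the likelihood over a grid of p and confirms the peak at k/n, assuming Python with scipy (the counts are made up):

```python
import numpy as np
from scipy import stats

n, k = 40, 13                            # trials and observed successes
p_grid = np.linspace(0.01, 0.99, 999)
like = stats.binom.pmf(k, n, p_grid)     # C(n,k) * p^k * (1-p)^(n-k)
print(p_grid[np.argmax(like)], k / n)    # both ~0.325
```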
40
Q

How do you estimate a population size using the hypergeometric probability function?

A

N = finite population size
K = number of successes in the population = captured + marked
k = observed data = recaptured marked individuals
n = number of draws without replacement = recaptures

P(k) = [C(K, k) * C(N-K, n-k)] / C(N, n), i.e., drawing successes without replacement; maximize this over N to estimate the population size
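A minimal sketch of the scan over candidate N, assuming Python with scipy (the mark-recapture counts are made up):

```python
import numpy as np
from scipy import stats

K, n, k = 50, 40, 10                          # marked; recaptured; marked recaptures
N_grid = np.arange(K + n - k, 1001)           # N must be at least K + (n - k)
like = stats.hypergeom.pmf(k, N_grid, K, n)   # scipy order: k, M=N, n=K, N=n
print(N_grid[np.argmax(like)])                # near K*n/k = 200
```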
41
Q

What does B1 represent in logistic regression?

A

The change in the log-odds for a 1-unit increase in x1
42
Q

Important assumptions in order to use logistic inference

A

The probability distribution is well described by the link function:
logit (binomial): mean = n*p, var = n*p*q, where q = 1 - p
log (Poisson): mean = var = lambda
Observations are independent
43
Q

What does B0 represent for linear vs. logistic regression?

A

For linear regression, the true mean of the response when all predictors = 0. For logistic regression, B0 represents the log-odds of the outcome occurring when all independent variables are 0
44
Q

What is the difference between uppercase B's and lowercase b's?

A

Uppercase is for the population, whereas lowercase is for samples
45
Q

What is the observed probability formula for pi?

A

pi = e^B/(1 + e^B), with B = B0 + B1*x1 + … + Bp*xp
46
Q

With bootstrapping we make a histogram of the slopes; what happens if it isn't normal around zero?

A

If the histogram of slopes doesn't follow a normal distribution around zero but does around another value, we have support for the alternative hypothesis, with the respective correlation
47
Q

Why does RMSE increase with the number of parameters k?

A

Models with higher k capture the training data more closely, which in turn makes it harder to make good predictions on new data, so RMSE increases. The lower the RMSE, the better the model's fit
48
Q

How is MLE different from OLS?

A

MLE maximizes the likelihood function, whereas OLS minimizes the sum of squared residuals
49
Q

What is the walkthrough for hypothesis testing using t-scores?

A

1) Write down the model by identifying the response and predictor, with b0, to fit the form Y = B1*x + B0
2) Take n, the number of observations, and use Yi = B1*xi + B0 + error
3) Run a regression with b0 estimating B0, b1 estimating B1, and so on, to find the SEs and coefficients
4) Evaluate whether a linear model can be applied; if the B's != 0, proceed to the next step
5) Calculate the t-score = (b1 - (null = 0))/SE(b1), and find the corresponding df
6) Look up the t-score table and compare the CIs to the standard 1.96*SE for the true population slope. If the t-score computed by hand is greater than the table t-score, the slope is statistically significant
7) Calculate the slope interval by bi ± t*(df) * SE(bi)
50
Q

If the residuals are randomly distributed around 0, what does that mean?

A

That a linear model is a well-representative model for the data
51
Q

How do we tell if a B value increases the odds with respect to the response variable in logistic regression?

A

If beta is positive, so that e^B > 1, it increases the odds; if B = 0, so that e^B = 1, there is no change in the odds; and if e^B < 1, it decreases the odds
52
Q

What does the average RMSE tell us?

A

By how many units our models are off from the actual values, on average
53
Q

What do the coefficients mean for linear, logit, and Poisson models?

A

Linear: increase in the response value per 1 additional unit of the predictor
Logit: increase in the log-odds per 1 additional unit
Poisson (log): increase in the log-count per 1 additional unit
54
Q

What is the purpose of the MLE?

A

The MLE is the parameter value that, if true, would give you the highest probability of having observed the data you actually did. This makes it a natural choice for estimating the parameters of a model
55
Q

Describe how we can bootstrap and what we look for

A

Bootstrap the OLS fit and arrange the resampled slopes into a histogram to verify whether the null or alternative hypothesis is supported, based on where the slopes are centered. We can further test for more accurate results by permuting the data, and then find confidence intervals if required
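A minimal sketch of the resampling loop, assuming Python with numpy (the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(0, 10, n)
y = 1.0 + 0.7 * x + rng.normal(size=n)

slopes = []
for _ in range(2000):
    idx = rng.integers(0, n, n)                   # resample rows with replacement
    xb, yb = x[idx], y[idx]
    slopes.append(np.cov(xb, yb, bias=True)[0, 1] / np.var(xb))  # OLS slope

lo, hi = np.percentile(slopes, [2.5, 97.5])       # 95% bootstrap CI for the slope
print(lo, hi)      # an interval excluding 0 supports the alternative hypothesis
```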
56
Q

What happens to the CI as the sample size increases?

A

As the sample number/size increases, the CI range narrows, because the SE decreases via SE = sigma/sqrt(n)
57
Q

What is the difference between an observation and a variable?

A

Observation: a single data point in the dataset
Variable: a characteristic/attribute of the observation