12 Flashcards (20 cards)

1
Q

What is a statistical model, and why are regression models commonly used?

A

A statistical model is a simplified representation of reality that characterizes the association structure in data. Regression models are common because they explicitly model how one or more predictor (independent) variables relate to an outcome (dependent) variable, allowing quantification of those relationships.

2
Q

How does simple linear regression differ from multiple linear regression?

A

Simple linear regression uses a single predictor variable to model the outcome as a linear function, whereas multiple linear regression includes two or more predictors and models the outcome as a linear combination of all of them.
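
In the usual notation (a sketch of the two model forms, with ε the error term):

```latex
% Simple linear regression: one predictor
Y = \alpha + \beta X + \varepsilon

% Multiple linear regression: p predictors
Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
```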

3
Q

What key assumption underlies the use of linear regression regarding the relationship between variables?

A

The primary assumption is that the expected value of the outcome (Y) is a linear function of the predictor(s) (X), meaning that each one-unit change in X corresponds to a constant expected change in Y.
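
Written compactly, for the single-predictor case:

```latex
\mathrm{E}[Y \mid X] = \alpha + \beta X
```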

4
Q

In linear regression, what do the intercept (α) and slope (β) represent?

A

The intercept (α) represents the expected value of the outcome when all predictors equal zero. Each slope (β) represents the expected change in the outcome for a one-unit increase in the corresponding predictor, holding other predictors constant.
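
A minimal Python sketch with hypothetical study-hours/exam-score data (the numbers are made up; np.polyfit is used purely for illustration):

```python
import numpy as np

# Hypothetical data: hours studied (x) vs. exam score (y).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 64, 70, 71, 77, 80], dtype=float)

# np.polyfit with degree 1 fits y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)
print(f"intercept (alpha) = {intercept:.2f}")  # expected score when x = 0
print(f"slope (beta)      = {slope:.2f}")      # expected change per extra hour
```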

5
Q

Why is it important that the error terms (ε) in linear regression are independent and identically distributed with mean zero?

A

Independence ensures that the error for one observation carries no information about any other, while identical distribution with mean zero implies that deviations around the fitted line are random, unbiased, and of constant variance. Violating these assumptions can lead to incorrect standard errors and inferences.

6
Q

What does homoscedasticity mean in the context of linear regression?

A

Homoscedasticity means that the variance of the error terms (residuals) remains constant across all levels of the predictor variables. In other words, the spread of residuals does not systematically increase or decrease as fitted values change.
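
A quick numerical sketch with simulated, deliberately homoscedastic data: the residual spread should look roughly the same in both halves of the fitted range:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, size=x.size)  # constant-variance noise

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Under homoscedasticity the residual spread is similar across the range;
# compare the standard deviation in the lower and upper halves of x.
half = x.size // 2
print(np.std(residuals[:half]), np.std(residuals[half:]))
```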

7
Q

Why must predictor variables in a multiple linear regression not be highly correlated with one another?

A

High correlation among predictors (multicollinearity) inflates the variance of coefficient estimates, making them unstable and difficult to interpret, and can obscure the unique contribution of each predictor.
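
Multicollinearity is often quantified with the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A hand-rolled sketch (the helper vif below is hypothetical, not a library function):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    n, p = X.shape
    factors = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        fitted = A @ coef
        r2 = 1 - np.sum((X[:, j] - fitted) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Two nearly collinear predictors produce large VIFs.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # almost a copy of x1
print(vif(np.column_stack([x1, x2])))
```

Common rules of thumb flag VIF values above roughly 5–10 as a sign of problematic collinearity.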

8
Q

How is the coefficient of determination (R²) interpreted in linear regression?

A

R² represents the proportion of total variability in the outcome that is explained by the fitted regression model.

For example, R² = 0.60 means 60% of the outcome’s variance is accounted for by the predictors.
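
As a sketch, R² can be computed directly from its definition (the helper r_squared is hypothetical):

```python
import numpy as np

def r_squared(y, y_hat):
    """R² = 1 - SS_res / SS_tot: share of the outcome's variance explained."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual (unexplained) variability
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variability around the mean
    return 1 - ss_res / ss_tot
```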

9
Q

What purpose does the adjusted R² serve compared to the ordinary R²?

A

Adjusted R² penalizes R² for adding predictors that do not meaningfully improve model fit. It adjusts for the number of predictors relative to sample size, helping to prevent overfitting by reflecting model complexity.
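
With n observations and p predictors, the usual formula is:

```latex
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```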

10
Q

Why can linear regression be seen as a generalization of the t-test and ANOVA?

A

A t-test compares the means of two groups, and ANOVA compares means across multiple groups. In linear regression, categorical group membership can be encoded as indicator variables, so a regression model with a group indicator tests the same mean differences.
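
A sketch of the equivalence on simulated data (group means are hypothetical; scipy.stats.ttest_ind assumes equal variances by default, matching the classical setup):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, 50)
group_b = rng.normal(11.0, 2.0, 50)

# Classical two-sample t-test (equal variances assumed).
t, p = stats.ttest_ind(group_a, group_b)

# The same comparison as a regression: outcome ~ intercept + group indicator.
y = np.concatenate([group_a, group_b])
g = np.concatenate([np.zeros(50), np.ones(50)])   # 0 = group A, 1 = group B
slope, intercept = np.polyfit(g, y, 1)

# The fitted slope equals the difference in group means.
print(slope, group_b.mean() - group_a.mean())
```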

11
Q

What is the role of the standard error (SE) of a regression coefficient?

A

The standard error measures how precisely a regression coefficient is estimated, reflecting the variability of the estimate across hypothetical repeated samples. A smaller SE indicates a more precise estimate.
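
For simple linear regression, for instance, the slope's standard error takes the form (with σ̂ the residual standard deviation):

```latex
SE(\hat{\beta}) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```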

12
Q

How do you use a t-test within linear regression to assess whether a coefficient is statistically significant?

A

You compute the t-statistic t = β / SE(β) and compare it to a t-distribution with the model's residual degrees of freedom. A large |t| (and correspondingly low p-value) indicates the coefficient differs significantly from zero, suggesting a meaningful association.
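
A sketch with hypothetical numbers (in practice the degrees of freedom are n − p − 1 from the fitted model):

```python
from scipy import stats

beta_hat, se = 0.42, 0.15              # hypothetical estimate and its SE
df = 97                                # residual degrees of freedom, n - p - 1

t = beta_hat / se
p_value = 2 * stats.t.sf(abs(t), df)   # two-sided p-value
print(f"t = {t:.2f}, p = {p_value:.4f}")
```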

13
Q

Describe how to interpret a 95% confidence interval (CI) for a regression coefficient.

A

A 95% CI is the range that, under repeated sampling, would contain the true coefficient 95% of the time. If the CI does not include zero, it indicates a statistically significant association at the 5% level.
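
For a normally distributed estimate, the interval is constructed as:

```latex
\hat{\beta} \pm 1.96 \cdot SE(\hat{\beta})
```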

14
Q

Why is the value 1.96 used when constructing a 95% CI for a normally distributed estimate?

A

In a standard normal distribution, approximately 95% of values lie within ±1.96 standard deviations from the mean. Thus, multiplying SE by 1.96 produces the bounds for a 95% CI.
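
A sketch with hypothetical numbers, recovering 1.96 from the normal distribution rather than hard-coding it:

```python
from scipy import stats

beta_hat, se = 0.42, 0.15          # hypothetical estimate and standard error

z = stats.norm.ppf(0.975)          # 97.5th percentile of N(0, 1), ≈ 1.96
lower, upper = beta_hat - z * se, beta_hat + z * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")   # excludes 0 → significant at 5%
```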

15
Q

How can a standard curve (calibration curve) in analytical chemistry rely on linear regression?

A

A standard curve is constructed by measuring the analyte’s signal (e.g., absorbance) at a series of known concentrations. Linear regression fits a straight line relating concentration to signal, and unknown samples’ signals are then placed on the fitted line to estimate their concentrations.
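
A sketch with made-up calibration data (all concentrations and absorbances below are hypothetical):

```python
import numpy as np

# Hypothetical calibration data: known concentrations vs. measured absorbance.
conc = np.array([0.0, 0.5, 1.0, 2.0, 4.0])            # e.g., mg/mL
absorbance = np.array([0.01, 0.12, 0.23, 0.45, 0.90])

slope, intercept = np.polyfit(conc, absorbance, 1)

# Invert the fitted line to estimate an unknown sample's concentration.
unknown_signal = 0.37
estimated_conc = (unknown_signal - intercept) / slope
print(f"estimated concentration ≈ {estimated_conc:.2f} mg/mL")
```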

16
Q

What distinguishes logistic regression from linear regression in terms of the outcome variable?

A

Logistic regression models a binary or categorical outcome (e.g., disease yes/no), whereas linear regression models a continuous outcome. Logistic regression uses the logit (log-odds) link to ensure predicted probabilities range between 0 and 1.

17
Q

In logistic regression, what does the logit function (log-odds) accomplish?

A

The logit function transforms a probability p ∈ (0, 1) to the entire real line (−∞, +∞) via log(p / (1 − p)). This linearizes the relationship between predictors and log-odds, allowing fitting with standard linear-model techniques.
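
A minimal sketch of the logit and its inverse (the logistic function):

```python
import numpy as np

def logit(p):
    """Map a probability in (0, 1) to the whole real line."""
    return np.log(p / (1 - p))

def inv_logit(z):
    """Map a real number back to a probability (the logistic function)."""
    return 1 / (1 + np.exp(-z))

print(logit(0.5))       # 0.0
print(inv_logit(2.0))   # ≈ 0.88, always inside (0, 1)
```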

18
Q

How is an odds ratio (OR) derived from a logistic regression coefficient (β)?

A

The odds ratio for a one-unit increase in a predictor is exp(β). If β = 0.5, then OR = e^0.5 ≈ 1.65, meaning the odds of the outcome are 1.65 times higher per one-unit increase in that predictor.
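
The arithmetic, as a one-line sketch:

```python
import numpy as np

print(np.exp(0.5))   # ≈ 1.6487, so OR ≈ 1.65 per one-unit increase
```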

19
Q

When is an odds ratio (OR) considered statistically significant based on its 95% CI?

A

If the 95% CI for the OR does not include 1, the association is statistically significant at the 5% level. Inclusion of 1 implies no significant difference in odds between comparison groups.
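
In practice the interval is computed on the log-odds scale and then exponentiated; a sketch with hypothetical values:

```python
import numpy as np

beta, se = 0.5, 0.2                   # hypothetical log-odds estimate and SE
lower, upper = beta - 1.96 * se, beta + 1.96 * se

print(np.exp(lower), np.exp(upper))   # ≈ (1.11, 2.44): excludes 1 → significant
```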

20
Q

Why are odds ratios used in retrospective (case–control) studies instead of risk ratios?

A

In case–control studies, participants are sampled by outcome status, so the absolute risk of disease among exposed and unexposed individuals cannot be estimated. The odds of exposure among cases and among controls are observable, however, and the resulting exposure odds ratio equals the disease odds ratio, so the OR can still be computed.
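
A worked example with a hypothetical 2×2 case–control table:

```python
# Hypothetical 2x2 case-control table:
#             cases   controls
# exposed      a=30      b=20
# unexposed    c=70      d=80
a, b, c, d = 30, 20, 70, 80

odds_exposure_cases = a / c        # odds of exposure among cases
odds_exposure_controls = b / d     # odds of exposure among controls
print(odds_exposure_cases / odds_exposure_controls)   # OR = ad/bc ≈ 1.71
```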