12 Flashcards (20 cards)

1
Q

What is a statistical model, and why are regression models commonly used?

A

A statistical model is a simplified representation of reality that characterizes the association structure in data. Regression models are common because they explicitly model how one or more predictor (independent) variables relate to an outcome (dependent) variable, allowing quantification of those relationships.

2
Q

How does simple linear regression differ from multiple linear regression?

A

Simple linear regression uses a single predictor variable to model the outcome as a linear function, whereas multiple linear regression includes two or more predictors and models the outcome as a linear combination of all of them.
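
In the usual notation (a sketch of the two model forms, with ε the error term):

```latex
% Simple linear regression: one predictor
Y = \alpha + \beta X + \varepsilon

% Multiple linear regression: p predictors
Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
```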

3
Q

What key assumption underlies the use of linear regression regarding the relationship between variables?

A

The primary assumption is that the expected value of the outcome (Y) is a linear function of the predictor(s) (X), meaning that each one-unit change in X corresponds to a constant expected change in Y.
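
Written compactly, for the single-predictor case:

```latex
\mathrm{E}[Y \mid X] = \alpha + \beta X
```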

4
Q

In linear regression, what do the intercept (α) and slope (β) represent?

A

The intercept (α) represents the expected value of the outcome when all predictors equal zero. Each slope (β) represents the expected change in the outcome for a one-unit increase in the corresponding predictor, holding other predictors constant.
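
A minimal Python sketch with hypothetical study-hours/exam-score data (the numbers are made up; np.polyfit is used purely for illustration):

```python
import numpy as np

# Hypothetical data: hours studied (x) vs. exam score (y).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 64, 70, 71, 77, 80], dtype=float)

# np.polyfit with degree 1 fits y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)
print(f"intercept (alpha) = {intercept:.2f}")  # expected score when x = 0
print(f"slope (beta)      = {slope:.2f}")      # expected change per extra hour
```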

5
Q

Why is it important that the error terms (ε) in linear regression are independent and identically distributed with mean zero?

A

Independence ensures that the error for one observation carries no information about any other, while identical distribution with mean zero implies that deviations around the fitted line are random, unbiased, and of constant variance. Violating these assumptions can lead to incorrect standard errors and inferences.

6
Q

What does homoscedasticity mean in the context of linear regression?

A

Homoscedasticity means that the variance of the error terms (residuals) remains constant across all levels of the predictor variables. In other words, the spread of residuals does not systematically increase or decrease as fitted values change.
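
A quick numerical sketch with simulated, deliberately homoscedastic data: the residual spread should look roughly the same in both halves of the fitted range:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, size=x.size)  # constant-variance noise

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Under homoscedasticity the residual spread is similar across the range;
# compare the standard deviation in the lower and upper halves of x.
half = x.size // 2
print(np.std(residuals[:half]), np.std(residuals[half:]))
```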

7
Q

Why must predictor variables in a multiple linear regression not be highly correlated with one another?

A

High correlation among predictors (multicollinearity) inflates the variance of coefficient estimates, making them unstable and difficult to interpret, and can obscure the unique contribution of each predictor.
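
Multicollinearity is often quantified with the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors. A hand-rolled sketch (the helper vif below is hypothetical, not a library function):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    n, p = X.shape
    factors = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        fitted = A @ coef
        r2 = 1 - np.sum((X[:, j] - fitted) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Two nearly collinear predictors produce large VIFs.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)   # almost a copy of x1
print(vif(np.column_stack([x1, x2])))
```

Common rules of thumb flag VIF values above roughly 5–10 as a sign of problematic collinearity.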

8
Q

How is the coefficient of determination (R²) interpreted in linear regression?

A

R² represents the proportion of total variability in the outcome that is explained by the fitted regression model.

For example, R² = 0.60 means 60% of the outcome’s variance is accounted for by the predictors.
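
As a sketch, R² can be computed directly from its definition (the helper r_squared is hypothetical):

```python
import numpy as np

def r_squared(y, y_hat):
    """R² = 1 - SS_res / SS_tot: share of the outcome's variance explained."""
    ss_res = np.sum((y - y_hat) ** 2)        # residual (unexplained) variability
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variability around the mean
    return 1 - ss_res / ss_tot
```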

9
Q

What purpose does the adjusted R² serve compared to the ordinary R²?

A

Adjusted R² penalizes R² for adding predictors that do not meaningfully improve model fit. It adjusts for the number of predictors relative to sample size, helping to prevent overfitting by reflecting model complexity.
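
With n observations and p predictors, the usual formula is:

```latex
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
```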

10
Q

Why can linear regression be seen as a generalization of the t-test and ANOVA?

A

A t-test compares the means of two groups, and ANOVA compares means across multiple groups. In linear regression, categorical group membership can be encoded as indicator variables, so a regression model with a group indicator tests the same mean differences.
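
A sketch of the equivalence on simulated data (group means are hypothetical; scipy.stats.ttest_ind assumes equal variances by default, matching the classical setup):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, 50)
group_b = rng.normal(11.0, 2.0, 50)

# Classical two-sample t-test (equal variances assumed).
t, p = stats.ttest_ind(group_a, group_b)

# The same comparison as a regression: outcome ~ intercept + group indicator.
y = np.concatenate([group_a, group_b])
g = np.concatenate([np.zeros(50), np.ones(50)])   # 0 = group A, 1 = group B
slope, intercept = np.polyfit(g, y, 1)

# The fitted slope equals the difference in group means.
print(slope, group_b.mean() - group_a.mean())
```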

11
Q

What is the role of the standard error (SE) of a regression coefficient?

A

The standard error measures how precisely a regression coefficient is estimated, reflecting the variability of the estimate across hypothetical repeated samples. A smaller SE indicates a more precise estimate.
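
For simple linear regression, for instance, the slope's standard error takes the form (with σ̂ the residual standard deviation):

```latex
SE(\hat{\beta}) = \frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```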

12
Q

How do you use a t-test within linear regression to assess whether a coefficient is statistically significant?

A

You compute the t-statistic t = β / SE(β) and compare it to a t-distribution with the model's residual degrees of freedom. A large |t| (and correspondingly low p-value) indicates the coefficient differs significantly from zero, suggesting a meaningful association.
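
A sketch with hypothetical numbers (in practice the degrees of freedom are n − p − 1 from the fitted model):

```python
from scipy import stats

beta_hat, se = 0.42, 0.15              # hypothetical estimate and its SE
df = 97                                # residual degrees of freedom, n - p - 1

t = beta_hat / se
p_value = 2 * stats.t.sf(abs(t), df)   # two-sided p-value
print(f"t = {t:.2f}, p = {p_value:.4f}")
```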

13
Q

Describe how to interpret a 95% confidence interval (CI) for a regression coefficient.

A

A 95% CI is the range that, under repeated sampling, would contain the true coefficient 95% of the time. If the CI does not include zero, it indicates a statistically significant association at the 5% level.
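
For a normally distributed estimate, the interval is constructed as:

```latex
\hat{\beta} \pm 1.96 \cdot SE(\hat{\beta})
```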

14
Q

Why is the value 1.96 used when constructing a 95% CI for a normally distributed estimate?

A

In a standard normal distribution, approximately 95% of values lie within ±1.96 standard deviations from the mean. Thus, multiplying SE by 1.96 produces the bounds for a 95% CI.
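
A sketch with hypothetical numbers, recovering 1.96 from the normal distribution rather than hard-coding it:

```python
from scipy import stats

beta_hat, se = 0.42, 0.15          # hypothetical estimate and standard error

z = stats.norm.ppf(0.975)          # 97.5th percentile of N(0, 1), ≈ 1.96
lower, upper = beta_hat - z * se, beta_hat + z * se
print(f"95% CI: ({lower:.3f}, {upper:.3f})")   # excludes 0 → significant at 5%
```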

15
Q

How can a standard curve (calibration curve) in analytical chemistry rely on linear regression?

A

A standard curve is constructed by measuring the analyte’s signal (e.g., absorbance) at a series of known concentrations. Linear regression fits a straight line relating concentration to signal, and unknown samples’ signals are then placed on the fitted line to estimate their concentrations.
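
A sketch with made-up calibration data (all concentrations and absorbances below are hypothetical):

```python
import numpy as np

# Hypothetical calibration data: known concentrations vs. measured absorbance.
conc = np.array([0.0, 0.5, 1.0, 2.0, 4.0])            # e.g., mg/mL
absorbance = np.array([0.01, 0.12, 0.23, 0.45, 0.90])

slope, intercept = np.polyfit(conc, absorbance, 1)

# Invert the fitted line to estimate an unknown sample's concentration.
unknown_signal = 0.37
estimated_conc = (unknown_signal - intercept) / slope
print(f"estimated concentration ≈ {estimated_conc:.2f} mg/mL")
```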

16
Q

What distinguishes logistic regression from linear regression in terms of the outcome variable?

A

Logistic regression models a binary or categorical outcome (e.g., disease yes/no), whereas linear regression models a continuous outcome. Logistic regression uses the logit (log-odds) link to ensure predicted probabilities range between 0 and 1.

17
Q

In logistic regression, what does the logit function (log-odds) accomplish?

A

The logit function transforms a probability p ∈ (0, 1) to the entire real line (−∞, +∞) via log(p / (1 − p)). This linearizes the relationship between predictors and log-odds, allowing fitting with standard linear-model techniques.
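
A minimal sketch of the logit and its inverse (the logistic function):

```python
import numpy as np

def logit(p):
    """Map a probability in (0, 1) to the whole real line."""
    return np.log(p / (1 - p))

def inv_logit(z):
    """Map a real number back to a probability (the logistic function)."""
    return 1 / (1 + np.exp(-z))

print(logit(0.5))       # 0.0
print(inv_logit(2.0))   # ≈ 0.88, always inside (0, 1)
```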

18
Q

How is an odds ratio (OR) derived from a logistic regression coefficient (β)?

A

The odds ratio for a one-unit increase in a predictor is exp(β). If β = 0.5, then OR = e^0.5 ≈ 1.65, meaning the odds of the outcome are 1.65 times higher per one-unit increase in that predictor.
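
The arithmetic, as a one-line sketch:

```python
import numpy as np

print(np.exp(0.5))   # ≈ 1.6487, so OR ≈ 1.65 per one-unit increase
```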

19
Q

When is an odds ratio (OR) considered statistically significant based on its 95% CI?

A

If the 95% CI for the OR does not include 1, the association is statistically significant at the 5% level. Inclusion of 1 implies no significant difference in odds between comparison groups.
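
In practice the interval is computed on the log-odds scale and then exponentiated; a sketch with hypothetical values:

```python
import numpy as np

beta, se = 0.5, 0.2                   # hypothetical log-odds estimate and SE
lower, upper = beta - 1.96 * se, beta + 1.96 * se

print(np.exp(lower), np.exp(upper))   # ≈ (1.11, 2.44): excludes 1 → significant
```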

20
Q

Why are odds ratios used in retrospective (case–control) studies instead of risk ratios?

A

In case–control studies, participants are sampled by outcome status, so the absolute risk of disease among exposed and unexposed individuals cannot be estimated. The odds of exposure among cases and among controls are observable, however, and the resulting exposure odds ratio equals the disease odds ratio, so the OR can still be computed.
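
A worked example with a hypothetical 2×2 case–control table:

```python
# Hypothetical 2x2 case-control table:
#             cases   controls
# exposed      a=30      b=20
# unexposed    c=70      d=80
a, b, c, d = 30, 20, 70, 80

odds_exposure_cases = a / c        # odds of exposure among cases
odds_exposure_controls = b / d     # odds of exposure among controls
print(odds_exposure_cases / odds_exposure_controls)   # OR = ad/bc ≈ 1.71
```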