Lectures 6, 7 and 8 Flashcards

1
Q
What does the correlation coefficient 'r' explain? What are positive and negative correlation?
A

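r measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1. When r is close to 1 or -1 there is a strong correlation; when it is close to 0 there is little or no linear relationship. A positive r means the two variables tend to move in the same direction; a negative r means one tends to fall as the other rises.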
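
As an illustration, r can be computed by hand for two small made-up datasets. The helper name `pearson_r` and the numbers are assumptions for this sketch, not part of the lecture material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]          # made-up data
score_up = [2, 4, 6, 8, 10]      # rises as hours rise
score_down = [10, 8, 6, 4, 2]    # falls as hours rise

r_pos = pearson_r(hours, score_up)    # perfect positive correlation
r_neg = pearson_r(hours, score_down)  # perfect negative correlation
print(r_pos, r_neg)
```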

2
Q
Why do we use OLS linear regression models?
A

To model the linear relationship between a dependent variable and one or more independent variables. OLS picks the line that minimizes the sum of squared residuals.

3
Q
What is the difference between a correlation matrix and a linear regression model?
A

A correlation matrix quantifies associations between pairs of variables but does not provide a predictive model, while a linear regression model aims to model and quantify the relationship between independent and dependent variables to predict outcomes and understand variable associations more deeply.

4
Q
In the linear regression equation 𝑌 = 𝛼 + 𝛽𝑋 + 𝜖, what are variables X and Y?
A

Y is the dependent (outcome) variable we want to explain or predict. X is the independent (explanatory) variable that the equation uses to explain Y.

5
Q

In the linear regression equation 𝑌 = 𝛼 + 𝛽𝑋 + 𝜖, what are parameters alpha and beta?

A
  • Alpha is the intercept: the expected value of Y when X is 0.
  • Beta is the slope: how much Y is expected to change each time X increases by one unit.
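
A minimal sketch of how alpha and beta are estimated by OLS in a simple regression, using the standard closed-form formulas. The function name `ols_fit` and the data are made up for illustration.

```python
def ols_fit(xs, ys):
    """Return (alpha, beta) for the OLS line y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # intercept
    return alpha, beta

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]   # made-up data lying exactly on y = 1 + 2x

alpha, beta = ols_fit(xs, ys)
print(alpha, beta)     # intercept 1.0 (value at x = 0), slope 2.0
```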
6
Q

In the linear regression model, why do we require variation in values of X? What happens
if there is no variation in values of X?

A

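The slope measures how Y changes as X changes, so X has to vary for it to be estimated. If there is no variation in the values of X, the model cannot determine the strength or direction of the relationship, and the slope coefficient cannot be computed (its formula divides by the variation in X, which is zero).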
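
The problem can be seen directly in the closed-form slope formula for simple regression. This is a small sketch with made-up numbers in which every X value is the same.

```python
xs = [3, 3, 3, 3]   # made-up data: X never changes
ys = [1, 4, 2, 5]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
sxx = sum((x - mean_x) ** 2 for x in xs)                        # variation in X
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

try:
    beta = sxy / sxx          # slope = sxy / sxx
except ZeroDivisionError:
    beta = None               # no variation in X, so no slope estimate

print(sxx, beta)
```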

7
Q
What are dummy or indicator variables? Provide examples.
A
  • Variables that take the value 0 or 1 to represent categorical data, converting it into a format that can be used in quantitative models.
  • Example: a variable coded 1 for yes and 0 for no. Also use an example from your own project.
8
Q

How do we include nominal independent variables in regression analysis?

A

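By converting them into dummy variables: a nominal variable with k categories enters the regression as k - 1 dummies, and the omitted category is the reference group. We did this in our project; refer to that.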
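
A rough sketch of dummy coding a nominal variable, leaving one category out as the reference. The helper `dummy_code` and the fuel-type data are hypothetical and are not taken from the project mentioned in the card.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 columns, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }

fuel = ["petrol", "diesel", "electric", "petrol"]   # made-up nominal variable
dummies = dummy_code(fuel, reference="petrol")

# Three categories give two dummy columns; petrol rows are 0 in both.
for name, column in dummies.items():
    print(name, column)
```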

9
Q

What is an outlier and why can they be a problem in your analysis? Provide 1 or 2 ways
of dealing with outliers.

A

An outlier is an observation or data point that is significantly different from the rest of the dataset. Outliers can be a problem because a few extreme values can distort estimates such as the mean or the regression slope.
- Identify and examine: make a visualization to identify the outliers.
- You can replace the outlier, but it will affect your results.
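
One common way to flag outliers numerically is the 1.5 × IQR rule, the same rule a box plot uses to draw its whiskers. The card does not name this rule, so treat it as an assumption of this sketch; the data are made up.

```python
from statistics import quantiles

values = [10, 12, 11, 13, 12, 11, 95]   # made-up data with one extreme value

q1, _, q3 = quantiles(values, n=4)      # quartiles
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Points outside the fences are flagged for a closer look, not deleted automatically.
outliers = [v for v in values if v < lower or v > upper]
print(outliers)
```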

10
Q

In our linear regression model output, an R-squared (𝑅²) is reported. What does it mean and
what do we use it for?

A

R^2 is the share of the variability in the dependent variable that the model explains. If it is close to 0, the model explains almost none of that variability; if it is close to 1, it explains almost all of it. We use it to judge how well the model fits the data.
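
A small sketch of the usual definition, R^2 = 1 - (residual sum of squares) / (total sum of squares), applied to two made-up sets of predictions. The function name `r_squared` is an assumption of this example.

```python
def r_squared(ys, predictions):
    """Share of the variability in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                  # total
    return 1 - ss_res / ss_tot

ys = [2, 4, 6, 8]            # made-up observed values
perfect = [2, 4, 6, 8]       # predictions that match every observation
mean_only = [5, 5, 5, 5]     # predictions that ignore X and use the mean of y

r2_perfect = r_squared(ys, perfect)      # model explains everything
r2_mean_only = r_squared(ys, mean_only)  # model explains nothing
print(r2_perfect, r2_mean_only)
```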

11
Q

Can we assume that the regression model with the highest number of predictors is the best
model? Why or why not.

A

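No. More predictors can just add noise to the model. R^2 never decreases when a predictor is added, so a higher R^2 alone does not show that the bigger model is better; adjusted R^2, which penalizes extra predictors, is a better indicator when comparing models.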
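
To see why a bigger model is not automatically better, compare adjusted R^2 for two hypothetical models fitted to the same 30 observations. The R^2 values and predictor counts below are invented for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical results: eight extra predictors raise R^2 only from 0.80 to 0.81.
small_model = adjusted_r_squared(r2=0.80, n_obs=30, n_predictors=2)
large_model = adjusted_r_squared(r2=0.81, n_obs=30, n_predictors=10)

print(round(small_model, 3), round(large_model, 3))
```

Here the larger model has the higher R^2 but the lower adjusted R^2, because the small gain in fit does not justify the extra predictors.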

12
Q

The slope parameter 𝛽̂ of a simple linear regression is +2. Interpret this number.

A

The regression line slopes upward: Y is on average 2 units higher for every one-unit increase in X.

13
Q

The slope parameter 𝛽̂ of a simple linear regression is -2. Interpret this number.

A

The regression line slopes downward: Y is on average 2 units lower for every one-unit increase in X.

14
Q

An analysis relates the age of used cars (in years) to their price (in USD), using data on a
specific type of car. In a linear regression of the price on age, the slope parameter 𝛽̂ is -700.
Interpret the coefficient.

A

It means that, on average, a car that is one year older is priced 700 USD lower.

15
Q

An analysis relates the size of apartments (in square meters) to their price (in USD), using
data from one city. In a linear regression of the price on size, the slope parameter is +600.
Interpret the coefficient.

A

It means that, on average, an apartment that is one square meter larger costs 600 USD more.

16
Q

What is the advantage of using multiple regression model as compared to a simple
regression model?

A

One advantage is that multiple regression can include several independent variables for one dependent variable, whereas simple regression has only one independent variable and one dependent variable. This lets you estimate the effect of each predictor while holding the others constant.

17
Q

In regression diagnostics, what does the following figure tell us: Histogram of residuals.

A

It shows the shape of the distribution of the residuals. Is it approximately normal? Is it skewed? Is it bimodal (2 peaks) or unimodal (1 peak)?

18
Q

In regression diagnostics, what does the following figure tell us: Plot of residuals versus
predicted values of y.

A
  • Patterns in this scatterplot reveal problems with the model. Ideally the residuals are scattered randomly around zero; a curve suggests the relationship is not linear, and a spread that widens or narrows suggests heteroskedasticity.
19
Q

Explain Heteroskedasticity.

A

Heteroskedasticity means that the variance of the residuals is not constant across different levels of the independent variables. In simpler terms, the spread or dispersion of the residuals varies systematically as you move along the range of the predictors.
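
A crude numerical check, offered as a sketch rather than a formal test: split the residuals into a low-prediction half and a high-prediction half and compare their variances. The predicted values and residuals below are made up so that the spread grows with the prediction.

```python
from statistics import pvariance

predicted = [1, 2, 3, 4, 5, 6, 7, 8]                       # made-up fitted values
residuals = [0.1, -0.1, 0.2, -0.2, 1.5, -1.8, 2.2, -2.0]   # spread grows with prediction

pairs = sorted(zip(predicted, residuals))   # order by predicted value
half = len(pairs) // 2
low_half = [res for _, res in pairs[:half]]
high_half = [res for _, res in pairs[half:]]

# A ratio far from 1 hints that the residual variance is not constant.
variance_ratio = pvariance(high_half) / pvariance(low_half)
print(variance_ratio)
```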

20
Q

What is multicollinearity? How can we address this issue?

A

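Multicollinearity is when there is a strong linear relationship between two or more predictors.
- It makes the individual coefficients difficult to interpret.
- To address it, drop or combine the highly correlated predictors.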
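
One widely used diagnostic, not mentioned in the card, is the variance inflation factor (VIF). With only two predictors it reduces to 1 / (1 - r^2), where r is the correlation between them. The sketch below uses made-up predictors; the threshold of 10 is a common rule of thumb, not a fixed standard.

```python
def correlation_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

x1 = [1, 2, 3, 4, 5]     # made-up predictor
x2 = [2, 4, 6, 8, 11]    # made-up predictor that is almost exactly 2 * x1

r2 = correlation_squared(x1, x2)
vif = 1 / (1 - r2)       # two-predictor case only

print(round(r2, 4), round(vif, 1))
if vif > 10:             # rule-of-thumb threshold
    print("x1 and x2 are highly collinear; consider dropping or combining them")
```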

21
Q

How do you interpret regression coefficients in a logistic regression?

A

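It is different from linear regression because the outcome is binary (success/failure or yes/no). A coefficient is the change in the log-odds of the outcome for a one-unit increase in the predictor; exponentiating it gives the odds ratio.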
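
A short sketch of how a logistic regression coefficient translates into odds and probabilities. The intercept and slope are hypothetical values chosen so the numbers come out round; they are not estimates from any dataset.

```python
from math import exp, log

alpha = -1.0      # hypothetical intercept (log-odds when x = 0)
beta = log(2)     # hypothetical slope, chosen so the odds ratio is 2

def probability(x):
    """Predicted probability of success (outcome = 1) at a given x."""
    log_odds = alpha + beta * x
    return 1 / (1 + exp(-log_odds))

odds_ratio = exp(beta)   # odds are multiplied by this for each one-unit increase in x

p0 = probability(0)
p1 = probability(1)
odds0 = p0 / (1 - p0)
odds1 = p1 / (1 - p1)

print(odds_ratio, odds1 / odds0)   # both are about 2
```

The probability itself does not change by a fixed amount per unit of x; only the odds change by a constant factor.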

22
Q

Provide an example of experimental design – what is the outcome of interest, explain
treatment and control groups in the study.

A

Experimental Design Example: Investigating the Impact of a New Drug on Blood Pressure

Outcome of Interest: The outcome of interest in this study is the blood pressure of individuals. Specifically, researchers want to determine if a new drug has an effect on blood pressure.

Treatment Group: In the experimental group (treatment group), participants are administered the new drug. They receive a specific dosage of the drug under controlled conditions.

Control Group: The control group consists of individuals who do not receive the new drug. Instead, they might receive a placebo (an inactive substance) or no treatment at all. The purpose of the control group is to provide a baseline for comparison. By comparing the blood pressure changes in the treatment group with those in the control group, researchers can assess whether any observed differences are due to the new drug or other factors.
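
The comparison between the two groups can be sketched as a simple difference in mean blood pressure change. All numbers are invented for illustration, and a real study would also test whether the difference is statistically significant.

```python
# Made-up changes in blood pressure (mmHg) after the study period.
treatment_change = [-12, -8, -10, -9, -11]   # participants who received the drug
control_change = [-2, 0, -1, -3, 1]          # participants who received a placebo

mean_treatment = sum(treatment_change) / len(treatment_change)
mean_control = sum(control_change) / len(control_change)

# Estimated effect of the drug: how much more blood pressure fell with treatment.
estimated_effect = mean_treatment - mean_control
print(mean_treatment, mean_control, estimated_effect)
```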