Week 1: Central Tendency, Variability Measures, Z-Score, Linear Regression, Correlation, R Squared, Predicted Values, Residual Flashcards

1
Q
1. What type of variables are these?
Country
Income
Temperature
IQ
pH
Cancer types
Hair color
Socio-economic status
Number of Pets
2. Which measures of central tendency and variability can we calculate for each?
A
1. Country - Nominal (mode only)
Income - Ratio (mode, median, mean, IQR, Var, SD)
Temperature - Interval (all of them)
IQ - Interval (all)
pH - Interval (all)
Cancer types - Nominal (mode only)
Hair color - Nominal (mode only)
SES - Ordinal (median, mode)
Number of pets - Ratio (all)
2
Q

What graphs do we use for numerical/qualitative variables?

A

Qualitative - bar chart

Numeric - histogram, boxplot

3
Q
Central Tendency
What is 
1. Mean
2. Median
3. Mode
A
  1. Mean - the average, a typical value (sum of the values / nr of values)
  2. Median - the middle value: order the values, then pick the middle one
  3. Mode - the most frequent value
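A minimal Python sketch of the three measures, using the standard-library statistics module. The sample `scores` is made up purely for illustration.

```python
import statistics

# Hypothetical sample, chosen only for illustration
scores = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(scores)      # sum of the values / number of values
median_value = statistics.median(scores)  # middle of the ordered values (average of the two middle ones here)
mode_value = statistics.mode(scores)      # most frequent value

print(mean_value, median_value, mode_value)
```

With an even number of values there is no single middle value, so the median is taken as the average of the two middle ones.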
4
Q
Measures of Variability
What is: (formulas)
1. Variance
2. Standard Deviation
3. IQR
A
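  1. Variance: how much the subjects differ from each other (average squared distance from the mean)
    Population Var: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where $N$ is the population size
    Sample Var: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the mean value of all observations
  2. SD: measures the amount of variation/dispersion of a set of values
    Same formula as the variance, but with the square root taken: $\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$
  3. IQR: spread of the middle 50% of the data, also called the midspread
    3rd quartile - 1st quartile ($Q_3 - Q_1$)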
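A rough Python sketch of these measures with the standard-library statistics module. The data set is hypothetical. Quartiles can be computed by several conventions; this sketch simply uses the module's default method, so other software may report slightly different quartiles and IQR.

```python
import statistics

# Hypothetical data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)    # population variance: divides by N
sample_var = statistics.variance(data)  # sample variance: divides by n - 1
pop_sd = statistics.pstdev(data)        # square root of the population variance
sample_sd = statistics.stdev(data)      # square root of the sample variance

# Three cut points that split the ordered data into four equal parts
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: 3rd quartile minus 1st quartile

print(pop_var, sample_var, pop_sd, sample_sd, iqr)
```

The sample variance is always a little larger than the population variance for the same numbers, because it divides by n - 1 instead of N.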
5
Q

What is a normal distribution?

A
mean = median = mode
empirical rule: about 68/95/99.7% of observations lie within 1/2/3 SD of the mean
well described by its mean and SD
unimodal
symmetrical
centered on the mean
fixed score distribution
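The empirical rule can be checked numerically. The sketch below uses `NormalDist` from Python's statistics module with mean 0 and SD 1; the helper name `share_within` is made up for this example. The same proportions hold for any normal distribution, since they only depend on the number of SDs from the mean.

```python
from statistics import NormalDist

# Normal distribution with mean 0 and SD 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```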
6
Q

The Standard Normal Distribution is…

A

a ND with mean: 0 and variance: 1 (so SD: 1)

7
Q

Describe the +/- skewness

A

+ positive (right) skewed - mean>median>mode

- negative (left) skewed - mode>median>mean
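A small illustration in Python, with made-up samples: in a right-skewed sample the long right tail pulls the mean above the median and mode, and mirroring the sample reverses the order. This ordering is a rule of thumb for typical unimodal data, not a guarantee for every data set.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]
# Mirror image of the same sample, which is left-skewed
left_skewed = [10 - x for x in right_skewed]

def summary(values):
    """Return (mean, median, mode) of a sample."""
    return statistics.mean(values), statistics.median(values), statistics.mode(values)

right_mean, right_median, right_mode = summary(right_skewed)
left_mean, left_median, left_mode = summary(left_skewed)

print(right_mean, right_median, right_mode)
print(left_mean, left_median, left_mode)
```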

8
Q

What is the Z-Score?

A
  • How far an observation is from the mean, in terms of SDs
  • The nr of SDs by which a raw score is above or below the mean value of what is being observed
  • The standardized score
    Z = (observed value - mean)/SD
  • If we extend 1 SD above the mean and 1 SD below => approx 68% of the observations are within the interval (for a ND)
  • Approx 95% of the observations lie between 2 SD below and 2 SD above the mean for a ND
  • Also, if X is normally distributed, then Z is ND, with mean=0 and SD=1
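A minimal Python sketch of standardizing scores. The `heights` values are invented for the example, and the population SD is used here; using the sample SD instead would be an equally reasonable choice.

```python
import statistics

# Hypothetical raw scores, for illustration only
heights = [150, 160, 170, 180, 190]

mean_h = statistics.mean(heights)
sd_h = statistics.pstdev(heights)  # population SD (an assumption of this sketch)

# Z = (observed value - mean) / SD
z_scores = [(x - mean_h) / sd_h for x in heights]

print([round(z, 2) for z in z_scores])
```

After standardizing, the z-scores themselves have mean 0 and SD 1, whatever the original units were.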
9
Q

What is the Correlation Coefficient?

Steps

A

A way of summarizing a scatter plot into a nr between -1 and 1
Steps
1. fits a straight line to the data
2. the cc remembers whether the slope of the straight line points downwards or upwards
if slope + => coeff between 0 and 1 => positive
if slope - => coeff between -1 and 0 => negative
if the line is flat => coeff is 0 => the closer to 0, the weaker the relationship
3. looks at the quality of the fit of the straight line to the data

10
Q

What is the Pearson Correlation? (Formula)

IN LINEAR REGRESSION ONLY !

A
  • summarizes the strength and direction of a straight-line relationship
    1. strength - the closeness of the points to a straight line
    2. direction - whether one var generally increases or decreases as the other increases
    $r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
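The Pearson correlation can be computed directly from its definition: the sum of the products of the deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the `hours`/`marks` data are hypothetical, chosen only for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations from the means
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Square root of the product of the two sums of squared deviations
    denominator = math.sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return numerator / denominator

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(round(r, 4))
```

Points lying exactly on a rising line give r = 1, and points lying exactly on a falling line give r = -1.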
11
Q

What is Linear Regression Analysis?

A

used to predict the value of a variable based on the value of another variable
describes the average relation between y-values and x-values
the points on the regression line are the predicted y-values, denoted by y hat
explores the relation btw a quantitative response var and one or more explanatory vars

12
Q

Regression Line is fully determined if:

A
  • the intersection with the y-axis is known --> intercept

  • it is known how steep the line is --> slope

13
Q

Formulas for:
Regression Line
Regression Model

A

RL: Y hat i = b0 (intercept) + b1 (slope) x Xi
RM: Yi = Y hat i + ei = b0 + b1 x Xi + ei (residual)
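A minimal sketch of fitting the line and recovering the residuals in Python. It assumes the usual least-squares estimates for b0 and b1, which this card does not state. The function name `fit_line` and the data are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 (assumed estimation method)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope: how steep the line is
    b0 = y_bar - b1 * x_bar    # intercept: where the line crosses the y-axis
    return b0, b1

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
predicted = [b0 + b1 * x for x in xs]                  # regression line: Y hat i
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # ei = Yi - Y hat i

print(round(b0, 2), round(b1, 2))
print([round(e, 2) for e in residuals])
```

Adding each residual back to its predicted value reproduces the observed value, which is exactly what the regression model says.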

14
Q

How are R squared and rxy related?

A

R^2 = rxy^2 (in simple linear regression, R squared is the square of the Pearson correlation)
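In simple linear regression, R squared equals the square of rxy. A quick numerical check in Python, on made-up data and assuming a least-squares fit: R squared is computed as 1 minus the share of unexplained variation, and compared with the squared correlation.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

# Pearson correlation
r_xy = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line (assumed estimation method) and its unexplained variation
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / s_yy  # share of the variation in y explained by x

print(round(r_squared, 6), round(r_xy ** 2, 6))
```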

15
Q

About simple linear regression

A

One explanatory variable→simple regression
Multiple explanatory variables→multiple regression
- describes the average relation between Y values and X values
--> used when y is numeric (continuous), and the x var as well
limited because it is useful for summarizing associations only

Y Hat = estimated value
Y Line = predicted value for an individual

16
Q

What is R squared (the Coefficient of Determination)?

A

indicates the percentage of variability of y explained by the variable x
tells us how well a regression line estimates/predicts actual values

17
Q

What is an Error? What is a Residual?

A

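both measure how far an observed value lies from a regression line (strictly, the error is measured from the true, unknown line and the residual from the fitted line)
difference between observed y and estimated y
= (yi - y hat i)
distance between the regression line and the actual observed value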

18
Q

What are the independent and dependent vars?

A
x - independent
y - dependent
x - explanatory var
y - response var
e.g.
x - type of treatment (indep)
y - blood pressure (dep)
treatment --> effect--> on blood pressure
19
Q

What does a low/high variance indicate?

A

A small variance indicates that the data points tend to be very close to the mean, and to each other. A high variance indicates that the data points are very spread out from the mean, and from one another.
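A short Python illustration with two made-up data sets that share the same mean but differ in spread. The population variance is used here as an arbitrary choice for the sketch.

```python
import statistics

# Two hypothetical samples with the same mean (10)
tight = [9, 10, 10, 10, 11]   # values close to the mean and to each other
spread = [0, 5, 10, 15, 20]   # values far from the mean and from each other

tight_var = statistics.pvariance(tight)
spread_var = statistics.pvariance(spread)

print(statistics.mean(tight), tight_var)
print(statistics.mean(spread), spread_var)
```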

20
Q

What is the difference between the predicted and actual Y score?

A

The residual

21
Q

How high can R squared go?

A

up to 1 only!

22
Q

If the dots on a scatter plot are spread out randomly, the researcher would report the correlation as

A

close to 0

23
Q

What is a negative correlation?

A

A negative correlation is a relationship between two variables in which one variable increases as the other decreases, and vice versa.
e.g.:
The more often a person visits the dentist, the fewer cavities she/he will have.