Sabina's lectures 1 to 7 Flashcards
(40 cards)
Variance & SD
$\sigma^2$ or V = the degree to which a variable ‘varies’ around its mean: $V = \sum (X - \bar{X})^2 / (N - 1) = SS/df$
SD = $\sqrt{\sigma^2}$ or $\sqrt{V}$ (in the same units as the variable, so easier to interpret)
› Covariance
CoV = the degree to which two variables ‘vary’ simultaneously or co-vary
Note: the variance of a variable is… its covariance with itself.
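A minimal Python sketch of the variance, SD and covariance formulas on this card, assuming sample statistics with the N - 1 denominator; the data values are made up for illustration. It also checks the note that a variable's covariance with itself is its variance.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations / (N - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance(xs):
    """Sample variance V = SS / df, with df = N - 1."""
    mx = mean(xs)
    ss = sum((x - mx) ** 2 for x in xs)
    return ss / (len(xs) - 1)

def sd(xs):
    """SD = square root of V, back in the original units."""
    return math.sqrt(variance(xs))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(variance(x), sd(x))   # 2.5 and about 1.581
print(covariance(x, y))     # 1.5
print(covariance(x, x))     # 2.5, the same as variance(x)
```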
Correlation
the degree of linear relationship between two variables; essentially, it is a standardised covariance: $r = \mathrm{Cov}_{XY} / (SD_X \, SD_Y)$
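A small sketch of "correlation = standardised covariance", assuming Pearson's r and made-up data. Dividing by both SDs removes the units, so rescaling X (here multiplying it by 10, as if changing units) leaves r unchanged.

```python
import math

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson r = Cov_XY / (SD_X * SD_Y): a covariance with the units removed."""
    sd_x = math.sqrt(covariance(xs, xs))
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)                              # about 0.775
r_rescaled = correlation([10 * v for v in x], y)   # same r after changing X's units
```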
Continuous versus discrete variables
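continuous (can take any value within a range, e.g., reaction time) versus discrete or categorical (a limited set of distinct values or categories, e.g., experimental group)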
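Regression sum of squares
SSregression is the part of the variation in Y that we can predict from the IVs: $R^2 = SS_{regression} / SS_{total}$. $(1 - R^2)$ is the proportion we cannot predict ($SS_{residual} / SS_{total}$).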
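A sketch of how SS_total splits into a predicted part (SSregression) and an unpredicted part (SSresidual), assuming a one-IV least-squares line and made-up data. R² and 1 - R² then fall out as proportions of SS_total.

```python
def sums_of_squares(xs, ys):
    """Partition SS_total into SS_regression + SS_residual for a one-IV least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    predicted = [a + b * x for x in xs]
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_regression = sum((p - my) ** 2 for p in predicted)            # what we can predict
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # what we cannot
    return ss_total, ss_regression, ss_residual

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_total, ss_regression, ss_residual = sums_of_squares(x, y)
r_squared = ss_regression / ss_total
print(round(r_squared, 3), round(1 - r_squared, 3))   # 0.6 0.4
```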
how to work out t
$t = b / SE_b$
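A sketch of t = b / SE_b, assuming a regression with ONE predictor, where SE_b = sqrt(MS_residual / SS_X); with several IVs the SE formula also involves the overlap between IVs, so this is not the general case. It also shows where df_residual = N - k - 1 comes in. Data are made up.

```python
import math

def slope_t(xs, ys):
    """Return b, SE_b, t = b / SE_b and df_residual for a one-predictor regression."""
    n, k = len(xs), 1                      # k = number of IVs
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ss_x
    a = my - b * mx
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df_residual = n - k - 1
    ms_residual = ss_residual / df_residual
    se_b = math.sqrt(ms_residual / ss_x)   # this SE_b formula holds for one IV only
    return b, se_b, b / se_b, df_residual

b, se_b, t, df = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 3), round(se_b, 3), round(t, 3), df)   # 0.6 0.283 2.121 3
```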
df Residual
$df_{residual} = N - k - 1$, where k is the number of predictors (IVs). It is the df that goes with SSresidual, the part of Y we cannot predict.
Confidence Intervals (CI)
b is an estimate of the population parameter. Ultimately, we want to know the true value of the regression coefficient, and the CI helps to illustrate this idea (i.e., if we conducted this research 100 times and computed an XX% CI each time, about XX of those 100 intervals would contain the true, yet unknown, slope).
how to use CIs
If the range includes 0, then we conclude that the slope is NOT statistically significant; if it excludes 0, the slope is statistically significant (at the alpha level matching the CI, e.g., .05 for a 95% CI).
› We can also use the CI to test whether the slope is different from a particular value (e.g., whether this slope is different from the one found in previous studies).
› SPSS does not calculate CI automatically
The CI is a range of plausible values for the parameter. If I performed the experiment 100 times and computed a 95% CI each time, about 95 of those intervals would contain the true b.
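A sketch of using a slope CI as a test, assuming CI = b ± t_crit × SE_b. The b, SE_b and df values are hypothetical; 2.086 is the two-tailed .05 critical t for df = 20 from a standard t table. The same check works for 0 or for any other null value, such as a slope from a previous study.

```python
def slope_ci(b, se_b, t_crit):
    """CI for a slope: b ± t_crit × SE_b (t_crit taken from a t table for df_residual)."""
    margin = t_crit * se_b
    return (b - margin, b + margin)

def excludes(ci, value):
    """True if the null value lies outside the CI, i.e. the slope differs from it."""
    low, high = ci
    return value < low or value > high

# Hypothetical output: b = 0.45, SE_b = 0.12, df_residual = 20 (95% CI)
ci = slope_ci(0.45, 0.12, 2.086)
print([round(v, 3) for v in ci])   # [0.2, 0.7]
print(excludes(ci, 0))     # True: 0 is outside, so b is significant
print(excludes(ci, 1.5))   # True: differs from a previous study's slope of 1.5
print(excludes(ci, 0.5))   # False: cannot reject b = 0.5
```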
Converting from b to β [in italics!]
$\beta = b \times (SD_X / SD_Y)$; $b = \beta \times (SD_Y / SD_X) = \beta \times \sqrt{V_Y / V_X}$
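A sketch of the b ↔ β conversion, assuming sample (N - 1) SDs from Python's standard `statistics.stdev` and made-up data whose least-squares slope is 0.6. With a single IV, β comes out equal to r.

```python
import statistics

def b_to_beta(b, sd_x, sd_y):
    return b * sd_x / sd_y

def beta_to_b(beta, sd_x, sd_y):
    return beta * sd_y / sd_x

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)   # N - 1 SDs
b = 0.6   # least-squares slope of y on x for these made-up data
beta = b_to_beta(b, sd_x, sd_y)
print(round(beta, 3))                          # 0.775 (equals r with one IV)
print(round(beta_to_b(beta, sd_x, sd_y), 3))   # 0.6
```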
If b is equal to zero
there is no (linear) relationship between that IV and Y, so it is not important. It will still be featured in the regression equation (DO NOT TAKE IT OUT).
The most common null hypothesis is that b = 0: the slope is not different from zero, so nothing systematic is happening.
The null value doesn’t have to be zero, though. For example, is the new slope different from 1.5 or not? If the CI includes 1.5, we cannot reject that null hypothesis.
MR advantages
Can use both categorical and continuous independent variables
› Can easily incorporate multiple independent variables
› Is appropriate for the analysis of experimental or nonexperimental research
Factors Affecting the Results of the Regression Equation
Sample size (N)
› The amount of scatter of points around the regression line [indexed by $\sum (Y - Y')^2$, or SSresidual]: other things being equal, the smaller SSresidual, the larger SSregression, and hence the larger the F-ratio
› The range of values in the X variable, indicated by $\sum (X - \bar{X})^2$: other things being equal, a wider range gives a smaller standard error for b
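A sketch of the F-ratio, assuming F = MS_regression / MS_residual with df_regression = k and df_residual = N - k - 1. The sums of squares below are hypothetical. Less scatter (smaller SSresidual) raises F, and the same R² in a larger sample also gives a larger F.

```python
def f_ratio(ss_regression, ss_residual, n, k):
    """F = MS_regression / MS_residual."""
    return (ss_regression / k) / (ss_residual / (n - k - 1))

def f_from_r2(r2, n, k):
    """The same F written in terms of R², the proportion predicted."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical sums of squares with SS_total = 6, N = 5, one IV
print(f_ratio(3.6, 2.4, 5, 1))   # baseline: SS_residual = 2.4
print(f_ratio(4.8, 1.2, 5, 1))   # less scatter: SS_residual = 1.2, so F is larger
# Same R² = .6 in a larger sample gives a larger F
print(f_from_r2(0.6, 5, 1), f_from_r2(0.6, 50, 1))
```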
Assumptions Underlying MR (only a glimpse now)
Dependent variable is a linear function of the IVs
- can be overlooked if one selects only extreme cases of X: selecting only extreme cases can ‘force’ the regression to appear linear, even if the relationship is curvilinear across the full range of X values. Bad practice…
› Each observation is drawn independently
› Errors are normally distributed
› The mean of the errors is 0
› Errors are not correlated with each other, nor with the IV
› Homoscedasticity of variance
- Variance of errors is not a function of IVs
- The variance of errors at all values of X is constant, meaning that it is the same at all levels of IV
df regression
number of IVs (k)
do you report the non-significant parts in the regression conclusion?
YES
decimal places for b
three decimal places (e.g., .003)
what happens when you restrict the range of the sample (shorten the line on the graph)?
b stays the same, but β changes: the distribution is different, so the SDs change.
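A sketch of range restriction, assuming a truly linear relationship. The made-up errors are deliberately chosen to be uncorrelated with X in both the full and the restricted sample, so b comes out exactly identical; in real data b would only stay roughly the same. β drops because restricting X shrinks SD_X relative to SD_Y.

```python
import math

def slope_and_beta(xs, ys):
    """Least-squares b and standardised β = b × SD_X / SD_Y for one IV."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ss_x
    beta = b * math.sqrt(ss_x / ss_y)   # the (N - 1) in each SD cancels out
    return b, beta

# Made-up data: Y = 2X + error, errors chosen to be uncorrelated with X
errors = [1, -1, -1, 1, 1, -1, -1, 1]
x_full = list(range(1, 9))
y_full = [2 * x + e for x, e in zip(x_full, errors)]
x_cut, y_cut = x_full[2:6], y_full[2:6]   # restricted range: X = 3..6 only

b_full, beta_full = slope_and_beta(x_full, y_full)
b_cut, beta_cut = slope_and_beta(x_cut, y_cut)
print(b_full, b_cut)                            # 2.0 2.0
print(round(beta_full, 3), round(beta_cut, 3))  # 0.977 0.913
```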
why is β the same as $r_{y2}$ when the two IVs don’t correlate?
because the IVs do not overlap in the Venn diagram (they share no variance), so nothing has to be partialled out: $\beta_2 = r_{y2}$ when $r_{12} = 0$
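A sketch using the standard two-predictor formula for standardised weights, $\beta_1 = (r_{y1} - r_{y2} r_{12}) / (1 - r_{12}^2)$ and likewise for $\beta_2$. The correlations are hypothetical. With $r_{12} = 0$ the subtracted term vanishes, so each β equals its zero-order r.

```python
def betas_two_ivs(r_y1, r_y2, r_12):
    """Standardised β's for Y regressed on X1 and X2, from the three correlations."""
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

# Hypothetical correlations
print(betas_two_ivs(0.40, 0.30, 0.0))   # IVs uncorrelated: β's equal the r's
print(betas_two_ivs(0.40, 0.30, 0.50))  # IVs overlap: β's are smaller than the r's
```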
assumptions of error
we assume that they are normally distributed, independent, and have constant variance.
regression line
the IVs are differentially weighted so that prediction is optimised and the sum of the squared errors of prediction is minimised.
That is, the sum of the squared residuals is smaller than for any other possible straight line, hence the term least squares.
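A sketch of the least-squares idea with one IV and made-up data: compute the intercept and slope from the usual formulas, then nudge them in a few directions and confirm that each nudged line has a larger sum of squared residuals. The nudge sizes are arbitrary choices for illustration.

```python
def least_squares_line(xs, ys):
    """Intercept a and slope b that minimise the sum of squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def sum_sq_residuals(a, b, xs, ys):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
best = sum_sq_residuals(a, b, x, y)
# Nudge the intercept and slope: each of these nudged lines fits worse
others = [sum_sq_residuals(a + da, b + db, x, y)
          for da in (-0.5, 0.0, 0.5) for db in (-0.2, 0.0, 0.2) if (da, db) != (0.0, 0.0)]
print(round(a, 3), round(b, 3), round(best, 3))   # 2.2 0.6 2.4
print(min(others) > best)                          # True
```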
β way of writing conclusions
in standard scores or standard deviations, not standard units (e.g., a one-SD increase in X is associated with a change of β SDs in Y)
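what is a different metric?
any difference in scale, including different units of the same dimension: cm is a different metric from hours, and cm is also a different metric from metres. The metrics must be exactly the same to compare b’s directly; otherwise, use β.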
when is something not a common cause
a, b and c paths are equivalent to β’s, where the DV is VarY and it is regressed on variables X1 and X2 (a: X1 → X2; b: X1 → Y; c: X2 → Y)
› If VarX1 has no effect on Y (b = 0), but it has an effect on X2, then:
- it is not a common cause
- $\beta_{YX_2} = r_{YX_2} = c$
- c does not change with the inclusion or exclusion of X1
- OR
- If VarX1 has no effect on X2 (a = 0), but it has an effect on Y, then:
- it is not a common cause
- $\beta_{YX_2} = r_{YX_2} = c$
- c does not change with the inclusion or exclusion of X1
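A sketch of this card using standard path-tracing rules, assuming standardised variables and the paths a: X1 → X2, b: X1 → Y, c: X2 → Y (path values are hypothetical). When a = 0 or b = 0, $r_{YX_2}$ equals c, so c is the same whether or not X1 is included. Only when X1 is a common cause (a and b both non-zero) does $r_{YX_2}$ differ from c.

```python
def implied_correlations(a, b, c):
    """Path tracing for standardised paths a: X1 -> X2, b: X1 -> Y, c: X2 -> Y."""
    r_12 = a
    r_y1 = b + a * c   # direct path plus X1 -> X2 -> Y
    r_y2 = c + a * b   # direct path plus the back-door path X2 <- X1 -> Y
    return r_12, r_y1, r_y2

def beta_x2(r_12, r_y1, r_y2):
    """β for X2 when Y is regressed on both X1 and X2."""
    return (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)

# Hypothetical path values (c = .40 throughout)
for a, b in [(0.5, 0.0), (0.0, 0.3), (0.5, 0.3)]:
    r_12, r_y1, r_y2 = implied_correlations(a, b, 0.40)
    print(f"a={a}, b={b}: r_YX2 = {r_y2:.2f}, beta_YX2 = {beta_x2(r_12, r_y1, r_y2):.2f}")
```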