250A Midterm Flashcards
advantages and disadvantages of the mode
+ actually occurs in data
+ only thing that makes sense for nominal data
+ not affected by outliers
- cannot be manipulated mathematically bc there is no formula for it
advantages and disadvantages of the median
+ not affected by outliers
+ does not require interval assumptions
+ makes absolute error as small as possible
- does not enter into equations nicely
- cannot be decomposed
- poor estimator of the population value
advantages and disadvantages of the mean
+ unbiased (best estimate of pop mean)
+ makes average squared errors as small as possible
+ has a mathematical formula so can be manipulated
- affected by outliers
- need interval scale
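The two "makes error small" claims above are easy to check numerically. A quick Python sketch with made-up numbers (the dataset and the helper names `sse`/`sae` are just for illustration):

```python
data = [1, 2, 3, 4, 100]              # made-up scores with one outlier
mean = sum(data) / len(data)          # 22.0
median = sorted(data)[len(data) // 2] # 3 (odd n, so the middle value)

def sse(c):
    """Sum of squared errors around a candidate center c."""
    return sum((x - c) ** 2 for x in data)

def sae(c):
    """Sum of absolute errors around a candidate center c."""
    return sum(abs(x - c) for x in data)

# the mean wins on squared error, the median wins on absolute error
```

Note how the outlier drags the mean (22.0) far from the bulk of the data while the median (3) stays put.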
+ and - of IQR
+ good for boxplots
+ does not assume normality of data
- throws away too much data
what does variance mean?
variance tells us the average squared deviation from the mean – how much, on average, each observation differs from the mean in squared units.
proportional to average squared difference between all pairs of observations so it summarizes both how different scores are from each other and how different they are from the mean!
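Both readings of the variance can be verified on a small made-up dataset; the pairwise version is exactly sum_ij (xi - xj)^2 / (2 n^2) when variance is computed with a divisor of n:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up scores; mean = 5
n = len(data)
mean = sum(data) / n

# variance as the average squared deviation from the mean (dividing by n here)
var = sum((x - mean) ** 2 for x in data) / n   # 4.0

# same number from all pairwise differences
pairwise = sum((xi - xj) ** 2 for xi in data for xj in data) / (2 * n * n)
```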
what is standard deviation?
SD is the square root of the variance – roughly the average deviation from the mean, i.e., how much a typical observation differs from the mean, in the original units of the variable
why do we use variance and sd instead of mean deviation and mean absolute deviation?
sum of mean deviations is always 0 so this doesn’t tell us anything
var and sd are useful mathematically because you can partition them.
mean absolute deviation is biased and inconsistent.
difference between trimmed and windsorized means/variances
trimmed = chop off top x% and bottom x% of data and recompute mean and var
winsorized: same as trimmed, but instead of dropping the extreme values, replace them with the most extreme values that remain (the new lowest and highest)
(if these procedures don’t change your statistics, your stats are robust. cool)
why would we want trimmed/windsorized stuff?
mean and var are especially sensitive to outliers in small samples, which can ruin statistical tests. so we may want indices that are more robust (i.e., that vary little from one sample to another)
decrease influence of extreme values
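A minimal plain-Python sketch of both procedures (not a library implementation; the 10% trimming proportion and function names are arbitrary choices for illustration):

```python
def trimmed_mean(data, prop=0.1):
    """Drop the lowest and highest prop of observations, then average the rest."""
    xs = sorted(data)
    k = int(len(xs) * prop)
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return sum(kept) / len(kept)

def winsorized_mean(data, prop=0.1):
    """Replace the lowest/highest prop with the nearest values that remain, then average."""
    xs = sorted(data)
    k = int(len(xs) * prop)
    if k > 0:
        xs = [xs[k]] * k + xs[k:len(xs) - k] + [xs[-k - 1]] * k
    return sum(xs) / len(xs)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 500]   # one wild outlier
# ordinary mean = 54.5; both robust means stay near the bulk of the data (5.5)
```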
why divide by n-1 when computing sample variance?
because dividing by n leads to a biased estimate of variance – the long run average will be too small
also, because we lose one degree of freedom in estimating xbar from the sample (now that we have estimated xbar, not all data points are free to vary - one is fixed)
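The bias is easy to demonstrate exactly: with a tiny made-up population you can enumerate every equally likely sample (with replacement) instead of simulating, and compare the long-run averages of the two variance formulas:

```python
from itertools import product

pop = [1, 2, 3, 4]                   # tiny made-up population
mu = sum(pop) / len(pop)
pop_var = sum((x - mu) ** 2 for x in pop) / len(pop)   # 1.25

n = 2
vars_n, vars_n1 = [], []
for sample in product(pop, repeat=n):    # every equally likely size-2 sample
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    vars_n.append(ss / n)                # divide by n
    vars_n1.append(ss / (n - 1))         # divide by n-1

avg_n = sum(vars_n) / len(vars_n)        # long-run average: too small (biased)
avg_n1 = sum(vars_n1) / len(vars_n1)     # long-run average: exactly pop_var (unbiased)
```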
chief problem in interpreting variance?
it’s in squared units of the original variable, which have no direct interpretation on the original scale
how is standard deviation interpreted?
it’s in the units of your variable – average difference between an observation and the mean
what is expected value?
long run average of a statistic – if you resample infinitely many times and compute the statistic each time, the average of those values converges to the statistic’s expected value
what is an unbiased estimator?
when the expected value of the sample statistic is the population parameter
true/false: if a statistic is an unbiased estimator of a parameter, the statistic must have a symmetric sampling distribution.
False
how would you go about empirically (without equations) determining the bias and efficiency of sample mean and median in terms of estimating the population mean?
take a bunch of samples from your population and take the mean, median, and sd of each one. then construct a sampling distribution from this iterative procedure. if the mean of the sampling distribution = the population mean, it’s unbiased. if the standard error of the sampling distribution is small, it’s efficient.
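That procedure can be sketched as a small Monte Carlo simulation. This is a sketch under assumed settings (normal population with made-up mean/sd, sample size 25, 4000 resamples):

```python
import random
import statistics

random.seed(0)                             # reproducible
MU, SIGMA, N, REPS = 100, 15, 25, 4000     # made-up population and sample size

means, medians = [], []
for _ in range(REPS):                      # build empirical sampling distributions
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

bias_mean = statistics.mean(means) - MU       # ~0: unbiased
bias_median = statistics.mean(medians) - MU   # ~0 too (symmetric population)
se_mean = statistics.stdev(means)             # smaller...
se_median = statistics.stdev(medians)         # ...than this: the mean is more efficient
```

For a normal population both estimators are unbiased, but the mean’s sampling distribution has the smaller standard error.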
define degrees of freedom
degrees of freedom is how many values in your sample are free to vary. e.g., if you calculate the mean, you lose a degree of freedom because now all your observations are not free to vary. one is fixed.
define linear transformations
t = a*x + b
adding or subtracting a constant or multiplying or dividing by a constant
effect of linear transformation on mean, sd, variance, relative ordering, and statistical tests
adding/subtracting: add/subtract same amount from mean (just shifts distribution left or right)
multiplying/dividing: multiplies/divides mean by the constant, variance by the square of the constant, sd by the constant
DO NOT AFFECT relative ordering or results of statistical tests
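The three rules above, checked on made-up numbers:

```python
import statistics

x = [2, 4, 6, 8, 10]              # mean 6, sample variance 10
a, b = 3, 7                       # the linear transformation t = a*x + b
t = [a * xi + b for xi in x]      # [13, 19, 25, 31, 37]

mean_x, mean_t = statistics.mean(x), statistics.mean(t)
var_x, var_t = statistics.variance(x), statistics.variance(t)
sd_x, sd_t = statistics.stdev(x), statistics.stdev(t)
# mean_t = a*mean_x + b;  var_t = a^2 * var_x;  sd_t = |a| * sd_x
```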
describe standardization transformation
z = (x - xbar)/s
standardization gives you a distribution with mean = 0 and sd = 1
so tells you how many deviations each value is from mean
to calculate probabilities from this distribution, need normality of data
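The mean = 0, sd = 1 property is by construction, as a quick sketch with arbitrary data shows:

```python
import statistics

x = [10, 20, 30, 40, 50]             # made-up scores
xbar, s = statistics.mean(x), statistics.stdev(x)
z = [(xi - xbar) / s for xi in x]    # z-scores: deviations from the mean in sd units

mean_z = statistics.mean(z)          # 0 (up to floating-point rounding)
sd_z = statistics.stdev(z)           # 1 (up to floating-point rounding)
```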
goal of box-cox (power transformations)
optimize normality of predictors
box-cox considers a family of power transformations and computes the likelihood of the data under a normal distribution…then finds the exponent (lambda) that makes the transformed data most likely
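A rough sketch of the idea, not a proper implementation: a simplified Box-Cox profile log-likelihood with a grid search standing in for a real optimizer (function names and the example data are made up; real tools like scipy.stats.boxcox do the optimization for you):

```python
import math

def boxcox_loglik(x, lam):
    """Profile log-likelihood of the data under normality after the power transform
    y = (x^lam - 1)/lam (log for lam = 0). Constant terms dropped; higher is better."""
    n = len(x)
    if abs(lam) < 1e-12:
        y = [math.log(v) for v in x]
    else:
        y = [(v ** lam - 1) / lam for v in x]
    ybar = math.fsum(y) / n
    var = math.fsum((yi - ybar) ** 2 for yi in y) / n
    return -n / 2 * math.log(var) + (lam - 1) * math.fsum(math.log(v) for v in x)

def best_lambda(x):
    grid = [i / 10 for i in range(-20, 21)]   # candidate exponents from -2 to 2
    return max(grid, key=lambda lam: boxcox_loglik(x, lam))

skewed = [math.exp(v / 10) for v in range(1, 40)]  # strongly right-skewed, all positive
# for exponential-looking data like this, the search lands near lambda = 0 (the log transform)
```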
are all transformations monotonic (order preserving)?
yes, for the ones we typically use – linear, log, and power/box-cox transformations (on positive data) are all monotonic, so even the nonlinear ones preserve order
what do nonlinear transformations preserve/not preserve
preserve order (bc monotonic) but not shape -- they change the relative spacing of data points, so results of statistical tests may not be preserved
nominal scale properties
scale that classifies cases into categories
no numeric/cardinal ordering
classifications mutually exclusive
ordinal scale properties
scale that conveys order but no equal distance
interval scale properties
difference between scores has the same meaning throughout the scale
no true 0
minimum requirement for most statistics
ratio scale properties
difference between scores has the same meaning, and now we have a true 0, so we can talk about ratios of scores
when linear transformations are performed on interval scales, do we maintain same ratios?
ratios aren’t meaningful on an interval scale to begin with. on a ratio scale, multiplying by a constant preserves ratios, but adding a constant shifts the zero point and changes them
Why is the normal distribution so important in psychological research?
Because it allows us to compute probabilities of observing a score or test statistic
How can we explore the degree to which sample data are normally distributed?
Graph your data and look at it
Impose a normal density over your data and look at it
Examine mean, sd, skewness, and kurtosis
normal has skew = 0 and kurtosis = 3 (i.e., excess kurtosis = 0)
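A sketch of the moment-based formulas behind that check (the helper name is made up; a symmetric dataset gives skew = 0, and a flat, uniform-looking one gives kurtosis below 3):

```python
def skew_kurt(x):
    """Moment-based skewness and (non-excess) kurtosis."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((v - m) ** 2 for v in x) / n) ** 0.5
    skew = sum(((v - m) / sd) ** 3 for v in x) / n
    kurt = sum(((v - m) / sd) ** 4 for v in x) / n
    return skew, kurt

s, k = skew_kurt([1, 2, 3, 4, 5])   # symmetric: skew = 0; flat: kurtosis < 3
```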
How are areas under normal curve linked to probabilities?
p(selecting a case in some range) corresponds to area under the curve between those values
True/false: if a sampling distribution is unbiased, then it must be symmetric and normal as well.
False
Why do we need to know theoretical probability distributions like the normal, chi-square, t, etc.?
because important things like test statistics follow these distributions and we want to know the probability of obtaining a particular test statistic under different assumptions
How does one find the area under a curve?
area under curve from x to y: integrate density function from x to y or use software
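A sketch of the software route, using a simple midpoint rule as a stand-in for the integral (function names are made up):

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def area(lo, hi, steps=100_000):
    """Approximate the area under the standard normal curve between lo and hi
    by summing thin midpoint rectangles."""
    h = (hi - lo) / steps
    return h * sum(normal_pdf(lo + (i + 0.5) * h) for i in range(steps))

p_within_1sd = area(-1.0, 1.0)    # ~0.6827: the familiar 68% within one sd
```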
Why can’t we compute the exact probability of a sample result instead of a probability for a range of values?
In continuous distributions, the probability of obtaining any one value is 0 so you must look at ranges
What do tabled values of the standard normal distribution actually tell you?
Probability (area under the curve) to the left of whatever the given value is
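The quantity a z-table gives you is the standard normal CDF, which can be computed directly from the error function:

```python
import math

def phi(z):
    """P(Z <= z) for a standard normal – exactly what a z-table lists,
    written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# familiar table entries: phi(0) = 0.5, phi(1.96) ~ 0.975, phi(-1.96) ~ 0.025
```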
Under what conditions can we use a z-score table to compute probabilities of a sample result?
If the variable itself is normally distributed (so the z-score is exactly standard normal), or – for the sample mean – if n is large enough (rule of thumb: n > 30) that the CLT makes the sampling distribution approximately normal
Distinguish between measures of absolute standing and relative standing.
absolute standing: where a score sits in the original units of the variable (the raw score itself)
relative standing: where a score sits compared to the rest of the distribution – e.g., percentile rank or z-score