Flashcards in Midterm Deck (67):

1

## research

### disciplined inquiry into questions and theories

2

## statistics

### organizing numbers and data

3

## quantitative research

### stats (organizing numbers and data) + disseminating results

4

## wheel of science

### theory > hypothesis > observations > empirical generalizations

5

## descriptive vs inferential statistics

###
descriptive: what is going on in the data? can be univariate, bivariate, or multivariate

inferential: generalizing from sample data to the population

6

## independent & dependent variables

###
independent variables lead to dependent variables

x = what is doing the predicting, y = what is being predicted

7

## discrete vs continuous variables

### whole number measurements vs fractional measurements

8

## nominal vs ordinal vs interval vs ratio

### categories vs ranked variables vs numbers without true zero vs numbers with true zero

9

## percentages and proportions

###
about conceptualizing data

proportions are (f/n), percentages are (f/n) x 100, where f = frequency (number of cases in the category) and n = total number of cases
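
A minimal sketch of the two formulas in Python, using made-up counts (12 cases in one category out of 48 cases in total). The function names are illustrative only:

```python
def proportion(f, n):
    """Proportion of cases in a category: f / n."""
    return f / n


def percentage(f, n):
    """Percentage of cases in a category: (f / n) * 100."""
    return proportion(f, n) * 100


# Hypothetical data: 12 of 48 respondents fall in the category.
print(proportion(12, 48))   # 0.25
print(percentage(12, 48))   # 25.0
```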

10

## good graphs are....

### theoretically motivated, easy to understand, useful

11

## central tendency

### the most typical/common/central score. describes data, makes certain characteristics easy to understand

12

## mean and median and mode are all the same when...

### the distribution is symmetrical and unimodal, e.g. a normal curve

13

## dispersion

### how much variation is in the scores? when there is less dispersion, the curve is taller and narrower, and when there is more dispersion, the curve is flatter and wider

14

## variation ratio

###
simple measure of statistical dispersion in nominal distributions; it is the simplest measure of qualitative variation.

v = 1 - fm/n, where fm = the number of cases in the mode, and n = total number of cases

i.e. the proportion of cases not in a modal category
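
A short sketch of the variation ratio formula, assuming the data arrive as a plain list of category labels (the labels below are made up):

```python
from collections import Counter


def variation_ratio(values):
    """v = 1 - fm/n: the proportion of cases NOT in the modal category."""
    counts = Counter(values)
    f_mode = max(counts.values())   # fm: number of cases in the mode
    return 1 - f_mode / len(values)


# Hypothetical nominal data: the mode ("soc") holds 2 of 4 cases.
majors = ["soc", "soc", "psych", "econ"]
print(variation_ratio(majors))   # 0.5
```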

15

## determining median in even number of cases

### average of the two middle scores

16

## interquartile range

### the distance between 3rd and 1st quartile i.e. middle 50%

17

## all scores ________ the mean

### all scores cancel out around the mean, i.e. the deviations from the mean sum to zero

18

## mean is the point of ________

### mean is the point of minimized variation, i.e. the sum of squared deviations is smaller around the mean than around any other point
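
Both properties of the mean can be checked on a small made-up set of scores. This is only a sketch; `sum_sq_dev` is a helper name invented here:

```python
scores = [2, 4, 6, 8, 10]            # made-up scores
mean = sum(scores) / len(scores)     # 6.0


def sum_sq_dev(values, point):
    """Sum of squared deviations of the values around a chosen point."""
    return sum((x - point) ** 2 for x in values)


# Deviations from the mean cancel out.
print(sum(x - mean for x in scores))   # 0.0

# Squared deviations are smallest around the mean.
print(sum_sq_dev(scores, mean))        # 40.0
print(sum_sq_dev(scores, 5))           # 45
```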

19

## when there is positive skew, x-bar is ____ relative to the median

### when there is positive skew, x-bar is greater than the median

20

## when there is negative skew, x-bar is ____ relative to the median

### when there is negative skew, x-bar is less than the median

21

## when there is no skew, x-bar is ____ relative to the median

### when there is no skew, x-bar is equal to the median

22

## when there is a positive skew, the shape of the curve is...

### when there is positive skew, the shape of the curve is stretched out towards the right, with the "lump" being further to the left.

23

## when there is a negative skew, the shape of the curve is..

### when there is a negative skew, the shape of the curve is stretched out towards the left, with the "lump" being further to the right

24

## standard deviation

###
roughly, the average distance of the scores from the mean

square root of the average squared deviation from the mean
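
A minimal sketch of that calculation. It divides by n, matching the "average" wording of the card; many texts divide by n - 1 when estimating from a sample, so treat the denominator as an assumption:

```python
import math


def std_dev(scores):
    """Square root of the average squared deviation from the mean (divides by n)."""
    mean = sum(scores) / len(scores)
    variance = sum((x - mean) ** 2 for x in scores) / len(scores)
    return math.sqrt(variance)


scores = [2, 4, 6, 8, 10]            # made-up scores: mean 6, variance 8
print(round(std_dev(scores), 2))     # 2.83
```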

25

## box plots

### the box indicates the middle 50%. The lower boundary of the box represents the first quartile (i.e. the point that 25% of the sample lies below) and the upper boundary of the box represents the third quartile (i.e. the point that 75% of the sample lies below). The line through the box indicates the median. The whiskers extend to the most extreme scores within 1.5 x IQR of the box. Outliers beyond the whiskers are often plotted as individual points.
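
A sketch of the numbers behind a box plot, using Python's `statistics.quantiles`. Quartile conventions differ between textbooks and software, so the exact quartile values here are one convention among several; the data are made up:

```python
import statistics


def box_plot_summary(data):
    """Quartiles, IQR, and the points lying more than 1.5 x IQR outside the box."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # hypothetical scores with one extreme value
summary = box_plot_summary(data)
print(summary["median"])     # 5.5
print(summary["outliers"])   # [50]
```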

26

## normal curve

### theoretical, bell shaped, unimodal, symmetrical, mode/mean/median are equal

27

## +/- 1 standard deviation captures __% of the sample

### +/- 1 standard deviation captures 68.26% of the sample

28

## +/- 2 standard deviations captures __% of the sample

### +/- 2 standard deviations captures 95.44% of the sample

29

## +/- 3 standard deviations captures __% of the sample

### +/- 3 standard deviations captures 99.72% of the sample
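
The three areas can be checked with the error function from the standard library. Computed directly they come out as 68.27%, 95.45%, and 99.73%; the figures on the cards (68.26, 95.44, 99.72) are consistent with doubling four-decimal z-table areas (0.3413, 0.4772, 0.4986). A small sketch:

```python
import math


def area_within(k):
    """Proportion of the normal curve within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 2))
# 1 68.27
# 2 95.45
# 3 99.73
```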

30

## z-score

### z-score is a position along the normal curve. It indicates the number of standard deviations a score falls above or below the mean. e.g. a z-score of 1 means that the data point is 1 standard deviation above the mean
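
As a quick illustration, a z-score is the score's distance from the mean divided by the standard deviation. The test scores below are invented for the example:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))   # 1.5
print(z_score(60, 70, 10))   # -1.0
```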

31

## population and parameter are analogous with...

###
population and parameter are analogous with sample and statistic.

in other words, statistics are characteristics of the sample, and parameters are characteristics of the population

32

## EPSEM

### equal probability of selection method

33

## sampling distribution

###
theoretical concept that links the sample to the population. The sampling distribution of sample means is normal in shape (for large samples), its mean is equal to the population mean, and its standard deviation is equal to the population standard deviation/sqrt(N).

The sampling distribution represents the distribution of the point estimates based on samples of a fixed size from a certain population.

34

## law of large numbers

###
the larger the sample, the closer the sample result gets to the true (expected) value.

The law of large numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances.

So if you flip 10 coins, you may get 90% heads and 10% tails, but if you flip 100 coins, you're more likely to get closer to 50% heads and 50% tails. The proportion of heads after n flips will almost surely converge to 1/2 as n approaches infinity.
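
The coin-flip example can be simulated. This is a sketch with a seeded random number generator, so the exact proportions depend on the seed and are not shown; the point is only that they tend to settle near 0.5 as the number of flips grows:

```python
import random


def proportion_heads(n_flips, rng):
    """Flip a fair coin n_flips times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(0)   # fixed seed so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n, rng))
```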

35

## central limit theorem

###
The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed. This will hold true regardless of whether the source population is normal or skewed, provided the sample size is sufficiently large (usually n > 30).

the average of your sample means will be the population mean

36

## standard error

###
standard deviation of the sampling distribution

e.g. plotting the means of 50 samples of 10 would give you a roughly normal curve; the standard deviation of that curve is the standard error
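
The central limit theorem and the standard error can both be seen in a small simulation. The sketch below assumes a positively skewed population (exponential, with mean 1 and standard deviation 1) and draws 2,000 samples of size 50; all of these choices are arbitrary:

```python
import math
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 50                    # size of each sample
num_samples = 2000        # number of samples drawn

# Each entry is the mean of one random sample from the skewed population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to 1 (population mean)
observed_se = statistics.pstdev(sample_means)    # sd of the sampling distribution
theoretical_se = 1 / math.sqrt(n)                # sigma / sqrt(n), about 0.141

print(round(mean_of_means, 2), round(observed_se, 3), round(theoretical_se, 3))
```

Even though the population is skewed, a histogram of `sample_means` would look roughly bell shaped, centered near the population mean, with a spread close to sigma / sqrt(n).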

37

## point estimate

###
a single statistic used to infer info about the population

e.g. taking the mean of the heights of a sample of students and inferring the mean of the heights of all students from the sample mean

38

## criteria for choosing estimators

###
-bias: an estimator is unbiased if the mean of its sampling distribution is equal to the population value of interest.

-efficiency

39

## z-score for a 95% confidence interval

### 1.96

40

## alpha

###
how certain do you want to be? alpha is the probability of error you are willing to accept

e.g. alpha = 0.05 means a confidence level of 95%

every alpha has a z-score associated with it

e.g. alpha = 0.05 has a z-score of +/- 1.96 (two-tailed)
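
The z-score for a given alpha can be looked up with `statistics.NormalDist` instead of a z-table. This sketch assumes a two-tailed setup, with alpha split evenly between the tails:

```python
from statistics import NormalDist


def z_for_alpha(alpha):
    """Two-tailed critical z-score: alpha / 2 of the area lies in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)


print(round(z_for_alpha(0.05), 2))   # 1.96
print(round(z_for_alpha(0.01), 2))   # 2.58
```

For alpha = 0.10 the same function gives about 1.645, which z-tables usually round to 1.65.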

41

## constructing confidence intervals for means

###
(1) set the alpha

(2) find the z-score associated with that alpha

(3) use formula for confidence intervals with sample means
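
The three steps can be sketched as follows. The card does not spell out the formula, so this assumes the common form x-bar +/- z * (s / sqrt(n)); some textbooks use sqrt(n - 1) in the denominator when the sample standard deviation stands in for the population value. The numbers are invented:

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, alpha=0.05):
    """Confidence interval for a mean: x-bar +/- z * (s / sqrt(n))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)     # steps 1 and 2: alpha -> z-score
    standard_error = sample_sd / math.sqrt(n)   # assumed form of the standard error
    margin = z * standard_error                 # step 3: apply the formula
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: mean 100, standard deviation 15, n = 225.
low, high = mean_confidence_interval(100, 15, 225)
print(round(low, 2), round(high, 2))   # 98.04 101.96
```

Calling it with a larger `n` narrows the interval, and calling it with a smaller `alpha` (e.g. 0.01) widens it.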

42

## the bigger the sample the _____ the width of the confidence interval because _______.

### the bigger the sample the smaller the width of the confidence interval because standard error is smaller.

43

## what would you do to increase the confidence interval?

### decrease the alpha, e.g. instead of alpha = 0.05 (95% CI), set alpha to 0.01 (99% CI).

44

## confidence interval _____ as confidence level _____.

### confidence interval widens as confidence level increases.

45

## Null hypothesis vs alternative hypothesis

### null hypothesis (H0) always says there is no significant difference. alternative hypothesis (HA) says there is a significant difference. We always assume that the null is true until the evidence lets us reject it.

46

## what is a hypothesis test?

###
-make a hypothesis

-use z-score formula to determine probability of getting the observed difference: "this difference is statistically significant at the alpha = 0.05 level."

-trying to identify statistically significant differences that didn't occur by chance

47

## 5 step model of hypothesis testing: one sample case

###
(1) make assumptions -level of measurement is interval ratio, sampling distribution is normal (basically n > 120)

(2) state null hypothesis

(3) select sampling distribution and establish a critical region

(4) compute the test statistic

(5) make decision and interpret the results, either rejecting the null or failing to reject the null
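
Steps 3 to 5 can be sketched as a one-sample z test. This assumes a two-tailed test and the test statistic z = (x-bar - mu) / (sd / sqrt(n)); the function name and numbers are made up for illustration:

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, sd, n, alpha=0.05):
    """Return (z obtained, z critical, reject null?) for a two-tailed test."""
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)               # step 3: critical region
    z_obtained = (sample_mean - pop_mean) / (sd / math.sqrt(n))    # step 4: test statistic
    reject_null = abs(z_obtained) > z_critical                     # step 5: decision
    return z_obtained, z_critical, reject_null


# Hypothetical case: sample mean 103 vs population mean 100, sd 15, n = 144.
z, crit, reject = one_sample_z_test(sample_mean=103, pop_mean=100, sd=15, n=144)
print(round(z, 2), round(crit, 2), reject)   # 2.4 1.96 True
```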

48

## one-tailed vs two-tailed test

###
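one-tailed = "significantly less/more", critical region all in one tail: +1.65 or -1.65 at alpha = 0.05

two-tailed = "significantly different", critical region split between both tails: +/- 1.96 at alpha = 0.05

one-tailed is stronger: it is easier to reject the null when the direction of the difference is predicted correctly.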

49

## alpha levels affect what in hypothesis testing?

###
critical region

the higher the alpha, the larger the critical region (and the smaller the critical z-score)

e.g. alpha = 0.05, critical region +/- 1.96; alpha = 0.10, critical region +/- 1.65

50

## type I error

### rejecting a true null hypothesis. aka alpha error. this happens when the thing occurred by random chance but you claimed that it was significantly different. you can reduce the risk of type I error by decreasing the alpha, e.g. saying you want to be 99% sure (alpha = 0.01) instead of 95% sure (alpha = 0.05) that something is statistically significant.

51

## type II error

### failing to reject a false null hypothesis. aka beta error. this happens when the thing was actually significantly different but you claimed that it was not statistically different and happened by random chance. you can reduce the risk of type II error by increasing the alpha, e.g. saying you want to be 95% sure (alpha = 0.05) instead of 99% sure (alpha = 0.01).

52

## degrees of freedom

### (n-1)

53

## student's t distribution

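### used for smaller samples (n < 120) when the population standard deviation is unknown. the student t distribution is flatter and wider (heavier tails) than the z-distribution, and it gets closer to the z-distribution as the degrees of freedom increase.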

54

## two sample test of means for large samples

###
(1) make assumptions - the samples must be independent random samples i.e. mutually exclusive; interval-ratio measurements; sampling distribution is normal (basically n > 120)

(2) State the null hypothesis

(3) select sampling distribution and establish critical region

(4) compute test statistic

(5) make decision and interpret results
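
The test statistic in step 4 can be sketched as below. It assumes the standard error of the difference is sqrt(s1^2/n1 + s2^2/n2); some textbooks use n - 1 in each denominator. The sample values are invented:

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z = difference in sample means / standard error of the difference."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # assumed form of the standard error
    return (mean1 - mean2) / se_diff


# Hypothetical samples: means 52 and 50, both with sd 10 and n = 200.
z = two_sample_z(mean1=52, sd1=10, n1=200, mean2=50, sd2=10, n2=200)
print(round(z, 2))   # 2.0
```

With alpha = 0.05 (two-tailed), an obtained z of 2.0 falls beyond +/- 1.96, so the null would be rejected in this made-up case.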

55

## two sample test of means for small samples

###
(1) make assumptions - the samples must be independent random samples i.e. mutually exclusive; interval-ratio measurements; population variances are equal (as long as the 2 samples are approximately the same size, we can make this assumption); sampling distribution is normal (because we're using small samples, we have to add the previous assumption in order to make this one)

(2) State the null hypothesis

(3) select sampling distribution and establish critical region

(4) compute test statistic

(5) make decision and interpret results

56

## two sample test for proportions

###
(1) make assumptions - the samples must be independent random samples i.e. mutually exclusive; nominal measurements; sampling distribution is normal (basically n > 120)

(2) State the null hypothesis

(3) select sampling distribution and establish critical region

(4) compute test statistic

(5) make decision and interpret results
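
A sketch of the test statistic for proportions. It assumes the usual pooled estimate of the population proportion under the null hypothesis; the sample proportions are invented:

```python
import math


def two_sample_proportion_z(p1, n1, p2, n2):
    """z for the difference between two sample proportions, using a pooled estimate."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion under the null
    se_diff = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se_diff


# Hypothetical samples: 60% of 200 cases vs 50% of 200 cases.
z = two_sample_proportion_z(p1=0.60, n1=200, p2=0.50, n2=200)
print(round(z, 2))   # 2.01
```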

57

## significance vs importance

### differences that are otherwise trivial or uninteresting may be statistically significant. Significance only tells us that a difference in the sample is unlikely to be due to chance (i.e. it probably exists in the population too); it doesn't say whether the difference is important. The substantive importance is up for interpretation.

58

## test statistics get ____ as n gets ____.

### test statistics (like z or t) get larger as n gets larger, for the same observed difference. The p-value gets smaller.
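
A quick sketch of why: for a fixed observed difference, the standard error shrinks as n grows, so the obtained z grows (and the associated p-value shrinks). The difference and standard deviation below are made up:

```python
import math


def z_obtained(difference, sd, n):
    """z = observed difference / standard error, with standard error = sd / sqrt(n)."""
    return difference / (sd / math.sqrt(n))


# Same observed difference (2) and sd (10); only the sample size changes.
for n in (25, 100, 400):
    print(n, z_obtained(difference=2, sd=10, n=n))
# 25 1.0
# 100 2.0
# 400 4.0
```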

59

## confidence interval vs two sample test

### when you're using the two-sample test, you're taking both estimates of the means and both standard deviations into account. So it is still possible for the error bars (confidence intervals) to overlap while the difference is statistically significant.

60

## what is the variance of the standard normal curve?

### 1

61

## population values can be estimated with...

### sample values

62

## what is a point estimate?

### the use of sample data to calculate a single value (known as a statistic) which is to serve as a "best guess" or "best estimate" of an unknown (fixed or random) population parameter

63

## which sample statistics are unbiased?

### means and proportions

64

## what is efficiency?

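### how tightly the sampling distribution is clustered around its mean. Basically sample size: the larger the sample, the more efficient the estimate.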

65

## The (larger/smaller) the sample size, the (higher/lower) the value of the standard deviation of the sampling distribution.

### larger, lower

66

## The (larger or smaller) the sample size, the more tightly clustered the sample outcomes will be around the mean of the sampling distribution.

### larger

67