AP exam flashcards

1
Q

interpret standard deviations

A
  • standard deviation measures how far the values typically vary from the mean

“The height of students typically varied by about 3.2 inches from the mean height of 64 inches”
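
As a rough illustration of this interpretation, the sketch below computes a mean and sample standard deviation with Python's `statistics` module. The list `heights` is made-up data chosen only so the arithmetic is easy to follow; it is not the data behind the card.

```python
import statistics

# Hypothetical heights in inches (made-up values, for illustration only).
heights = [60, 62, 63, 64, 64, 65, 66, 68]

mean_height = statistics.mean(heights)
# Sample standard deviation (divides by n - 1).
sd_height = statistics.stdev(heights)

print(
    f"Heights typically vary by about {sd_height:.1f} inches "
    f"from the mean height of {mean_height} inches."
)
```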

2
Q

scope of inference cause and effect

A

cause-and-effect conclusions can only be drawn if subjects were randomly assigned to treatments and we find a statistically significant difference

a difference is statistically significant if it is larger than what would be expected to happen by chance alone

3
Q

generalizing to a larger population

A

we can generalize the results of a study to a larger population if we randomly select the sample from that population.

however, sampling variability can affect estimates: different random samples of the same size from the same population will produce different estimates
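
A small simulation can make sampling variability concrete. The sketch below assumes a made-up population of heights (mean 64, SD 3.2, echoing the numbers on card 1) and a hypothetical sample size of 25; none of these values come from a real study.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 heights (inches).
population = [random.gauss(64, 3.2) for _ in range(10_000)]

# Draw five random samples of the same size from the same population.
sample_means = [
    statistics.mean(random.sample(population, 25)) for _ in range(5)
]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m:.2f}")
```

Each sample gives a slightly different estimate of the same population mean, which is the point of the card.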

4
Q

replication and control

A

2 out of 4 factors for a good experiment

replication - giving each treatment to enough subjects or units so that any difference in the effect of treatments can be distinguished from chance differences

control - keeping other variables the same for all groups, especially variables that are likely to cause confounding (control also helps reduce variability in the response variable)

5
Q

experimental units, factors and levels, treatments

A

experimental units - the objects to which treatments are randomly assigned. when the units are people, they are often called “subjects”

factor - an explanatory variable that is manipulated and may cause a change in the response variable

level - different values of a factor

treatments - the specific conditions applied to the experimental units; with more than one factor, each combination of levels is a treatment

6
Q

control groups and blinding

A

other 2 factors that contribute to a good experiment

control group - provides a baseline for comparing the effects of other treatments. a control group is often given an inactive treatment (placebo), an active treatment, or no treatment

blind - when the subjects don’t know which treatment they are receiving, or when the people recording or measuring the response variable don’t know which treatment each subject received. when neither group knows, the experiment is called “double-blind”

7
Q

blocking and matched pairs design

A

blocking - before random assignment, divide the experimental units into groups (blocks) that would respond similarly to the treatments. then randomly assign treatments within each block.

a matched pairs design uses blocks of size 2 or gives both treatments to each subject in random order

8
Q

random assignment and completely randomized designs

A

random assignment - using a chance process to assign treatments, which creates groups of experimental units that are roughly equivalent at the beginning of the experiment

if treatments are assigned to experimental units completely at random (no blocking), the result is a completely randomized design
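
One way to picture a completely randomized design is to shuffle the units and deal them out to the treatments. The function name `randomly_assign`, the unit labels 1 to 20, and the treatment names are all hypothetical, chosen only for this sketch.

```python
import random


def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# 20 hypothetical subjects, two hypothetical treatments, no blocking.
groups = randomly_assign(range(1, 21), ["new drug", "placebo"], seed=0)
for name, members in groups.items():
    print(name, sorted(members))
```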

9
Q

simple random sample

A

a simple random sample (SRS) of size n is chosen so that every group of n individuals in the population has an equal chance to be selected as the sample

10
Q

bias

A

a statistical study shows bias if it is very likely to underestimate or overestimate the value you want to know

sources of bias - convenience sampling, voluntary response sampling, undercoverage, nonresponse, and response bias

11
Q

using a random digit table to select a sample

A

label all members of the population with the same number of digits
pick a line in the table and read the digits from left to right in groups of that length, skipping any repeated labels and any labels outside the range you assigned
select the individuals whose labels you find
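
The steps above can be sketched in code. The function `select_from_digits` and the digit string `line` are hypothetical stand-ins for a row of a printed random digit table; the sketch assumes the population is labeled 01 through 30.

```python
def select_from_digits(digits, population_size, n):
    """Read fixed-width labels left to right, skipping repeats and
    labels outside 1..population_size, until n labels are chosen."""
    width = len(str(population_size))  # every label uses the same number of digits
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen


# Hypothetical row of random digits (illustrative only).
line = "19223950340575628713"
sample = select_from_digits(line, population_size=30, n=3)
print(sample)  # [19, 22, 5]
```

Reading in pairs gives 19, 22, 39, 50, 34, 05; the labels 39, 50, and 34 are skipped because they fall outside 01 to 30.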

12
Q

choosing a model

A

choose the model whose residual plot has the most random scatter

if more than one model has a randomly scattered residual plot, choose the model with the largest coefficient of determination, r^2

13
Q

population, census, sample

A

the population in a statistical study is the entire group of individuals we want information about

a census collects information from every individual in the population

a sample is a subset of individuals from the population from which we collect data

14
Q

experimental vs observational study

A

experimental study - researchers impose treatment(s) upon the experimental units. well designed experiments allow for cause-and-effect conclusions to be made

observational study - researchers observe individuals and measure variables without imposing treatments, so the results cannot establish cause and effect

15
Q

what is a chi square distribution

A

a chi square distribution is defined by a density curve that takes only nonnegative values and is skewed to the right

as df increases the chi square distributions become more variable, less skewed and centered at a larger value (mean = df)

the chi square test statistic measures how different the observed counts are from the expected counts
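
The usual form of the statistic is the sum of (observed - expected)^2 / expected over all categories. The sketch below applies that formula to made-up counts; `observed` and `expected` are hypothetical values, not data from the deck.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustrative only).
observed = [20, 30, 50]
expected = [25, 25, 50]

chi_sq = chi_square_statistic(observed, expected)
print(chi_sq)  # 2.0
```

A larger value means the observed counts sit farther from the expected counts.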

16
Q

inference for regression

A

Linear - the association between the variables is linear
Independent - observations are independent; check the 10% condition if sampling without replacement
Normal - responses vary normally around the regression line for all x-values (or n > 30)
Equal SD - around the regression line for all x-values
Random - data from a random sample or randomized experiment

17
Q

outlier rule

A

outliers > Q3 + 1.5(IQR)
outliers < Q1 - 1.5(IQR)
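
A short sketch of the 1.5 x IQR rule follows. The data list is made up, and the quartiles come from `statistics.quantiles`, whose default method can differ slightly from the "median of each half" approach used in class for some data sets (here the two agree).

```python
import statistics

# Hypothetical data set (illustrative only).
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]

q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value beyond a fence is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, outliers)
```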

18
Q

what is a resistant measure

A

a resistant measure is not strongly affected by outliers

resistant measures: median, IQR, Q1, Q3

non-resistant measures: mean, SD, range, correlation, equation of the LSRL

19
Q

Interpret a Z-score

A

“Jessica’s test score was 2.3 standard deviations below the mean”
z = -2.3

20
Q

z - score formula

A

z = (value - mean) / standard deviation
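
The formula is a one-liner in code. The score, mean, and standard deviation below are hypothetical numbers picked so the result matches the z = -2.3 interpretation on the previous card.

```python
def z_score(value, mean, sd):
    """Number of standard deviations the value lies from the mean."""
    return (value - mean) / sd


# Hypothetical test score, class mean, and class SD (illustrative only).
z = z_score(value=62, mean=85, sd=10)
print(z)  # -2.3
```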

21
Q

interpret standard deviation of residuals s

A

s measures the size of the typical residual

“The cost of a car typically varies by about $2375 from the price predicted by the LSRL with x = years”

22
Q

residual formula

A

actual - predicted

23
Q

interpreting a residual plot

A
  • if there is no leftover curvature the model used to make the plot is appropriate
  • if there is leftover curvature the model used to make the plot is not appropriate
24
Q

making predictions/extrapolation

A

extrapolation is the use of an LSRL for prediction outside the interval of x-values used to obtain the line. the further we extrapolate, the less reliable the predictions

25
interpret slope and y intercept
slope - the change in the predicted y when x increases by one unit
"The predicted cost of a car decreases by about $1285 for each additional year"
y intercept - the predicted value of y when x is 0
"The predicted cost of a car is about $23,450 when it is x = 0 years old"
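
Putting the card's slope and y intercept into the line predicted cost = 23450 - 1285(years) gives a quick way to make predictions and compute residuals. The car's age of 5 years and its actual cost of $18,525 are hypothetical values added for illustration.

```python
SLOPE = -1285      # predicted change in cost for each additional year (from the card)
INTERCEPT = 23450  # predicted cost when x = 0 years (from the card)


def predicted_cost(years):
    """Predicted cost from the LSRL: y-hat = intercept + slope * x."""
    return INTERCEPT + SLOPE * years


years = 5            # hypothetical age of one car
actual_cost = 18525  # hypothetical actual cost of that car

residual = actual_cost - predicted_cost(years)  # residual = actual - predicted
print(predicted_cost(years), residual)  # 17025 1500
```

A positive residual means the car cost more than the line predicted.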
26
interpret a residual
"The car cost $1500 more than the price predicted by the LSRL with x = years"
27
working with a power model
28
interpret coefficient of determination (r^2)
r^2 measures the percent of the variability in y that is accounted for by the LSRL of y on x
"48% of the variability in the cost of a car is accounted for by the LSRL with x = years"
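
To see where the percentage comes from, the sketch below fits a least-squares line by hand and compares the leftover variation (sum of squared residuals) with the total variation in y. The five (x, y) points are made up for illustration.

```python
# Hypothetical data (illustrative only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left over after using the line vs. total variation in y.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(f"{r_squared:.0%} of the variability in y is accounted for by the LSRL")
```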
29
cluster sampling
split the population into groups (often based on location) called clusters, randomly select some of the clusters, and include every member of the selected clusters in the sample
30
confounding
two variables are associated in such a way that their effects on the response variable cannot be distinguished
31
systematic random sampling
select a sample from an ordered arrangement of the population by randomly selecting one of the first k individuals and choosing every kth individual thereafter k =
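
A minimal sketch of this procedure follows. The function name `systematic_sample` is hypothetical, and the step size `k` is simply passed in by the caller because the card does not say how it is chosen.

```python
import random


def systematic_sample(ordered_population, k, seed=None):
    """Pick a random start among the first k individuals,
    then take every kth individual after that."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k
    return ordered_population[start::k]


# Hypothetical ordered population labeled 1..100, with k = 10.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, seed=3)
print(sample)
```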
32
stratified random sampling
split the population into homogeneous (similar) groups, called strata, based on anticipated response. select an SRS from each stratum and combine the SRSs to form the overall sample
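
In code, stratified sampling amounts to one SRS per stratum, then combining the results. The strata names, member labels, and the choice of 2 individuals per stratum are all hypothetical.

```python
import random


def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


# Hypothetical strata: grade levels, 10 students each.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 11)],
    "sophomores": [f"S{i}" for i in range(1, 11)],
    "juniors": [f"J{i}" for i in range(1, 11)],
}

sample = stratified_sample(strata, n_per_stratum=2, seed=0)
print(sample)
```

Unlike cluster sampling, every stratum contributes some individuals to the sample.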
33
outliers, high leverage, and influential points in regression
high leverage - a point with a much larger or much smaller x-value than the other points
outlier - a point that does not follow the pattern of the data and has a large residual (actual - predicted)
influential point - a point that, if removed, substantially changes the slope, y-intercept, correlation, r^2, or standard deviation of the residuals
high leverage points and outliers can both be influential
34
how does shape affect measures of center
mean < median (Left Skew)
mean > median (Right Skew)
mean = median (Roughly Symmetric)
35
association
two variables have an association if knowing the value of one variable helps to predict the value of the other variable
36
discrete vs continuous variables
a quantitative variable is discrete if its possible values have gaps between them, e.g. (1, 2, 3, 4)
a quantitative variable is continuous if its possible values have no gaps between them and it can take any value in an interval on the number line, e.g. (1, 1.1, 1.2, 1.3 ... 1.7)
37
interpret r
correlation r measures the strength and direction of a linear association
r is always between -1 and 1
close to zero = very weak
close to 1 or -1 = strong
exactly 1 or -1 = perfectly straight line
positive r = positive association
negative r = negative association
38
finding boundaries under a normal distribution
use invNorm and label inputs *empirical rule*
39
finding area under a normal distribution
use normalcdf
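
For readers without a calculator at hand, Python's `statistics.NormalDist` can stand in for the calculator commands on this card and the previous one. The wrapper names `normalcdf` and `inv_norm` below are hypothetical helpers written for this sketch, not the calculator's own functions.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area_left, mean=0.0, sd=1.0):
    """Boundary value with the given area to its left."""
    return NormalDist(mean, sd).inv_cdf(area_left)


within_one_sd = normalcdf(-1, 1)  # area within 1 SD of the mean
cutoff_90th = inv_norm(0.90)      # z-value at the 90th percentile

print(round(within_one_sd, 4), round(cutoff_90th, 4))
```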
40
standard normal distribution
the standard normal distribution is the normal distribution with mean 0 and SD 1
41
describing/comparing distributions of quantitative data
use SOCV: Shape, Outliers, Center, Variability
42
parameter vs statistic
a parameter is always about a population
a statistic is always about a sample
parameters include the population mean, population standard deviation, population proportion
statistics include the sample mean, sample standard deviation, sample proportion
43
marginal, joint, and conditional relative frequency
marginal - the values on the edge of the 2-way table
joint - the values that make up the body of the table
conditional - the joint frequency divided by the total for the given condition
ex: the probability that a survey respondent likes basketball the most, given that the respondent is male = 15 (males who like basketball) / 48 (all males, because that's the condition)
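
The three kinds of relative frequency can be read off a small two-way table in code. Only the 15 males who like basketball and the 48 males in total come from the card; every other count below is a hypothetical filler value.

```python
# Two-way table of counts: table[gender][favorite sport].
# Only male/basketball = 15 and the male total of 48 come from the card;
# the remaining counts are made up for illustration.
table = {
    "male": {"basketball": 15, "football": 20, "soccer": 13},
    "female": {"basketball": 10, "football": 12, "soccer": 30},
}

grand_total = sum(sum(row.values()) for row in table.values())
male_total = sum(table["male"].values())

marginal_male = male_total / grand_total                     # edge of the table
joint_male_basketball = table["male"]["basketball"] / grand_total  # body of the table
conditional_basketball_given_male = table["male"]["basketball"] / male_total

print(marginal_male, joint_male_basketball, conditional_basketball_given_male)
```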
44
percentiles
the pth percentile of a distribution is the value that has p% of the observations less than or equal to that value
example: a student who scores at the 90th percentile got the same score or a greater score than 90% of the other test takers
45
describing an association in a scatterplot
use DUFS to describe association in a scatterplot
Direction - positive, negative, no association
Unusual features - clusters, other points
Form - linear, nonlinear
Strength - weak, moderate, strong
"There is a moderate, positive, linear association between height and weight for HS students"
46
empirical rule
if a distribution of data is approximately normal, then
  • about 68% of the data will be within 1 SD of the mean
  • about 95% of the data will be within 2 SD of the mean
  • about 99.7% of the data will be within 3 SD of the mean
47
transforming data/ effect of changing units
adding "a" to every member of a data set adds "a" to the measures of center/position but does not change the measures of variability or shape
multiplying every member of a data set by a positive constant "b" multiplies the measures of center/position by "b" and multiplies most measures of variability by "b", but does not change shape
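
These two rules are easy to check numerically. The data set and the constants `a = 10` and `b = 3` are hypothetical values chosen for this sketch.

```python
import statistics

# Hypothetical data and constants (illustrative only).
data = [2, 4, 6, 8]
a = 10
b = 3

shifted = [x + a for x in data]  # add a to every value
scaled = [x * b for x in data]   # multiply every value by b

print("original:", statistics.mean(data), statistics.stdev(data))
print("shifted: ", statistics.mean(shifted), statistics.stdev(shifted))
print("scaled:  ", statistics.mean(scaled), statistics.stdev(scaled))
```

Adding `a` moves the mean by `a` and leaves the SD alone; multiplying by `b` multiplies both the mean and the SD by `b`.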
48
density
a density curve models the distribution of a quantitative variable with a curve that is always on or above the horizontal axis and has an area of exactly 1 underneath it
the area under the curve and above any interval of values on the horizontal axis estimates the proportion of all observations that fall in that interval