Midterm Flashcards

1
Q

Disjoint (mutually exclusive)

A
  • events cannot happen at the same time
    ex) rolling a 1 and rolling a 6 on a single roll of a die
2
Q

Independent

A
  • one event does not affect the probability of the other
    ex) tossing 2 coins: the result of the first toss does not change the chances for the second
3
Q

Disjoint addition rule

A

P(A ∪ B) = P(A) + P(B), when A and B are disjoint
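
As a quick check of the addition rule, the sketch below enumerates one roll of a fair six-sided die and compares the probability of "A or B" with P(A) + P(B) for two disjoint events. The die and the events A and B are illustrative assumptions, not part of the card.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (illustrative assumption).
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {1, 2}   # roll a 1 or a 2
B = {5, 6}   # roll a 5 or a 6; shares no outcomes with A, so A and B are disjoint

p_union = prob(A | B)        # P(A or B), counted directly
p_sum = prob(A) + prob(B)    # addition rule for disjoint events
print(p_union, p_sum)        # 2/3 2/3
```

If A and B shared an outcome, the simple sum would count it twice and the two values would no longer match.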

4
Q

proving independence

A

if P(A)*P(B) = P(A and B), A and B are independent
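
The check can be carried out by enumeration. The sketch below assumes two fair coin tosses (an illustrative setup) and tests one independent pair of events and one dependent pair; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing two fair coins (illustrative assumption).
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability that the predicate `event` is true for a random outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

def at_least_one_head(o):
    return "H" in o

# Independent pair: P(A) * P(B) equals P(A and B).
p_both = prob(lambda o: first_heads(o) and second_heads(o))
independent = prob(first_heads) * prob(second_heads) == p_both

# Dependent pair: knowing the first toss is heads guarantees at least one head.
p_joint = prob(lambda o: first_heads(o) and at_least_one_head(o))
also_independent = prob(first_heads) * prob(at_least_one_head) == p_joint

print(independent, also_independent)   # True False
```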

5
Q

conditional probability

A

P(A|B) = P(A and B) / P(B)

6
Q

Multiplication Rule

A

P(A and B) = P(A) * P(B|A)
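
The conditional probability formula and the multiplication rule are the same relationship rearranged. The sketch below checks this by enumerating two draws without replacement from a 52-card deck; the deck, the labeling of cards 0 to 3 as aces, and the events are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

deck = range(52)

def is_ace(card):
    # Hypothetical labeling: cards 0, 1, 2, 3 are the four aces.
    return card < 4

# Every ordered pair of distinct cards is an equally likely two-card draw.
draws = list(permutations(deck, 2))

def prob(event):
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

p_a = prob(lambda d: is_ace(d[0]))                          # P(A): first card is an ace
p_a_and_b = prob(lambda d: is_ace(d[0]) and is_ace(d[1]))   # P(A and B): both are aces
p_b_given_a = p_a_and_b / p_a                               # P(B|A) = P(A and B) / P(A)

print(p_a, p_b_given_a, p_a_and_b)   # 1/13 1/17 1/221
```

Here P(B|A) = 1/17 = 3/51, which matches the direct reasoning: after one ace is removed, 3 aces remain among 51 cards.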

7
Q

Bad samples

A
  1. convenience samples
    - introduces bias
    - over/under estimation
    - does not reflect the population
  2. bias
    - systematically favoring certain outcome
    - under/over estimation
  3. voluntary response
    - people who choose to answer a general appeal
    - people with strong emotion
8
Q

Good samples

A
  1. simple random sample
    - ensures everyone has an equal chance of selection
    - preferred when there are smaller data sets.
  2. stratified random sample
    - ensures all subgroups are represented
    - more precise
  3. cluster
    - create cluster by location
    - save money and time
  4. systematic random sample
    - randomly select a starting individual from the first k, then select every kth individual after that
    - preferred when population is ordered
    - easier to conduct
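
The four sampling methods above can be sketched with Python's standard `random` module. The population of ID numbers, the two strata, and the ten clusters are made-up examples, and the function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample from every subgroup so each one is represented."""
    return [x for group in strata.values() for x in rng.sample(group, n_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and include everyone in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [x for name in chosen for x in clusters[name]]

def systematic_sample(population, k, rng):
    """Pick a random start among the first k individuals, then take every kth one."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(0)                  # fixed seed only so the sketch is repeatable
population = list(range(1, 101))        # hypothetical ID numbers 1..100
strata = {"low": population[:50], "high": population[50:]}
clusters = {f"block{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
syst = systematic_sample(population, 10, rng)
```

Each call returns a list of selected IDs; the stratified sample always contains five IDs from each stratum, while the cluster sample contains two complete blocks of ten.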
9
Q

What can go wrong?

A
  1. undercoverage
    - some members of the population are less likely to be chosen or are left out entirely
  2. nonresponse
    - chosen individuals cannot be contacted or refuse to participate (a big issue)
  3. response bias
    - individuals lie or answer a question they don’t know
  4. question wording bias
    - the way a question is worded or asked influences the response from an individual
10
Q

Observational study vs. experiment

A

observational study
- no treatment is imposed; individuals are only observed

experiment
- a treatment is imposed (an experiment is needed to establish causation)

11
Q

confounding variable

A

a variable, other than the explanatory variable, that is related to the explanatory variable and also affects the response variable

12
Q

4 principles of experimental design

A
  1. random assignment
  2. replication
  3. control
  4. comparison
13
Q

placebo effect

A

subjects respond to a dummy treatment (placebo) because they expect it to work

14
Q

purpose of control group

A

provides a baseline for comparison

15
Q

purpose of single and double blind experiments

A

reduce the placebo effect and favoritism (bias) that can arise when subjects or experimenters know which treatment was given

16
Q

purpose of random assignment

A
  1. creates roughly equivalent groups
  2. helps control confounding variables
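
A minimal sketch of random assignment, assuming 20 hypothetical subjects split into a treatment group and a control group. The function name and subject labels are made up for illustration.

```python
import random

def randomly_assign(units, rng):
    """Shuffle the units and split them into two groups of (nearly) equal size."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels; the seed is fixed only to make the sketch repeatable.
subjects = [f"subject{i}" for i in range(1, 21)]
treatment_group, control_group = randomly_assign(subjects, random.Random(42))
print(len(treatment_group), len(control_group))   # 10 10
```

Because chance alone decides who lands in each group, other variables (age, health, motivation) tend to balance out between the groups.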
17
Q

purpose of replication

A

use enough subjects/experimental units so that a difference between treatments can be distinguished from chance variation

18
Q

statistically significant

A
  • the results most likely did not happen by chance.
  • a simulation gives convincing evidence when the observed result would rarely occur by chance alone
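
One way to run such a simulation is to reshuffle the group labels many times and see how often chance alone produces a difference at least as large as the one observed. The sketch below assumes made-up measurements for a treatment and a control group; the function name is hypothetical, and the 5% cutoff mentioned afterward is a common convention rather than something stated on the card.

```python
import random
from statistics import mean

def randomization_p_value(treatment, control, reps, rng):
    """Estimate how often random regrouping gives a difference >= the observed one."""
    observed = mean(treatment) - mean(control)
    pooled = treatment + control          # new list; the inputs are left unchanged
    n_t = len(treatment)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)               # pretend the treatment had no effect
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed:
            count += 1
    return observed, count / reps

treatment = [12, 14, 15, 16, 18, 19]      # made-up responses
control = [8, 9, 10, 11, 11, 13]          # made-up responses
observed, p_value = randomization_p_value(treatment, control, 10_000, random.Random(1))
print(observed, p_value)
```

A small proportion (for example, below 0.05) suggests the observed difference is unlikely to be due to chance alone.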
19
Q

blocking

A
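  • the experimental counterpart of stratified sampling: group similar units into blocks, then randomly assign treatments within each block
  • units within a block should be homogeneous
  • form blocks on a variable expected to affect the response (a potential confounding variable)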
20
Q

purpose of blocking

A
  1. controls confounding variables
  2. increases chance of finding convincing evidence if the effect is real
21
Q

matched pairs design

A

blocks only include 2 experimental units
1. 2 similar units
2. 1 unit that gets both treatments

(randomly assign treatments or randomly assign the order of treatment)

22
Q

inference

A

using information from our sample/experiment to draw conclusions about the population

23
Q

sampling variability

A

different samples from the same population will give different results

24
Q

categorical graphs

A

side by side bar graph
segmented bar graph
mosaic plot
pie chart

25
quantitative graphs
histogram
stem and leaf plot
dot plot
box plot
26
marginal relative frequency
a row or column total (a margin) over the grand total
int: the proportion of all who are _________.
27
joint relative frequency
an inner cell over the grand total
int: the proportion of all who are ________ and ________.
28
conditional relative frequency
an inner cell over its row or column total (the "small" total)
int: the proportion of _______s who are _________.
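
The three relative frequencies differ only in which cell and which total are used. The sketch below computes one of each from a made-up two-way table; the categories and counts are illustrative assumptions.

```python
from fractions import Fraction

# Made-up two-way table of counts: rows = class year, columns = preferred pet.
table = {
    "junior": {"dog": 30, "cat": 20},
    "senior": {"dog": 10, "cat": 40},
}
grand_total = sum(sum(row.values()) for row in table.values())   # 100
junior_total = sum(table["junior"].values())                     # 50

# Marginal: a row (or column) total over the grand total.
marginal_junior = Fraction(junior_total, grand_total)
# Joint: one inner cell over the grand total.
joint_junior_dog = Fraction(table["junior"]["dog"], grand_total)
# Conditional: one inner cell over its row (or column) total.
dog_given_junior = Fraction(table["junior"]["dog"], junior_total)

print(marginal_junior, joint_junior_dog, dog_given_junior)   # 1/2 3/10 3/5
```

Read in context: half of all students are juniors, 30% of all students are juniors who prefer dogs, and 60% of juniors prefer dogs.
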
29
describe a distribution
Shape, Outliers, Center, Spread + context
30
resistant measure
yes (resistant)
- median
- IQR
no (not resistant)
- mean
- range
- standard deviation
- variance
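
To see resistance in action, the sketch below adds a single outlier to a small made-up data set and measures how much each summary changes. It uses Python's `statistics` module; note that `quantiles` uses its default "exclusive" method, which may differ slightly from the quartile rule taught in a given course.

```python
from statistics import mean, median, quantiles, stdev

def summary(data):
    q1, _, q3 = quantiles(data, n=4)   # quartiles (default "exclusive" method)
    return {
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }

data = [2, 4, 4, 5, 6, 7, 8]           # made-up values
with_outlier = data + [100]            # same data plus one extreme value

before, after = summary(data), summary(with_outlier)
change = {name: abs(after[name] - before[name]) for name in before}
for name, amount in change.items():
    print(f"{name}: changed by {amount:.2f}")
```

The median and IQR barely move, while the mean, range, and standard deviation change substantially.
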
31
problem of range
- ignores all values in the data set except the max and the min
- not resistant to outliers
32
standard deviation
how far, on average, the values of the distribution are from the mean
- always greater than or equal to 0 (s = 0 => no variability)
- larger value = more variability
- not resistant to outliers
- measures variation about the mean
int: the (context) differs from the mean of (mean) by (s.d. with units), on average.
33
+/- of boxplot
+ : shows how spread out the data are
- : cannot show peaks/gaps in the data
34
comparison of symmetric and skewed graph
symmetric (no outliers)
- center: mean
- spread: standard deviation
skewed (or outliers present)
- center: median
- spread: IQR
35
transforming data
linear transformation
--> if add/subtract a constant
- mean = add/subtract
- standard deviation = same
- shape = same
--> if multiply/divide by a positive constant
- mean = multiply/divide
- standard deviation = multiply/divide
- shape = same
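
These rules can be verified numerically. The sketch below shifts and rescales a small made-up data set and compares the mean and standard deviation before and after.

```python
from statistics import mean, stdev

data = [3, 5, 8, 10, 14]               # made-up values
shifted = [x + 10 for x in data]       # add a constant
scaled = [x * 2 for x in data]         # multiply by a positive constant

print(mean(data), mean(shifted), mean(scaled))        # 8 18 16
print(stdev(data), stdev(shifted), stdev(scaled))     # s, s, 2s
```

Adding 10 moves the mean by 10 and leaves the standard deviation unchanged; multiplying by 2 doubles both.
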
36
density curve
a curve that
- is always on or above the horizontal axis
- has area exactly 1 underneath it
- describes the overall pattern of a distribution
37
normal curve
  1. symmetric and bell shaped
  2. the mean = median, both located at the exact center
38
empirical rule
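- 68-95-99.7 rule: in a normal distribution, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3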
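
The percentages in the rule can be recovered from the standard normal curve. The sketch below uses `statistics.NormalDist` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Area under the curve within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} s.d.: {area:.4f}")
```

The areas come out to roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.
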
39
describing scatterplot
Direction, Unusual features, Form, Strength + context
40
correlation
measures the strength and direction of a linear association
- indicates direction by sign
- both variables need to be quantitative
- does not depend on the units of measure
- has no unit of measurement
- does not measure form
- only for linear relationships
- not a resistant measure of strength
int: the linear association between (x-context) and (y-context) is (strength) and (direction)
41
extrapolation
using a regression line to make predictions for x-values outside the range of x-values that were used to calculate the line
42
residual
actual y - predicted y
- positive residual = underprediction
- negative residual = overprediction
int: the actual (y-context) was (residual) (above/below) the predicted value when (x-context = #)
43
least squares regression line
the line that makes the sum of the squared residuals as small as possible
- distinction between x and y is essential
- r and the slope have the same sign
- not resistant to unusual points
44
residual plot
a scatterplot that shows the residuals as the y-values and the explanatory variable as the x-values
if it shows no pattern, a linear model is appropriate.
45
standard deviation of residuals (s)
measures the typical distance between the actual y and predicted y
int: the actual (y-context) is typically about (s) away from the value predicted by the LSRL.
46
coefficient of determination (r^2)
measures the percent reduction in the sum of squared residuals when using the LSRL to make predictions, instead of just using the mean of the y-values
int: about (r^2 %) of the variation in (y-context) can be explained by the linear relationship with (x-context)
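
The regression quantities on the preceding cards (residuals, the least squares line, s, and r^2) can be computed by hand from a small data set. The sketch below uses made-up (x, y) values and the standard textbook formulas; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]        # made-up explanatory values
y = [2, 4, 5, 4, 5]        # made-up response values

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
syy = sum((yi - y_bar) ** 2 for yi in y)   # squared error when predicting with y_bar

slope = sxy / sxx                          # least squares slope
intercept = y_bar - slope * x_bar          # the line passes through (x_bar, y_bar)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # actual y - predicted y

sse = sum(e ** 2 for e in residuals)       # sum of squared residuals
s = sqrt(sse / (len(x) - 2))               # standard deviation of the residuals
r_squared = 1 - sse / syy                  # proportional reduction in squared error
r = sxy / sqrt(sxx * syy)                  # correlation; same sign as the slope

print(round(slope, 3), round(intercept, 3), round(s, 3), round(r_squared, 3))
# 0.6 2.2 0.894 0.6
```

For these values the LSRL is predicted y = 2.2 + 0.6x, the residuals sum to zero, and about 60% of the variation in y is explained by the linear relationship with x.
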
47
outlier, high leverage, and influential points
outlier - pt that does not follow the pattern of the data (has a large residual)
high leverage - pt with a much larger/smaller x-value than the other pts
influential point - any pt that, if removed, substantially changes {slope, y-int, r, r^2, s}
48
expected value (mean) of discrete random variable
int: if the random process of (context) is repeated many, many times, the average number of (x-context) we can expect is (expected value).
49
standard deviation of a discrete random variable
int: the (context) typically varies by (standard deviation) from the mean of (mean)
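
The two interpretations above rest on the usual formulas: the mean is the probability-weighted sum of the values, and the standard deviation is the square root of the probability-weighted squared distances from the mean. A minimal sketch with a made-up probability distribution:

```python
from math import sqrt

# Made-up probability distribution of a discrete random variable X: value -> probability.
dist = {0: 0.2, 1: 0.5, 2: 0.3}

mu = sum(x * p for x, p in dist.items())                  # expected value
variance = sum((x - mu) ** 2 * p for x, p in dist.items())
sigma = sqrt(variance)                                    # standard deviation

print(round(mu, 3), round(sigma, 3))   # 1.1 0.7
```
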
50
adding/subtracting random variables X and Y
mean = mu x +/- mu y
standard deviation = sqrt (sigma x^2 + sigma y^2), for both sums and differences (variances always add)
--> X and Y must be independent random variables for the standard deviation formula
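
The rule that variances add for both sums and differences of independent random variables can be checked exactly by enumeration. The sketch below assumes two independent fair dice (an illustrative example); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

die = {face: Fraction(1, 6) for face in range(1, 7)}   # one fair six-sided die

def mean_var(dist):
    """Mean and variance of a discrete distribution given as value -> probability."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

def combine(dx, dy, op):
    """Distribution of op(X, Y) when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py   # independence: multiply probabilities
    return out

mu_x, var_x = mean_var(die)
mu_sum, var_sum = mean_var(combine(die, die, lambda x, y: x + y))
mu_diff, var_diff = mean_var(combine(die, die, lambda x, y: x - y))

print(mu_sum, mu_diff)        # 7 0
print(var_sum, var_diff)      # 35/6 35/6
print(sqrt(var_sum))          # standard deviation of X + Y (and of X - Y)
```

The means add or subtract, but the variance is 35/12 + 35/12 = 35/6 in both cases, so the standard deviation of X - Y equals that of X + Y.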