Midterm 1: Ch 1-10 Flashcards

Question

Do smaller or larger samples have smaller sampling error?

Answer 1

larger samples → smaller sampling error | on average

Answer 2

low precision – large differences
every new measurement is different each time we take it

Answer 3

- higher precision – small differences | - low variance between different estimates (each time we do a study)

Answer 4

- categorical variables (class or nominal variables) | - numerical variables (quantitative variables)

Answer 5

fall into categories

Answer 6

continuous: can be measured – ie. arm length, height, weight, age*
discrete: can be counted – ie. number of limbs, number of offspring, number of petals

Answer 7

frequency is NOT a variable – not measuring, just gathering data

Answer 8

- histogram | - cumulative frequency distribution (CDF)

Answer 9

continuous numerical variable - no gaps between bars – conveys that these are continuous variables running together - widths are the same

Answer 10

proportion of individuals equal to or less than that value
- 0 = none of the individuals are equal to or less than that value
- 1 = all individuals are equal to or less than that value

Answer 11

describes association between two (or more) categorical variables by displaying frequencies of all combinations of categories

Answer 12

- contingency table - grouped bar graph - mosaic plot

Answer 13

relative frequencies scaled to 1 – does NOT use discrete numbers
width of bars indicates number of individuals in the treatment

Answer 14

discrete numbers or frequency

Answer 15

- multiple histogram - cumulative frequency distribution (CDF) - box plot

Answer 16

scatter plot

Answer 17

- location: central tendency | - width: spread – how variable the data is

Answer 18

- mean - median - mode

Answer 19

add all numbers together and divide by total amount of data points – centre of gravity

Answer 20

odd number: middle measurement in a set of ordered data
even number: average of two middle numbers in a set of ordered data

Answer 21

most frequent measurement
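
A minimal sketch of the three measures of location, using Python's standard `statistics` module. The data values are made up for illustration only.

```python
import statistics

# Hypothetical data: number of offspring in 7 nests (made up for illustration)
offspring = [2, 3, 3, 4, 5, 7, 11]

mean = statistics.mean(offspring)      # sum of values divided by n
median = statistics.median(offspring)  # middle value of the ordered data (n is odd)
mode = statistics.mode(offspring)      # most frequent value

# With an even number of points, the median is the average of the two middle values
median_even = statistics.median([2, 3, 4, 5])

print(mean, median, mode, median_even)
```

Here the long right tail (the value 11) pulls the mean above the median, which is why the median is often preferred for skewed data.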

Answer 22

skewed data – lot of the weight is on one side of the distribution

Answer 23

symmetrical distribution of data – bell-shaped

Answer 24

- mean has nice statistical properties, can be quantified easily using theories - mean has good predictive behaviours

Answer 25

- range - variance - standard deviation - coefficient of variation

Answer 26

maximum minus minimum - poor measure of distribution width – useless in statistics

Answer 27

yes, smaller sample → lower estimates of range - sample range is not expected to match population range

Answer 28

if we took unsquared value, negative and positive deviations cancel out

Answer 29

unbiased estimator of population variance – used to try to learn about population variance

Answer 30

positive square root of the variance
σ: true (population) standard deviation
s: sample standard deviation – estimator of σ (s^2 is an unbiased estimator of σ^2, but s itself slightly underestimates σ on average)
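
As a rough illustration of the variance and standard deviation cards, the sketch below computes s^2 and s by hand and compares them with the standard library. The sample values and variable names are hypothetical.

```python
import math
import statistics

# Hypothetical sample of four measurements (made up for illustration)
sample = [4.0, 6.0, 8.0, 10.0]
n = len(sample)
ybar = statistics.mean(sample)

# Sum of squared deviations; squaring stops + and - deviations cancelling out
ss = sum((y - ybar) ** 2 for y in sample)

s2 = ss / (n - 1)       # sample variance: divide by n - 1
s = math.sqrt(s2)       # sample standard deviation: positive square root
divided_by_n = ss / n   # dividing by n instead gives a smaller value

# The by-hand values should agree with the standard library
print(s2, statistics.variance(sample))
print(s, statistics.stdev(sample))
print(divided_by_n)
```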

Answer 31

good for comparing distributions of different magnitudes

Answer 32

measurement of asymmetry – refers to pointy tail of distribution
right-skewed: pointy tail is on the right
left-skewed: pointy tail is on the left

Answer 33

population parameter: µ sample statistic: Ȳ

Answer 34

population parameter: σ^2 sample statistic: s^2

Answer 35

population parameter: σ sample statistic: s

Answer 36

E[X + Y] = E[X] + E[Y]

Answer 37

E[X + c] = E[X] + c ie. temperature conversions

Answer 38

E[c X] = c E[X] ie. measurement conversions

Answer 39

Var[X + Y] = Var[X] + Var[Y] if X and Y are independent (otherwise a covariance term is needed)

Answer 40

Var[X + c] = Var[X] spread of data has not changed – variance is the same ie. adding 10 cm to every measurement

Answer 41

Var[c X] = c^2 Var[X] variance in units^2, therefore multiply by constant^2
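
The rules for adding or multiplying by a constant can be checked numerically. This is only a sketch: the data and the constant `c` are made up, and the population-style variance (`pvariance`) is used so the identities hold exactly for a fixed list of numbers.

```python
import math
import statistics

x = [1.0, 2.0, 4.0, 7.0]   # hypothetical measurements
c = 10.0                   # hypothetical constant

shifted = [xi + c for xi in x]   # X + c
scaled = [c * xi for xi in x]    # c X

mean_x = statistics.mean(x)
var_x = statistics.pvariance(x)

# E[X + c] = E[X] + c   and   E[c X] = c E[X]
print(statistics.mean(shifted), mean_x + c)
print(statistics.mean(scaled), c * mean_x)

# Var[X + c] = Var[X]   and   Var[c X] = c^2 Var[X]
print(statistics.pvariance(shifted), var_x)
print(statistics.pvariance(scaled), c**2 * var_x)
```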

Answer 42

every sample will look different

Answer 43

samples will look similar to each other

Answer 44

larger sample size = smaller variance of the sampling distribution of the mean

Answer 45

standard deviation of its sampling distribution predicts the sampling error of the estimate

Answer 46

in most cases, we don’t know 𝜎 – we only have a sample

Answer 47

gives some knowledge of the likely difference between sample mean and true population mean

Answer 48

provides a plausible range for a parameter - all values for the parameter within the interval are plausible - all values for the parameter outside the interval are unlikely

Answer 49

interval that provides a rough estimate of 95% CI for the mean assuming normally distributed population and/or sufficiently large sample size
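
A minimal sketch of this rule of thumb (mean plus or minus two standard errors), with the standard error estimated as s divided by the square root of n. The sample is hypothetical, and with only nine points the interval is rough at best.

```python
import math
import statistics

# Hypothetical sample (made up for illustration)
sample = [10, 12, 14, 16, 18, 20, 22, 24, 26]
n = len(sample)

ybar = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)   # estimated standard error of the mean

# Rough 95% confidence interval: mean +/- 2 SE
ci_low, ci_high = ybar - 2 * se, ybar + 2 * se
print(ybar, se, (ci_low, ci_high))
```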

Answer 50

error that occurs when samples are not independent, but they are treated as though they are
ie. taking multiple measurements from one individual and using each as an individual of the sample
EXAMPLE:
- taking 10 measurements from each climber (6) to get 60 measurements
- to avoid pseudoreplication: take the mean of the 10 measurements for each climber, so that you have 6 values, one for each climber (n = 6)

Answer 51

its true relative frequency – proportion of times event would occur if we repeated same process over and over again

Answer 52

when two events cannot both be true
Pr[A and B] = 0

Answer 53

when the occurrence of one event gives no information about whether the second event will occur

Answer 54

describes the true relative frequency of all possible values of a random variable all probabilities have to sum to 1

Answer 55

if two events A and B are mutually exclusive Pr[A or B] = Pr[A] + Pr[B]

Answer 56

Pr[number ≥ 6] = Pr[6] + Pr[7] + Pr[8]...

Answer 57

Pr[not rolling a 2] = 1 – Pr[rolling a 2] = 5/6

Answer 58

Pr[A or B] = Pr[A] + Pr[B] - Pr[A and B] need to subtract Pr[A and B], otherwise it’ll be counted twice
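
The addition rules can be checked by counting outcomes of a fair six-sided die. The events A and B below are chosen only for illustration; exact fractions avoid rounding.

```python
from fractions import Fraction

outcomes = range(1, 7)                    # faces of a fair die

A = {k for k in outcomes if k % 2 == 0}   # event A: roll is even
B = {k for k in outcomes if k >= 4}       # event B: roll is 4 or more

def pr(event):
    """Probability of an event: favourable faces out of 6 equally likely faces."""
    return Fraction(len(event), 6)

# General addition rule: subtract Pr[A and B] so the overlap {4, 6} is not counted twice
pr_a_or_b = pr(A) + pr(B) - pr(A & B)

# Complement rule: Pr[not rolling a 2] = 1 - Pr[rolling a 2]
pr_not_two = 1 - pr({2})

print(pr_a_or_b, pr(A | B), pr_not_two)
```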

Answer 59

if two events A and B are independent Pr[A and B] = Pr[A] x Pr[B]

Answer 60

Pr[A and B] = Pr[A] Pr[B | A]
Pr[A and B] = Pr[B] Pr[A | B]
therefore, Pr[A] Pr[B | A] = Pr[B] Pr[A | B]

Answer 61

probability of one event depends on the outcome of another event

Answer 62

probability of that event occurring given that a condition is met Pr[X|Y] probability of X given Y (if Y is true)

Answer 63

when you want to flip conditional probability
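
Rearranging Pr[A] Pr[B | A] = Pr[B] Pr[A | B] gives Pr[A | B] = Pr[B | A] Pr[A] / Pr[B], which is Bayes' theorem. The sketch below applies it to a diagnostic test; every number in it is hypothetical and chosen only to show the arithmetic.

```python
# Hypothetical inputs (made up for illustration)
pr_disease = 0.01            # Pr[disease]
pr_pos_given_disease = 0.90  # Pr[positive | disease]
pr_pos_given_healthy = 0.05  # Pr[positive | no disease]

# Pr[positive]: add up the two mutually exclusive ways of testing positive
pr_pos = (pr_pos_given_disease * pr_disease
          + pr_pos_given_healthy * (1 - pr_disease))

# Flip the conditional probability: Pr[disease | positive]
pr_disease_given_pos = pr_pos_given_disease * pr_disease / pr_pos
print(pr_disease_given_pos)
```

Even with a fairly accurate test, Pr[disease | positive] stays low here because the disease is rare in this made-up population.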

Answer 64

asks how unusual it is to get data that differ from the null hypothesis - if the data would be quite unlikely under H0, we reject H0 - assumes random sampling - about populations, but are tested with data from samples

Answer 65

specific statement about a population parameter made for the purposes of argument - simplest statement - specific - good H0 would be interesting if proven wrong

Answer 66

represents all other possible parameter values except that stated in the null hypothesis - statement of greatest interest - non-specific

Answer 67

population → sample → estimate → test statistic
null hypothesis → test statistic
null hypothesis → construct a new population under H0 → imagined repeated sampling → sample from H0 and calculate test statistic → null distribution of test statistic
test statistic + null distribution of test statistic:
- how weird would these data be if the null hypothesis were true?
- compare distribution from H0 to observed sample
- how likely would it be to obtain our data sample if H0 were true?

Answer 68

number calculated to represent/summarize the match between a set of data and the null hypothesis can be compared to a general distribution to infer probability – for any given value for a test statistic, we can say how likely those possible outcomes are

Answer 69

probability distribution of alternative outcomes for a test statistic when a random sample is taken from a population corresponding to the null expectation

Answer 70

yes – need to evaluate the range and distribution of test statistics we could have obtained if we sampled repeatedly

Answer 71

probability of getting the data, or something as or more unusual/extreme, if the null hypothesis were true
NOT probability H0 is true
NOT probability HA is true

Answer 72

- simulation - parametric tests - permutation
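
Of the three approaches, simulation is the easiest to sketch: draw many samples from a population where H0 is true, compute the test statistic each time to build a null distribution, and see how often the result is at least as extreme as the observed one. The counts, the null proportion, and the helper `simulate_successes` are all hypothetical.

```python
import random

random.seed(1)      # fixed seed so the sketch is reproducible

n_trials = 18       # hypothetical: 18 individuals sampled
observed = 14       # hypothetical: 14 "successes" observed
p0 = 0.5            # proportion of successes under H0
n_sims = 20_000     # number of imagined repeated samples

def simulate_successes(n, p):
    """Draw one sample of size n from the H0 population and count successes."""
    return sum(random.random() < p for _ in range(n))

expected = n_trials * p0
observed_deviation = abs(observed - expected)

# Null distribution of the test statistic (number of successes)
null_stats = [simulate_successes(n_trials, p0) for _ in range(n_sims)]

# Two-tailed P-value: fraction of simulated results at least as extreme as the data
p_value = sum(abs(x - expected) >= observed_deviation for x in null_stats) / n_sims
print(p_value)
```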

Answer 73

acceptable probability of rejecting a true null hypothesis 𝜶 = usually 0.05

Answer 74

if p-value for a test is ≤ 𝜶, then H0 is rejected

Answer 75

if p-value for a test is > 𝜶, then H0 is NOT rejected

Answer 76

larger sample → estimate has smaller confidence interval larger sample → more power to reject a false null hypothesis

Answer 77

rejecting a true null hypothesis probability of Type I error is 𝜶 (significance level)

Answer 78

not rejecting a false null hypothesis
probability of Type II error is 𝜷, which depends on:
- what the real world looks like
- our sample size
- our 𝜶
smaller 𝜷 = larger power a test has

Answer 79

ability of a test to reject a false null hypothesis – how likely we will reject it 1 – 𝜷

Answer 80

larger sample size (more information) → larger power
increase sample size → decrease standard deviation of null distribution → increase power to reject H0

Answer 81

deviation in either direction would reject the null hypothesis - most tests are two-tailed - normally 𝜶 is divided into 𝜶/2 on one side, and 𝜶/2 on the other

Answer 82

only used when the other tail is nonsensical ie. comparing grades on multiple choice test to that expected by random guessing

Answer 83

value of a test statistic beyond which the null hypothesis can be rejected - we never ‘accept the null hypothesis’

Answer 84

in general, if a hypothesis test rejects a null hypothesis (p < 0.05), the value proposed by the null hypothesis is outside the 95% confidence interval

Answer 85

(when taking into account sample spread and size, and assuming we’ve randomly sampled), the less likely it is they were drawn from populations with the same mean

Answer 86

there is a 3% chance of getting means that are at least this different if they’re drawn from populations with the same mean

Answer 87

higher p-values:
- higher probability of 2 sample means being at least this different, if drawn from populations with same mean
- less evidence of differences between population means
lower p-values:
- lower probability of 2 sample means being at least this different, if drawn from populations with same mean
- more evidence of differences between population means

Answer 88

unmeasured variable that may be the cause of both X and Y

Answer 89

fraction of individuals having a particular attribute

Answer 90

describes the probability of a given number of ‘successes’ from a fixed number of independent trials

Answer 91

- mean of number of successes | - variance of number of successes
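
A sketch of the binomial distribution using the standard formula Pr[X = k] = C(n, k) p^k (1 - p)^(n - k), which the cards do not spell out. It also checks numerically that the mean number of successes is np and the variance is np(1 - p). The values of `n` and `p` are arbitrary.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3   # hypothetical number of trials and success probability

probs = [binom_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the probability distribution
mean = sum(k * pk for k, pk in enumerate(probs))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(sum(probs))            # probabilities sum to 1
print(mean, n * p)           # mean of number of successes
print(var, n * p * (1 - p))  # variance of number of successes
```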

Answer 92

number of ‘successes’ over total sample size

Answer 93

- mean | - variance

Answer 94

larger sample → lower standard error

Answer 95

Agresti-Coull confidence interval
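
A sketch of one common form of the Agresti-Coull 95% interval: add 2 successes and 2 failures, giving p' = (X + 2) / (n + 4), then use p' ± 1.96 √(p'(1 - p') / (n + 4)). The function name and the counts are hypothetical.

```python
import math

def agresti_coull_95(successes, n):
    """Approximate 95% confidence interval for a proportion (Agresti-Coull)."""
    p_adj = (successes + 2) / (n + 4)   # adjusted proportion p'
    half_width = 1.96 * math.sqrt(p_adj * (1 - p_adj) / (n + 4))
    return p_adj - half_width, p_adj + half_width

# Hypothetical data: 30 successes out of 87 individuals
low, high = agresti_coull_95(30, 87)
print(low, high)
```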

Answer 96

larger sample → more symmetrical distribution

Answer 97

‘fudge factors’ that are there for more asymmetrical distributions

Answer 98

anything that can go wrong will go wrong

Answer 99

uses data to test whether a population proportion (p) matches a null expectation for the proportion
H0: relative frequency of successes in the population is p0
HA: relative frequency of successes in the population is not p0

Answer 100

compares count data to a probability distribution (expected frequencies) of a set of categories

Answer 101

H0: data come from a particular probability distribution HA: data do NOT come from that distribution

Answer 102

the more categories you have, the more opportunities to deviate from expectations

Answer 103

specifies which of a family of distributions to use

Answer 104

df = (number of categories) - (number of parameters estimated from the data) - 1
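
A worked sketch of a goodness-of-fit calculation, assuming the usual test statistic χ² = Σ (Observed - Expected)² / Expected. The die-roll counts are made up, and the critical value 11.07 (df = 5, α = 0.05) is taken from a standard χ² table rather than computed.

```python
# Hypothetical counts from 60 rolls of a die (faces 1 to 6)
observed = [8, 12, 10, 9, 11, 10]
n_total = sum(observed)

# H0: the die is fair, so each face is expected n_total / 6 times
expected = [n_total / 6] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = (number of categories) - (number of parameters estimated from the data) - 1
n_estimated_params = 0   # nothing was estimated from the data here
df = len(observed) - n_estimated_params - 1

critical_value = 11.07   # from a chi-square table: df = 5, alpha = 0.05
reject_h0 = chi2 > critical_value

print(chi2, df, reject_h0)
```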

Answer 105

value of the test statistic where P = 𝛼
if observed 𝜒2 > critical 𝜒2 for the given df, we reject the null hypothesis
if observed 𝜒2 < critical 𝜒2 for the given df, we DO NOT reject the null hypothesis

Answer 106

number calculated from the data and the null hypothesis that can be compared to a standard distribution to find the P-value of the test

Answer 107

yes, because it works even when there are only two categories - very useful if the number of data points is large - BUT not recommended if binomial test is possible – two categories (success and failure)

Answer 108

- no more than 20% of categories have Expected < 5 - no category with Expected ≤ 1 (if needed, combine categories to satisfy these requirements)

Answer 109

probability distribution describing a discrete numerical random variable ie. number of heads from 10 flips of a coin ie. number of flowers in a square meter ie. number of disease outbreaks in a year

Answer 110

describes the probability that a certain number of events occur in a block of time or space, when those events happen independently of each other and occur with equal probability at every point in time or space used to ask questions about random events (by chance)
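
A sketch of this distribution (the Poisson) using its standard formula Pr[X = k] = e^(-μ) μ^k / k!, where μ is the mean number of events per block of time or space. The value of `mu` is arbitrary.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean number per block is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

mu = 2.0   # hypothetical mean number of events per block

# Probabilities for 0 to 49 events; the tail beyond that is negligible for mu = 2
probs = [poisson_pmf(k, mu) for k in range(50)]

print(probs[:4])
print(sum(probs))   # very close to 1
```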

Answer 111

test the independence of two or more categorical variables

Answer 112

df = (# of columns - 1) (# of rows - 1)
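
A sketch of the contingency calculation. Under independence, the expected count for each cell is (row total × column total) / grand total, and the degrees of freedom follow the formula above. The 2 × 2 table of counts is made up.

```python
# Hypothetical 2 x 2 table of observed counts (rows x columns)
observed = [[10, 20],
            [30, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts if the two variables are independent
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# df = (# of columns - 1) (# of rows - 1)
df = (len(col_totals) - 1) * (len(row_totals) - 1)

print(expected, chi2, df)
```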

Answer 113

this test is just a special case of the 𝜒2 goodness-of-fit test, therefore the same rules apply - no more than 20% of categories have Expected < 5 - no category with Expected ≤ 1

Answer 114

for 2 x 2 contingency analysis - does not make assumptions about the size of expectations R (or other programs) will do it, but difficult to do by hand

Answer 115

probability of success divided by the probability of failure

Answer 116

odds of success in one group divided by the odds of success in another group
(when 'success' is the bad outcome and the treatment group is in the numerator)
OR < 1 means treatment helps
OR > 1 means treatment makes things worse
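
A small sketch of odds and the odds ratio. The counts are hypothetical, and 'success' here means the bad outcome, with the treatment group in the numerator.

```python
def odds(successes, failures):
    """Odds = Pr[success] / Pr[failure], which equals successes / failures."""
    return successes / failures

# Hypothetical counts: bad outcomes ("successes") vs good outcomes ("failures")
odds_treatment = odds(10, 90)   # 10 of 100 treated individuals had the bad outcome
odds_control = odds(20, 80)     # 20 of 100 control individuals had the bad outcome

odds_ratio = odds_treatment / odds_control
print(odds_ratio)   # below 1: the bad outcome has lower odds in the treatment group
```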

Answer 117

- distribution fully described by its mean and standard deviation
- symmetric around its mean
- mean, median, and mode are all the same
- about 68% (roughly two-thirds) of random draws from a normal distribution are within one standard deviation of the mean
- 95% of random draws from a normal distribution are within about two (1.96) standard deviations of the mean

Answer 118

mean (μ) = 0

Answer 119

standard deviation (σ) = 1

Answer 120

gives probability of getting a random draw from a standard normal distribution greater than a given value

Answer 121

yes Pr[Z > x] = Pr[Z < -x]

Answer 122

1
Pr[Z < x] = 1 – Pr[Z > x]

Answer 123

yes, just with different means and variances

Answer 124

yes, by Z: standard normal deviate, Z = (Y – μ) / σ
- Z tells us how many standard deviations Y is from the mean
- probability of getting a value > Y is the same as probability of getting a value > Z from a standard normal distribution
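
A sketch of the standardization step using `statistics.NormalDist` from Python's standard library. The mean, standard deviation, and value of Y are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical population mean and standard deviation
y = 130.0                 # hypothetical value of interest

# Standard normal deviate: how many standard deviations y is from the mean
z = (y - mu) / sigma

# Pr[Z > z] from the standard normal distribution (mean 0, standard deviation 1)
p_from_z = 1 - NormalDist().cdf(z)

# The same probability computed directly on the original scale
p_direct = 1 - NormalDist(mu, sigma).cdf(y)

# Symmetry of the standard normal: Pr[Z > z] = Pr[Z < -z]
p_lower_tail = NormalDist().cdf(-z)

print(z, p_from_z, p_direct, p_lower_tail)
```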

Answer 125

yes, if the variable itself is normally distributed
- mean of the sample means: μ (same as the population mean)
- standard deviation of the sample means: σ / √n

Answer 126

the standard deviation of the distribution of sample means

Answer 127

sum or mean of a large number of measurements randomly sampled from any population is approximately normally distributed
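
A simulation sketch of this idea: sample means from a strongly skewed population (exponential, with mean 1 and standard deviation 1) still cluster around the population mean, with a spread close to σ / √n. The sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)     # fixed seed so the sketch is reproducible

n = 50             # size of each sample
n_samples = 2000   # number of repeated samples

# Each sample is drawn from an exponential population (mean 1, sd 1, right-skewed)
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_se = 1.0 / math.sqrt(n)   # sigma / sqrt(n) with sigma = 1

print(mean_of_means, sd_of_means, predicted_se)
```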

Answer 128

failing to reject H0 does not mean H0 is correct, because the power of the test might be limited
null hypothesis is the default and is either rejected or not rejected

(159 cards)