Module 1: Study Design and Inference Flashcards

Question

Benefits and Cons: Systematic Sampling

Answer 1

This sampling method provides even coverage of the population and can be an alternative when randomisation is impossible. It relies heavily on the assumption that units are independent and randomly ordered relative to the sampling interval. If this assumption fails, e.g. the spacing coincides with a periodic pattern in the population, the resulting inference can be badly wrong.
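A minimal sketch of systematic sampling, assuming a list of units and a fixed interval `k` (the function name and both populations are hypothetical). The second call shows the failure mode: when the population repeats with the same period as the spacing, every sampled unit has the same value.

```python
import random

def systematic_sample(units, k, rng=None):
    """Take every k-th unit after a random start in [0, k)."""
    rng = rng or random.Random()
    start = rng.randrange(k)
    return units[start::k]

# Hypothetical population of 100 labelled units
population = list(range(100))
sample = systematic_sample(population, k=10, rng=random.Random(1))
print(sample)

# Hypothetical periodic population: values repeat every 10 units,
# which matches the sampling interval exactly
periodic = [i % 10 for i in range(100)]
periodic_sample = systematic_sample(periodic, k=10, rng=random.Random(1))
print(periodic_sample)  # all values identical, so the sample hides the variation
```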

Answer 2

Stratified random sampling typically gives a more precise estimate of the population than simple random sampling for the same sample size. Cluster sampling is typically less precise than either for the same sample size.
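The precision gain from stratification can be illustrated with a small simulation. This is a sketch under stated assumptions: the population, its two equal-sized strata, and all sample sizes are made up, and the strata are deliberately very different so the effect is easy to see.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: two equal-sized strata with very different means
stratum_a = [rng.gauss(10, 2) for _ in range(500)]
stratum_b = [rng.gauss(50, 2) for _ in range(500)]
population = stratum_a + stratum_b

def srs_mean(n):
    """Mean of a simple random sample of n units from the whole population."""
    return statistics.mean(rng.sample(population, n))

def stratified_mean(n_per_stratum):
    """Mean of equal-sized random samples drawn within each stratum.

    The strata are the same size, so the plain mean of the pooled draws
    is an unbiased estimate of the population mean.
    """
    draws = rng.sample(stratum_a, n_per_stratum) + rng.sample(stratum_b, n_per_stratum)
    return statistics.mean(draws)

# Repeat each design many times with the same total sample size (20)
srs_means = [srs_mean(20) for _ in range(2000)]
strat_means = [stratified_mean(10) for _ in range(2000)]

srs_sd = statistics.stdev(srs_means)
strat_sd = statistics.stdev(strat_means)
print(f"SD of SRS estimates: {srs_sd:.2f}")
print(f"SD of stratified estimates: {strat_sd:.2f}")
```

The gain shrinks as the strata become more alike; with identical strata the two designs perform about the same.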

Answer 3

Selecting units close at hand, e.g. surveying students in front of the Business building.

Answer 4

This sampling method is cheap, efficient, and easy to implement, but it involves no probabilistic sampling scheme and can be heavily biased.

Answer 5

Without it, you can't use a design-based justification for extrapolating from sample to population, because the extrapolation then rests on untestable assumptions.

Answer 6

Using the correct study design allows:
- Extrapolation justified by the study design
- More robust results
- Identification of issues that may arise during sampling and analysis, e.g. pseudo-replication
- Reduced mismatch between sample and population

Answer 7

When the sample is not representative of the population, so generalisations may be inaccurate.

Answer 8

When the population of interest is not clearly defined, no probabilistic sampling scheme is applied, or voluntary/non-response bias occurs.

Answer 9

Occurs when selected units systematically differ from the population of interest. This can take the form of sampling bias, non-response bias, or voluntary bias.

Answer 10

When there is a mismatch that is not justified or accounted for in analysis and interpretation.

Answer 11

When a particular subset of the population is less likely to respond.

Answer 12

When certain participants self-select their involvement.

Answer 13

When the measurement of units is systematically biased or incorrect, e.g. poorly calibrated instruments, leading questions, or cases where true values are not observed (such as an "other" option in surveys).

Answer 14

Post-hoc stratification of respondents (categorising them after sampling) or explaining variation with covariates in a linear regression. These changes lead to more model-based inference.

Answer 15

Randomisation and Replication

Answer 16

- Known variation can be controlled by matching or grouping (stratification, paired studies, etc.).
- Unknown variation ('random error') can be reduced by increasing replication.
- Both increase the 'signal to noise ratio' through study design.

Answer 17

Replication supports valid experimental results by ensuring reproducibility, providing robustness against aberrant results, and allowing experimental error (uncertainty) to be estimated. It also makes results more precise because, as replication increases, uncertainty decreases.
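One common way to quantify "as replication increases, uncertainty decreases" is the standard error of a mean, s / sqrt(n). The sketch below assumes independent replicates and uses a made-up standard deviation of 10.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean, assuming n independent replicates."""
    return sd / math.sqrt(n)

# Hypothetical standard deviation of 10; only the number of replicates changes
for n in (4, 25, 100):
    print(f"n = {n:3d}: standard error = {standard_error(10, n):.1f}")
```

Because of the square root, halving the standard error takes four times as many replicates.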

Answer 18

The distribution a statistic would have if we could repeatedly sample from the population. This is a theoretical property of a statistic that we use to make inference.
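Repeated sampling is usually impossible in practice, but it can be imitated on a computer. The sketch below uses a hypothetical population (mean 100, standard deviation 15) and a sample size of 25 to approximate the sampling distribution of the sample mean.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 10,000 units
population = [rng.gauss(100, 15) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Draw many samples of size 25 and record the statistic (here, the mean) each time
sample_means = [statistics.mean(rng.sample(population, 25)) for _ in range(5000)]

centre = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
print(f"population mean: {pop_mean:.2f}")
print(f"mean of sample means: {centre:.2f}")
print(f"SD of sample means: {spread:.2f}")  # expected to be near 15 / sqrt(25) = 3
```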

Answer 19

The application of the methods of probability to the analysis and interpretation of data. In inference we often infer properties of an unknown probability distribution from collected data.

Answer 20

The data will have certain characteristics and properties (dependent on the model used). Therefore, we can check how well the model fits by running diagnostic tests.

Answer 21

An approximation of reality that describes the data generating process. When we use them we are interested in the model parameters, as they often show underlying variance.

Answer 22

y = β0 + β1*x + ε
ε ~ Norm(0, σ²)
The first line describes the mean μ as a straight line. The second line describes the variability around that line.
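The two parts of the model can be simulated separately to see what each contributes. This is only a sketch: the parameter values (β0 = 2, β1 = 0.5, σ = 1) and the range of x are made up.

```python
import random
import statistics

rng = random.Random(42)
beta0, beta1, sigma = 2.0, 0.5, 1.0  # hypothetical parameter values

x = [rng.uniform(0, 10) for _ in range(1000)]
mu = [beta0 + beta1 * xi for xi in x]       # first line: the mean as a straight line
y = [m + rng.gauss(0, sigma) for m in mu]   # second line: normal noise around the line

# The residuals are the simulated epsilons: centred on 0 with SD close to sigma
residuals = [yi - m for yi, m in zip(y, mu)]
print(f"mean of residuals: {statistics.mean(residuals):.3f}")
print(f"SD of residuals: {statistics.stdev(residuals):.3f}")
```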

Answer 23

Higher variance makes the curve shorter and more spread out. Lower variance makes the curve taller and narrower.
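For a normal curve this can be checked directly, since the peak height is 1 / (σ * sqrt(2π)). A small sketch using `statistics.NormalDist` (the helper name `peak_height` is hypothetical):

```python
from statistics import NormalDist

def peak_height(sigma):
    """Height of a normal density at its mean, for a given standard deviation."""
    return NormalDist(mu=0, sigma=sigma).pdf(0)

for sigma in (0.5, 1.0, 2.0):
    print(f"sigma = {sigma}: peak height = {peak_height(sigma):.3f}")
```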

Answer 24

Testing whether the observed data are consistent with a given hypothesis.

Answer 25

Use s (the sample standard deviation) and a t distribution instead of a normal distribution.
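A sketch of what this looks like for a 95% confidence interval for a mean. The ten measurements are made up, and the critical values are taken from standard tables rather than computed (2.262 for a t distribution with 9 degrees of freedom, 1.960 for the normal).

```python
import math
import statistics

# Hypothetical measurements (n = 10, so 9 degrees of freedom)
data = [12.1, 11.8, 12.6, 12.0, 11.5, 12.3, 12.9, 11.7, 12.2, 12.4]

n = len(data)
mean = statistics.mean(data)
s = statistics.stdev(data)      # sample standard deviation
se = s / math.sqrt(n)           # estimated standard error of the mean

T_CRIT = 2.262  # 97.5th percentile of t with 9 df (table value)
Z_CRIT = 1.960  # 97.5th percentile of the standard normal (table value)

t_interval = (mean - T_CRIT * se, mean + T_CRIT * se)
z_interval = (mean - Z_CRIT * se, mean + Z_CRIT * se)

print(f"t-based 95% CI: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
print(f"z-based 95% CI: ({z_interval[0]:.2f}, {z_interval[1]:.2f})")
```

The t-based interval is wider, reflecting the extra uncertainty from estimating the standard deviation from the sample.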

Answer 26

Interval estimate - 95% confidence interval. Point estimate - 0% confidence interval.

Answer 27

Ordinary Least Squares and Maximum Likelihood

Answer 28

Finds estimates that minimise the sum of squared differences between observed and expected values.
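For a straight-line model the minimising estimates have a closed form: slope = Sxy / Sxx and intercept = ȳ - slope * x̄. A minimal sketch with made-up data:

```python
import statistics

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Sums of squares and cross-products around the means
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

def sum_squared_error(b0, b1):
    """Total squared difference between observed y and the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"sum of squared errors at the estimates: {sum_squared_error(intercept, slope):.3f}")
```

Moving either estimate away from these values increases the sum of squared errors.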

Answer 29

Finds the parameter estimates that maximise the likelihood of observing the data we measured (e.g. the weights) under our model. (Most commonly used.)
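A sketch of the idea for the simplest case, estimating the mean of a normal distribution. The weights are made up, the standard deviation is assumed known, and a crude grid search stands in for a proper optimiser. For this model the maximum lands on the sample mean.

```python
import math
import statistics

weights = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical measured weights
SIGMA = 0.25                         # assumed known, for illustration only

def log_likelihood(mu):
    """Log-likelihood of the weights under Norm(mu, SIGMA^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * SIGMA ** 2) - (w - mu) ** 2 / (2 * SIGMA ** 2)
        for w in weights
    )

# Try candidate means from 4.00 to 6.00 in steps of 0.01 and keep the most likely one
candidates = [4.0 + 0.01 * i for i in range(201)]
best_mu = max(candidates, key=log_likelihood)

print(f"maximum likelihood estimate of the mean: {best_mu:.2f}")
print(f"sample mean: {statistics.mean(weights):.2f}")
```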

Answer 30

Because the simple t-test ignores the sampling scheme, treating all the cockles as independent, there are two flaws in our interpretation. Flaw 1: incorrect analysis (analysing cockles instead of quadrats), causing pseudo-replication. Flaw 2: ignoring a controllable source of uncertainty (quadrat-to-quadrat variation). With the correct analysis we would see the CI narrow, because more uncertainty is controlled by accounting for quadrat-to-quadrat variation.
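One simple way to make the quadrat the unit of analysis is to summarise each quadrat to a single mean before analysing. This is a swapped-in illustration, not necessarily the analysis the card has in mind, and the cockle lengths and quadrat labels are made up.

```python
import statistics

# Hypothetical cockle lengths (mm): five cockles in each of four quadrats
quadrats = {
    "Q1": [20.1, 21.0, 19.8, 20.5, 20.6],
    "Q2": [23.2, 22.8, 23.9, 23.1, 22.5],
    "Q3": [18.4, 19.0, 18.7, 18.1, 18.8],
    "Q4": [21.5, 21.9, 20.8, 21.2, 21.6],
}

# Naive view: every cockle is treated as an independent observation
all_cockles = [length for lengths in quadrats.values() for length in lengths]
naive_n = len(all_cockles)

# Design-based view: one summary value per quadrat, the unit that was sampled
quadrat_means = [statistics.mean(lengths) for lengths in quadrats.values()]
design_n = len(quadrat_means)

print(f"observations if cockles are treated as independent: {naive_n}")
print(f"independent units under the sampling scheme: {design_n}")
print(f"quadrat means: {[round(m, 2) for m in quadrat_means]}")
```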

(54 cards)