Module 1: Study Design and Inference Flashcards
(54 cards)
Statistics Definition (Summarised)
The science of learning from data, and of controlling and communicating uncertainty.
This involves study design, data collection, analysis, and interpretation, for the purpose of drawing conclusions and presenting uncertainty.
Scope of Inference
Whether the results from the sample can be generalised to the broader population.
“What group of items do we want our conclusion to be valid for?”
Study Types
Survey, Experimental, Observational
Goal of Survey Studies
Sampling a population in order to describe the population’s characteristics.
Survey Studies, the ‘Researcher’s Role’
The researcher has control over the collection method of the sample.
Goal of Experimental Studies
Establish a causal relationship, by assigning treatments to experimental units.
Experimental Studies, the ‘Researcher’s Role’
The experimenter has control over treatment assignment.
Goal of Observational Studies
Investigate relationships (associations) between variables, as they occur in nature.
Observational Studies, the ‘Researcher’s Role’
The researcher has control over what data to include, and how to investigate data.
Ways to reduce noise (“haphazard variation”):
- clever design that yields more precise results for the same cost/sample size.
- measuring covariates, to explain more variation.
- increasing the sample size.
Ways to avoid bias:
- appropriate study design, analysis, and interpretation.
- collecting a sample that’s representative of the population.
Population of Inference is defined by…
- Scope of Inference
- Sampling Frame
- A Probabilistic Sampling Scheme that applies to every item within the Population of Inference (PoI).
Probabilistic Sampling Scheme Definition
Each sampling unit in the population has a definable probability of being included in the sample.
Design-based Inference vs. Model-based Inference
Design-based inference relies on randomly selecting units from the population into the sample (so that the sample is representative of the population).
Model-based inference, by contrast, relies on distributional assumptions about the data (e.g. normal vs. non-normal).
Sampling Types
- Simple Random Sampling
- Stratified Random Sampling
- Cluster Sampling (I vs II)
- Systematic Sampling
- Convenience / Ad-Hoc Sampling
Simple-Random Sampling
Every item, and every possible subgroup of items of a given size, has the same chance of being selected.
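A minimal sketch of simple random sampling, using only Python's standard library. The sampling frame of 100 unit IDs and the function name `simple_random_sample` are hypothetical, chosen for illustration only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement.

    random.Random.sample gives every subset of size n the same
    chance of being chosen, which is the defining property here.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: unit IDs 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

The `seed` argument is only there to make a draw repeatable; leave it as `None` for a fresh random sample.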
Benefits and Cons:
Simple-Random Sampling
This sampling method is cheap and easy to implement.
If the population is very large, a given sample may by chance not be truly representative of it (a mismatch), potentially leading to inaccuracies.
Stratified-Random Sampling
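The population is divided into subgroups (strata) and a random sample is drawn from each, so that the sizes of the subgroups in the sample are proportional to their sizes in the population.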
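A rough sketch of stratified random sampling with proportional allocation. The "urban"/"rural" strata, the unit tuples, and the function name `stratified_sample` are assumptions made up for this example, not part of the module.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, seed=None):
    """Sample about n units, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    # Group units by stratum label
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    total = len(units)
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding means the total may differ slightly from n
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60 urban units and 40 rural units
units = [(i, "urban") for i in range(60)] + [(i, "rural") for i in range(60, 100)]
stratified = stratified_sample(units, lambda u: u[1], 10, seed=1)
print({name: len(chosen) for name, chosen in stratified.items()})
```

With a 60/40 population split and n = 10, the allocation works out to 6 urban and 4 rural units. Over-sampling a small stratum would mean replacing the proportional `k` with a larger fixed number for that stratum.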
Benefits and Cons:
Stratified-Random Sampling
The sample will be proportionally representative of the population. Smaller strata can also be over-sampled to obtain stratum-specific estimates.
There is potential for misclassifying units into the wrong stratum, and it may not be possible to identify every subgroup.
Cluster Sampling
Used when the population is composed of similar, naturally occurring groups (clusters), which are themselves randomly selected.
There are two types, single-stage and two-stage.
Benefits and Cons:
Cluster Sampling
This sampling method is cheaper, and it can be easier to study clusters than individuals, e.g. a school’s average vs. each student’s average.
However, it is less precise and can involve multiple levels of variation.
Single-Stage Cluster Sampling
Every unit within each selected cluster is sampled, e.g. everyone in the household.
Two-Stage Cluster Sampling
Units are randomly selected within each selected cluster, e.g. one member within the household.
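The difference between single-stage and two-stage cluster sampling can be sketched as follows. The household data and the function name `cluster_sample` are hypothetical, and the sketch assumes the clusters themselves are chosen by simple random sampling.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """clusters maps a cluster ID to the list of units in that cluster.

    per_cluster=None -> single-stage: keep every unit in each chosen cluster.
    per_cluster=k    -> two-stage: randomly keep up to k units per chosen cluster.
    """
    rng = random.Random(seed)
    # Stage one: randomly choose whole clusters
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for cluster_id in chosen:
        members = clusters[cluster_id]
        if per_cluster is None:
            result[cluster_id] = list(members)
        else:
            # Stage two: randomly choose units within the cluster
            result[cluster_id] = rng.sample(members, min(per_cluster, len(members)))
    return result

# Hypothetical frame: 5 households with 3 people each
households = {
    f"house_{h}": [f"house_{h}_person_{p}" for p in range(3)] for h in range(5)
}
single_stage = cluster_sample(households, 2, seed=1)
two_stage = cluster_sample(households, 2, per_cluster=1, seed=1)
print(single_stage)
print(two_stage)
```

In the single-stage call each chosen household contributes all three members; in the two-stage call each contributes one randomly chosen member.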
Systematic Sampling
Units are selected at a consistent spacing (interspersion), e.g. every 4th entry on a list.
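A small sketch of systematic sampling with a spacing of k. Choosing the starting position at random is an assumption of this sketch (a common convention, not something the card states), and the 20-entry list and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th entry of frame, beginning at a random offset in [0, k)."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed random start; a fixed start also gives even spacing
    return frame[start::k]

# Hypothetical list of 20 entries, sampled at every 4th entry
frame = list(range(1, 21))
systematic = systematic_sample(frame, 4, seed=1)
print(systematic)
```

Because the list has 20 entries and the spacing is 4, any starting offset yields exactly 5 evenly spaced entries.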