data collection Flashcards
(75 cards)
what is data collection
Data collection is the systematic process of gathering and measuring information from various sources to answer research questions, test hypotheses, and make inferences.
what is a sampling unit
A sampling unit is an individual object, animal, or person on which measurements can be made.
what is a target population
The target population is the entire group of sampling units we want to study or make inferences about.
what is a census
A census measures every individual in the target population. It is often an official survey conducted by governments to gather demographic data.
3 advantages of a census
Provides complete and accurate data.
No sampling error since everyone is included.
Useful for policy-making and resource allocation.
What are the disadvantages of a census?
Expensive and time-consuming.
Difficult to access the entire population.
Data may become outdated by the time analysis is complete.
what is a sampling protocol or design
The procedure or strategy used to select sampling units from the target population.
what is a sample
A subset of individuals or sampling units selected from a target population for analysis to estimate parameters or test hypotheses.
what is a variable
A variable is a characteristic of each sampling unit that is measured (e.g. age, blood group, voting preference), usually denoted by lowercase Roman letters (e.g. 𝑥, 𝑦).
what is a parameter
A parameter is a numerical summary of a variable for a population, usually represented by Greek letters (e.g. 𝜇 for the true population mean).
what is a statistic/estimate
A statistic (or estimate) is a numerical summary of a variable for a sample, often used to estimate a population parameter (e.g. 𝑥ˉ estimates 𝜇).
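A minimal sketch of the parameter/statistic distinction, assuming a made-up population of 130 student ages: 𝜇 is computed from every unit, while 𝑥ˉ is computed from a simple random sample of 20 and used as an estimate of 𝜇. The population values, seed, and variable names are illustrative only.

```python
import random
import statistics

# Hypothetical population: ages of N = 130 students (made-up values for illustration).
random.seed(1)
population = [random.randint(17, 30) for _ in range(130)]

# Parameter: computed from every sampling unit in the population (mu).
mu = statistics.mean(population)

# Statistic: computed from a simple random sample of n = 20 units (x-bar).
sample = random.sample(population, 20)
x_bar = statistics.mean(sample)

print(f"population mean (parameter, mu): {mu:.2f}")
print(f"sample mean (statistic, x-bar):  {x_bar:.2f}")
```

Rerunning with a different seed gives a different 𝑥ˉ, while 𝜇 stays fixed for a given population.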
4 data collection methods
Censuses – measuring the entire target population.
Polls and surveys – collecting responses from a sample.
Randomized designed experiments – manipulating variables under controlled conditions.
Observational studies – collecting data without intervention.
what is a survey
A survey is the process of collecting data from a sample in order to obtain information about the whole population.
what is an opinion poll
An opinion poll assesses public opinion by questioning a random or representative sample. Often used for election forecasting.
Why use a survey instead of a census? (3)
- Cheaper
- Faster
- More practical (accessing the entire population may be difficult or impossible)
what is sampling error
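Sampling error is the variation in a sample statistic from one sample to another (and hence its difference from the population parameter) that arises because only part of the population is measured. It is unavoidable without taking a census.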
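To make sampling error concrete, the sketch below draws five independent simple random samples of size 25 from one hypothetical population (normally distributed values with mean 170 and SD 10, an assumption chosen only for illustration) and prints how far each sample mean lands from 𝜇.

```python
import random
import statistics

random.seed(2)
# Hypothetical population of 1,000 measurements (made-up for illustration).
population = [random.gauss(170, 10) for _ in range(1000)]
mu = statistics.mean(population)

# Draw five independent simple random samples of size n = 25.
sample_means = [statistics.mean(random.sample(population, 25)) for _ in range(5)]

for i, m in enumerate(sample_means, start=1):
    # The difference from mu is the sampling error of that sample's estimate.
    print(f"sample {i}: x-bar = {m:.2f}, error = {m - mu:+.2f}")
```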
Why is random sampling important? (4)
Gives each member of the population an equal chance of selection.
Reduces bias.
Allows calculation of sampling error.
Produces a more representative sample, and larger random samples improve representativeness further.
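The point about larger samples can be checked by simulation. This sketch (hypothetical population, seed, and helper name `spread_of_sample_means`) repeats simple random sampling 500 times for each sample size and reports the standard deviation of the resulting sample means; the spread should shrink as 𝑛 grows.

```python
import random
import statistics

random.seed(3)
# Hypothetical population of 10,000 values (made-up for illustration).
population = [random.gauss(50, 15) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of x-bar across `repeats` simple random samples of size n."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

for n in (5, 20, 80):
    print(f"n = {n:3d}: spread of sample means = {spread_of_sample_means(n):.2f}")
```

Roughly, quadrupling 𝑛 halves the spread here, which is why larger random samples give estimates that vary less from sample to sample.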
What are accuracy, precision, and bias in statistics?
Accurate – Sample statistic is similar to the population parameter.
Precise – Statistic is consistent across multiple samples. A lack of precision may arise from sampling error, e.g. when sample sizes are very small.
Biased – Sample statistic tends to differ from the population parameter in a consistent direction (there is a systematic error).
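A rough simulation separating bias from precision, under stated assumptions: a made-up population, and a hypothetical flawed sampling frame that can only reach units above the median. Bias is measured as the average estimate minus 𝜇, and imprecision as the spread of estimates across 1,000 repeated samples.

```python
import random
import statistics

random.seed(4)
# Hypothetical population; mu is the parameter we want to estimate.
population = [random.gauss(100, 20) for _ in range(5000)]
mu = statistics.fmean(population)
# Hypothetical flawed frame that only ever reaches units above the median.
upper_half = sorted(population)[len(population) // 2:]

def summarize(draw, repeats=1000):
    """Return (bias, spread) of x-bar over repeated samples produced by draw()."""
    estimates = [statistics.fmean(draw()) for _ in range(repeats)]
    bias = statistics.fmean(estimates) - mu   # systematic error
    spread = statistics.stdev(estimates)      # lack of precision
    return bias, spread

schemes = {
    "random, n = 100": lambda: random.sample(population, 100),
    "random, n = 5": lambda: random.sample(population, 5),
    "upper half only, n = 100": lambda: random.sample(upper_half, 100),
}
for name, draw in schemes.items():
    bias, spread = summarize(draw)
    print(f"{name:26s} bias = {bias:+6.2f}  spread = {spread:5.2f}")
```

Random sampling with 𝑛 = 100 should be close to unbiased and fairly precise; 𝑛 = 5 stays roughly unbiased but is imprecise; the upper-half frame is precise yet consistently too high, i.e. biased.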
what is the goal in sampling
To select a sample that reflects the variation in the whole population without sampling the entire population.
Why is careful sampling important?
Poor data collection can lead to flawed conclusions.
A well-chosen sample allows for accurate and robust decisions.
Uncertainty is inherent in sampling, so methods must minimize errors.
3 different sampling strategies
Simple random sampling
Systematic random sampling
Stratified random sampling
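The cards here only name systematic and stratified random sampling, so the following is a sketch of their common textbook forms rather than the course's exact definitions. Assumptions: systematic sampling uses interval k = N // n with a random start in the first k units; stratified sampling uses proportional allocation with a simple random sample inside each stratum. The function names and the two-stratum class are hypothetical.

```python
import random

def systematic_sample(units, n):
    """Random start in the first k units, then every k-th unit (assumes n <= len(units))."""
    k = len(units) // n
    start = random.randrange(k)
    return units[start::k][:n]

def stratified_sample(strata, n):
    """Proportional allocation, then a simple random sample within each stratum.

    strata maps stratum name -> list of units. Rounding each stratum's share
    can make the total differ slightly from n for some stratum sizes.
    """
    N = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        n_h = round(n * len(units) / N)
        sample.extend(random.sample(units, n_h))
    return sample

# Hypothetical class of 130 students split into two year groups.
students = list(range(1, 131))
strata = {"year 1": students[:65], "year 2": students[65:]}

sys_sample = systematic_sample(students, 20)
strat_sample = stratified_sample(strata, 20)
print(sys_sample)
print(strat_sample)
```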
what is simple random sampling
A method where each individual in the population has an equal chance of being selected.
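In Python, simple random sampling without replacement can be sketched with the standard library's `random.sample`, assuming a hypothetical sampling frame of student IDs 1 to 130.

```python
import random

# Hypothetical sampling frame: student IDs 1..130.
frame = list(range(1, 131))

# random.sample draws without replacement; every unit (and every subset of
# size n) has the same chance of being chosen.
sample = random.sample(frame, 20)
print(sorted(sample))
```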
What is the formula for the probability of selection in simple random sampling?
The chance, or probability, of being selected in a sample of size 𝑛 from a population of size 𝑁 is:
chance of selection = 𝑛/𝑁
What is the probability of a student being selected in a simple random sample of 20 students from a class of 130?
20/130 = 2/13 ≈ 0.1538 ≈ 15.38%
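The arithmetic can be checked exactly and by simulation. This sketch computes 𝑛/𝑁 as an exact fraction, then estimates how often one particular student (hypothetically ID 1) ends up in a simple random sample of 20 from 130; the seed and trial count are arbitrary choices.

```python
import random
from fractions import Fraction

N, n = 130, 20
p = Fraction(n, N)   # exact probability n/N, reduces to 2/13
print(p, float(p))   # 2/13 and its decimal value, about 0.1538

# Empirical check: how often does student ID 1 land in the sample?
random.seed(5)
trials = 20_000
hits = sum(1 in random.sample(range(1, N + 1), n) for _ in range(trials))
print(hits / trials)  # should be close to 0.1538
```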