Lecture 5 Flashcards
(41 cards)
In data sets rows capture
Observations (e.g. on consumers or firms)
Columns display
Variables. A variable can take on different values for different subjects
Dummy variables
Variables that only take on the values 0 or 1
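A minimal sketch of dummy coding, assuming a made-up data set in which each row is a consumer and the column `owns_dog` holds "yes" or "no". The names `consumers` and `to_dummy` are illustrative only.

```python
# Hypothetical data: one row per consumer (values are made up).
consumers = [
    {"id": 1, "owns_dog": "yes"},
    {"id": 2, "owns_dog": "no"},
    {"id": 3, "owns_dog": "yes"},
]

def to_dummy(value, positive="yes"):
    """Return 1 if value equals the positive category, else 0."""
    return 1 if value == positive else 0

# The dummy variable takes only the values 0 or 1.
dog_dummy = [to_dummy(row["owns_dog"]) for row in consumers]
print(dog_dummy)  # [1, 0, 1]
```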
Codebook
A list of all the codes used in a dataset
The variables in your data set need to match the unit of analysis in a study. Specifically:
The dependent variable is measured at the level of the unit of analysis. So are mediator variables
Independent and moderator variables are measured at the level of the unit of analysis or at a more aggregate level
Population
Entire group of people, firms, events, or things of interest for which you would like to make inferences
Sample
A subset of the population of interest
Why use samples in the first place
Impossible to study the entire population
The sampling process consists of the following steps
1) define the population you are interested in
2) Determine the sampling frame. The sampling frame is the physical representation of the population through which one can reach out to that population
3) Decide on the sampling design
How to define the target population and choose the sampling frame
1) Define the target population (e.g. students at TiSEM, employees at Philips)
2) Determine the sampling frame
-Physical representation of the target population (Example: students at TiSEM –> database of TiSEM students)
3) Determine the sampling design
Coverage error
Sampling frame ≠ population
Under coverage
True population members are excluded
Mis-coverage
Non-population members are included
Solution to coverage error
If small, recognize but ignore
If large, redefine the population in terms of the sampling frame
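The two kinds of coverage error can be sketched as set differences between the population and the sampling frame. The identifiers below are hypothetical; real studies rarely have a full list of the true population, so this is only an illustration of the definitions.

```python
# Hypothetical identifiers (made up for illustration).
population = {"ann", "bob", "cem", "dia", "eli"}   # true target population
sampling_frame = {"bob", "cem", "dia", "fay"}      # list used to reach it

# Under coverage: true population members missing from the frame.
under_coverage = population - sampling_frame
# Mis-coverage: frame entries that are not population members.
mis_coverage = sampling_frame - population

print(sorted(under_coverage))  # ['ann', 'eli']
print(sorted(mis_coverage))    # ['fay']
```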
Probability sampling
Each element of the population has a known chance of being selected as a subject
Results generalizable to population
More time and resource intensive
Nonprobability sampling
The elements of the population do not have a known chance of being selected as a subject
Less time and resource intensive
Results not generalizable to population
Probability sampling: simple random sampling
Each population element has an equal chance of being chosen
Highest generalizability, but costly
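A minimal sketch of simple random sampling, assuming the sampling frame is available as a Python list. The function name `simple_random_sample` and the student labels are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n elements without replacement; each element is equally likely."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    return rng.sample(list(frame), n)

# Hypothetical sampling frame of 100 students.
students = [f"student_{i}" for i in range(1, 101)]
srs = simple_random_sample(students, 10, seed=42)
print(len(srs))  # 10
```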
Systematic sampling
Select a random starting point, then pick every nth element (e.g. every third person, starting from person 5)
Simplicity (adds a degree of system or process)
Low generalizability if there happens to be a systematic difference between every nth observation
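A sketch of systematic sampling, assuming the random starting point is drawn from the first `step` elements of the list (a common convention, not something the card specifies). `systematic_sample` is a hypothetical helper.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Pick a random start among the first `step` elements, then every step-th element."""
    rng = random.Random(seed)
    start = rng.randrange(step)   # index 0 .. step-1
    return frame[start::step]     # slice: start, start+step, start+2*step, ...

# Hypothetical frame: persons numbered 1 to 30, every third person is picked.
people = list(range(1, 31))
sys_sample = systematic_sample(people, step=3, seed=1)
print(len(sys_sample))  # 10
```

If the list is ordered in a cycle that matches `step`, every selected element shares the same trait, which is the generalizability risk noted on the card.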
Stratified sampling (probability sampling)
Divide the population into meaningful (homogeneous) groups, then apply SRS within each group
All groups are adequately sampled, allowing for group comparisons
More time consuming and requires homogeneous subgroups
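A sketch of stratified sampling, assuming each unit carries a label that identifies its stratum and that the same number of units is drawn from every stratum. The helper `stratified_sample` and the bachelor/master split are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units by stratum, then draw a simple random sample within each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # rng.sample raises ValueError if a stratum has fewer than n_per_stratum units.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 bachelor and 40 master students.
all_students = [("bachelor", i) for i in range(60)] + [("master", i) for i in range(40)]
strat = stratified_sample(all_students, stratum_of=lambda s: s[0], n_per_stratum=5, seed=0)
print({name: len(drawn) for name, drawn in strat.items()})  # {'bachelor': 5, 'master': 5}
```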
Cluster sampling
Divide the population into heterogeneous groups, randomly select a number of groups, and select each member within these groups
Cluster population –> sample (clusters)
Geographic clusters
Subsets of naturally occurring clusters are typically more homogeneous than heterogeneous
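A sketch of cluster sampling, assuming the population is already divided into named clusters. The function `cluster_sample` and the geographic cluster names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every member of the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical geographic clusters with their members.
regions = {
    "north": ["n1", "n2"],
    "south": ["s1", "s2", "s3"],
    "east": ["e1"],
    "west": ["w1", "w2"],
}
chosen, members = cluster_sample(regions, 2, seed=3)
print(len(chosen))  # 2
```

Unlike stratified sampling, randomness enters at the level of groups here, not at the level of individual members.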
Classification of sampling designs
Two classes of sampling designs
1) Probability
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
2) Nonprobability
Convenience sampling
Quota sampling
Judgement sampling
Snowball sampling
Convenience sampling (nonprobability sampling)
Select subjects who are conveniently available
Convenient (inexpensive and fast)
Lower generalizability
Quota sampling (nonprobability sampling)
Fix quota for each subgroup
E.g. "Do you think dog owners should pay taxes for their pet?"
Households with a dog (mainly no)
Households without a dog (mainly yes)
When minority participation is critical (good)
Lower generalizability
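A sketch of quota sampling, assuming subjects are taken in the order they happen to arrive until each subgroup's quota is full. Nothing is random here, which is why the design is nonprobability. The helper `quota_sample` and the passers-by data are hypothetical.

```python
def quota_sample(stream, group_of, quotas):
    """Accept subjects in arrival order until each group's quota is filled."""
    counts = {group: 0 for group in quotas}
    selected = []
    for subject in stream:
        group = group_of(subject)
        if group in counts and counts[group] < quotas[group]:
            selected.append(subject)
            counts[group] += 1
        if counts == quotas:  # every quota is filled
            break
    return selected

# Hypothetical passers-by: (name, owns_dog).
passers_by = [("p1", True), ("p2", True), ("p3", True), ("p4", False),
              ("p5", True), ("p6", False), ("p7", False)]
picked = quota_sample(passers_by, group_of=lambda p: p[1], quotas={True: 2, False: 2})
print([name for name, _ in picked])  # ['p1', 'p2', 'p4', 'p6']
```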
Nonprobability sampling: judgement sampling
Select subjects based on their knowledge/professional judgement
Convenient (inexpensive and fast) when a limited number of people have the info you need
Lower generalizability