Lecture 5 Flashcards

(41 cards)

1
Q

In data sets, rows capture

A

Observations (e.g. on consumers or firms)

2
Q

Columns display

A

Variables. A variable can take on different values for different subjects

3
Q

Dummy variables

A

Variables that only take on the values 0 or 1
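
A minimal sketch of dummy coding, assuming a made-up yes/no survey answer (the data and the names `owns_dog` and `dog_dummy` are illustrative, not from the lecture):

```python
# Hypothetical survey answers: does each consumer own a dog?
owns_dog = ["yes", "no", "no", "yes", "no"]

# Dummy coding: 1 if the characteristic is present, 0 otherwise.
dog_dummy = [1 if answer == "yes" else 0 for answer in owns_dog]

print(dog_dummy)
```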

4
Q

Codebook

A

A list of all the codes used in a dataset

5
Q

The variables in your data set need to match the unit of analysis in a study. Specifically:

A

The dependent variable is measured at the level of the unit of analysis. So are mediator variables.

Independent and moderator variables are measured at the level of the unit of analysis or at a more aggregate level.

6
Q

Population

A

Entire group of people, firms, events, or things of interest for which you would like to make inferences

7
Q

Sample

A

A subset of the population of interest

8
Q

Why use samples in the first place?

A

Impossible to study the entire population

9
Q

The sampling process consists of the following steps

A

1) Define the population you are interested in
2) Determine the sampling frame. The sampling frame is the physical representation of the population through which one can reach out to that population
3) Decide on the sampling design

10
Q

How to define the target population and choose the sampling frame

A

1) Define the target population (e.g. students at TiSEM, employees at Philips)

2) Determine the sampling frame

- Physical representation of the target population (example: students at TiSEM –> database of TiSEM students)

3) Determine the sampling design

11
Q

Coverage error

A

Sampling frame ≠ population

12
Q

Undercoverage

A

True population members are excluded

13
Q

Miscoverage

A

Non-population members are included
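
Coverage error and its two forms can be illustrated with set differences. This is only a sketch; the student IDs and the "outdated database" scenario are invented for illustration:

```python
# Hypothetical student IDs; both sets are made up for illustration.
population = {"s01", "s02", "s03", "s04", "s05"}      # true target population
sampling_frame = {"s02", "s03", "s04", "s05", "x99"}  # e.g. an outdated database

# True population members that the frame misses.
undercoverage = population - sampling_frame
# Frame entries that do not belong to the population.
miscoverage = sampling_frame - population
# Coverage error exists whenever frame and population differ.
has_coverage_error = sampling_frame != population

print(undercoverage, miscoverage, has_coverage_error)
```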

14
Q

Solution to coverage error

A

If small, recognize but ignore
If large, redefine the population in terms of the sampling frame

15
Q

Probability sampling

A

Each element of the population has a known chance of being selected as a subject

Results generalizable to population

More time and resource intensive

16
Q

Nonprobability sampling

A

The elements of the population do not have a known chance of being selected as a subject

Less time and resource intensive

Results not generalizable to population

17
Q

Probability sampling: simple random sampling

A

Each population element has an equal chance of being chosen

Highest generalizability, but costly
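
A minimal sketch of simple random sampling with Python's standard library. The frame of 100 student labels, the seed, and the sample size of 10 are assumptions chosen for illustration:

```python
import random

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Hypothetical sampling frame of 100 students.
sampling_frame = [f"student_{i:03d}" for i in range(1, 101)]

# random.sample draws without replacement, so every element of the
# frame has the same chance of ending up in the sample.
srs = rng.sample(sampling_frame, k=10)

print(srs)
```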

18
Q

Systematic sampling

A

Select a random starting point, then pick every nth element (e.g. every third element, starting from person 5)

Simplicity (adds a degree of system or process)

Low generalizability if there happens to be a systematic difference between every nth observation
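
Systematic sampling could be sketched as below. The helper `systematic_sample` is hypothetical, and it assumes one common variant in which the random start falls within the first interval:

```python
import random

def systematic_sample(frame, step, rng):
    """Pick a random start among the first `step` elements, then every `step`-th one."""
    start = rng.randrange(step)  # random index in 0 .. step - 1
    return frame[start::step]

rng = random.Random(0)           # fixed seed for reproducibility
frame = list(range(1, 31))       # hypothetical frame: persons 1..30
sample = systematic_sample(frame, step=3, rng=rng)

print(sample)
```

If the frame is ordered in a cycle that matches the step (say, every third person is a team leader), the sample inherits that pattern, which is the generalizability risk named on the card.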

19
Q

Stratified sampling (probability sampling)

A

Divide the population into meaningful (homogeneous) groups, then apply simple random sampling (SRS) within each group

All groups are adequately sampled, allowing for group comparisons

More time consuming and requires homogeneous subgroups
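
A sketch of stratified sampling: group the units into strata, then draw a simple random sample inside each stratum. The helper `stratified_sample`, the bachelor/master strata, and the equal allocation of 4 per stratum are assumptions for illustration:

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then draw an SRS of equal size within each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, k=min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

rng = random.Random(1)
# Hypothetical students tagged with their programme (the stratum).
students = [(f"s{i}", "bachelor" if i % 3 else "master") for i in range(30)]
sample = stratified_sample(students, stratum_of=lambda s: s[1],
                           n_per_stratum=4, rng=rng)

for name, members in sample.items():
    print(name, members)
```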

20
Q

Cluster sampling

A

Divide the population into heterogeneous groups, randomly select a number of groups, and select every member within these groups

Cluster population –> sample (clusters)

Geographic clusters

Subsets of naturally occurring clusters are typically more homogeneous than heterogeneous
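
A sketch of (one-stage) cluster sampling: draw whole clusters at random and keep every member of the chosen clusters. The helper `cluster_sample` and the four geographic clusters are invented for illustration:

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose `n_clusters` clusters and take every member of each."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

rng = random.Random(7)
# Hypothetical geographic clusters and their members.
clusters = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1"],
}
chosen, sample = cluster_sample(clusters, n_clusters=2, rng=rng)

print(chosen, sample)
```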

21
Q

Classification of sampling designs

A

Classification of sampling designs

1) Probability:
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling

2) Nonprobability:
Convenience sampling
Quota sampling
Judgement sampling
Snowball sampling

22
Q

Convenience sampling (nonprobability sampling)

A

Select subjects who are conveniently available

Convenient (inexpensive and fast)

Lower generalizability

23
Q

Nonprobability sampling (quota sampling)

A

Fix quota for each subgroup

E.g. "Do you think dog owners should pay taxes for their pet?"

Households with a dog (mainly no)
Households without a dog (mainly yes)

Good when minority participation is critical
Lower generalizability

24
Q

Nonprobability sampling: judgement sampling

A

Select subjects based on their knowledge/professional judgement

Convenient (inexpensive and fast) when only a limited number of people have the information you need

Lower generalizability

25
Nonprobability sampling (snowball sampling)
"Do you know people who..."
Good for rare characteristics (experts)
First participants strongly influence the sample
26
Measurement or operationalization means
Turning abstract conceptual variables into measurable observations
27
Nominal scales
A scale that allows you to classify your data into categories
E.g. states in the United States that are either Democrat or Republican
You assign 1 to Democrat and 2 to Republican (the numbers are labels only)
28
Ordinal scale
Ranked or ordered
Rank-orders the categories in a meaningful way
More information than a nominal scale; here 3 is more than 2
E.g. best to worst, first to last, etc.
29
Interval scale
Allows you to compare differences between values
Meaningful differences between values, but no natural zero point
E.g. IQ
Compare with a rank order: when ranking chili peppers, the step from 1 to 2 is not necessarily the same as the step from 2 to 3. IQ is standardized, so differences are comparable
30
Ratio scales
Meaningful differences and ratios between values due to a natural zero point
Ratios are meaningful for this scale
E.g. distance
Zero point is possible
31
Measures of central tendency
Mean (average), median (middle value in an ordered set of values), or mode (most common value)
32
Measures of dispersion
Range, standard deviation, variance or interquartile range
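A short sketch computing the measures of central tendency and dispersion with Python's `statistics` module. The scores are made up, and the variance and standard deviation here are the sample versions (dividing by n - 1), which is an assumption since the cards do not say which version is meant:

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical interval-scaled scores

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)   # sample variance (n - 1)
std_dev = statistics.stdev(scores)       # sample standard deviation
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                            # interquartile range

print(mean, median, mode, value_range, variance, std_dev, iqr)
```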
33
Inferential statistics
Methods to draw conclusions (or to make inferences) E.g. Mean difference tests
34
Choosing between descriptive statistics
Nominal scale: central tendency = mode; dispersion = none
Ordinal scale: central tendency = median; dispersion = interquartile range
Interval scale: central tendency = mean; dispersion = standard deviation, variance
Ratio scale: central tendency = mean; dispersion = standard deviation, variance
35
Choosing between inferential statistics:
Check slides
36
When there are multiple IVs in a study, with different measurement scales:
The highest scale determines the statistical technique
37
Choosing inferential statistics: T-test or ANOVA
T-test: compares two means (two levels of an IV)
ANOVA: can compare more than two levels
The choice therefore depends on:
the number of IVs
the number of levels (conditions or groups) of the IV
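To make "compares two means" concrete, here is a sketch of the t statistic for two groups. It assumes independent samples with equal variances (the pooled version), which the card does not specify; the helper `pooled_t_statistic` and the data are hypothetical:

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """t statistic for two independent samples, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / std_error

# Hypothetical scores for two levels of one IV.
control = [4, 5, 6, 5, 5]
treatment = [6, 7, 8, 7, 7]
t = pooled_t_statistic(treatment, control)

print(t)
```

With three or more groups, this pairwise statistic no longer suffices and ANOVA is the usual choice.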
38
Choosing inferential statistics: rating scales (Likert scale)
strongly disagree, disagree, undecided, agree and strongly agree
39
Choosing inferential statistics: rating scales (semantic differential)
Organized _ _ _ _ _ _ _ Unorganized
Cold _ _ _ _ _ _ _ _ Warm
Modern _ _ _ _ _ _ _ _ Old-fashioned
Treated as interval scales
40
From a statistical point of view, a moderator is
Also considered an IV.
41
To test the moderating effect of M on the relationship between X and Y you have to include three IVs in your regression model:
The main effect of X
The interaction effect between X and M (= X*M), to capture the moderating effect of M on the relationship between X and Y
The main effect of M (to statistically control for the impact of M on Y; if M were not included, the effect of X*M would not be estimated correctly)
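Written as a regression equation, the three IVs from this card could look as follows. The coefficient symbols and the error term are notation assumed here, not taken from the lecture:

```latex
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M) + \varepsilon
```

In this notation, \beta_3 carries the moderating effect: the effect of X on Y equals \beta_1 + \beta_3 M, so it changes with the level of M.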