Lecture 16 ARM Flashcards

(49 cards)

1
Q

Independent variable

A

The presumed cause or factor that you think influences something else (the input)

2
Q

Dependent variable

A

The outcome or effect that you measure (the result influenced by the IV)

3
Q

Categorical vs Numerical variables

A

Variables come in different types. Some are categorical (labels), others are numerical, and the two types are summarised in different ways

4
Q

Qualitative / categorical variables

A

Can be operationalised using labels

Nominal
Ordinal
Binary variable (off vs on, yes or no)
Cannot calculate the mean

5
Q

Quantitative variables

A

Interval (0 is a point on the scale, not "nothing"; e.g. temperature, where 0 degrees means that it is cold) and ratio (0 means absence, lack, nothing; e.g. weight or height)

Can calculate the mean

6
Q

Nominal

A

There is no inherent hierarchical rank - there are simply categories - no “ordering”

Eg gender - hair color - where you live

7
Q

Ordinal

A

Categories that can be ranked or ordered hierarchically

eg ranks in the military, Likert scale

8
Q

Likert scale

A

Providing a numerical answer (e.g. on a scale from 1 to 5) to a qualitative question, for example about how enjoyable something was

9
Q

Interval variable

A

A measurement that is used to define values measured along a scale, with each point placed at an equal distance from one another

This is just a fancy way of saying measurement based on numbers
Also, the number “0” is a meaningful value on the scale; it does not indicate absence (there is no true zero point)

10
Q

Ratio variable

A

The only difference between ratio and interval variables is that a ratio variable has a true zero point: 0 means absence
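
The practical consequence of a true zero can be sketched with temperature. This is a minimal illustration, assuming Celsius as the interval scale and Kelvin as the ratio scale; the function name and the two example temperatures are made up for this sketch.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0  # hypothetical example temperatures

# Differences are meaningful on an interval scale: the gap is the same on both scales.
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are not: 20 / 10 = 2 in Celsius, but 20 degrees is not "twice as hot".
ratio_c = warm_c / cold_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(difference_c, round(difference_k, 2))
print(ratio_c, round(ratio_k, 3))
```

Only the Kelvin ratio describes a real physical proportion, which is why statements like "twice as much" need a ratio variable.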

11
Q

Flow chart of measurement scales

A

Used to find out what type of variables are used

12
Q

Connecting variables to descriptive stats

A

1) Nominal (categorical) --> Mode
2) Ordinal (ranked) --> Median
3) Interval/Ratio (numeric) --> Mean (but all three can be used)

13
Q

Mode

A

Most common category

14
Q

Median

A

Middle value or category

15
Q

Mean

A

Average of all the values: their sum divided by the number of values

16
Q

Two overarching approaches to sampling

A

1) Probability sampling
2) Non-probability sampling

17
Q

Probability sampling

A
  • Quantitative
  • Randomly selected
  • Representative of larger population
  • High external validity

Each individual's selection is determined by chance. There are several types of probability sampling

18
Q

Non-probability sampling

A

- Qualitative
- Not random, eg snowball sampling
- Not representative
- Low external validity

19
Q

Population (N)

A

The total set of units from which the sample is drawn
E.g. a census covers the entire population

20
Q

Census

A

A sample which comprises the entire population

21
Q

Sampling frame

A

The concrete list of units from which the samples are selected.
Eg sample
Used for smaller cases

22
Q

Sample (n)

A

Selection of participants from the total population (N)

n = 80 means the sample contains 80 participants

22
Q

Representative sample

A

A sample wherein the units are represented in the same proportion as in the general population

23
Q

Types of random sampling

A
  1. Simple random
  2. Systematic random
  3. Stratified
  4. Multi-cluster
24
Simple random sampling
Steps: select the population, make a sampling frame (list of people who can be selected), determine the sample size, then select randomly from the pool
- Equal chance of being selected
- Depends on the study
- Like a lottery!
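
The lottery idea can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name, the `seed` parameter and the 100-person frame are assumptions made for the example.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: a list of everyone who can be selected.
frame = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, sample_size=10, seed=42)

print(len(sample), sample[:3])
```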
25
Systematic random sampling
Every x-th person in a sampling frame is selected
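
A short sketch of the "every x-th person" rule. The random starting position is an assumption that is commonly added so the first person on the list is not always chosen; the card itself only states the interval. All names are hypothetical.

```python
import random


def systematic_sample(sampling_frame, step, seed=None):
    """Select every step-th unit from the frame, starting at a random offset."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # assumption: random start within the first interval
    return sampling_frame[start::step]


frame = list(range(1, 101))  # hypothetical frame of 100 numbered people
sample = systematic_sample(frame, step=10, seed=3)

print(sample)
```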
26
Stratified sampling
Dividing a population into subgroups (strata) based on characteristics, then randomly selecting from each stratum
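
As a rough sketch, stratified sampling groups the units first and then draws randomly inside each group. Drawing the same number from every stratum is an assumption of this example (proportional allocation is another option); the `age_group` field and all names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then draw per_stratum units at random from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Never ask for more units than the stratum contains.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical population of 60 people in two age strata (40 and 20 people).
people = [
    {"id": i, "age_group": "under_30" if i % 3 else "30_plus"} for i in range(60)
]
sample = stratified_sample(people, lambda p: p["age_group"], per_stratum=5, seed=1)

print(len(sample))
```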
27
Multi-cluster sampling
Used when the population is large and geographically dispersed (many strata), e.g. when the population is distributed across provinces:
- Random selection of provinces
- From the provinces, random sampling of villages
- From the villages, random sampling of households
E.g. the Netherlands
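
The province, village, household stages can be sketched as nested random draws. This is only an illustration under assumed data: the nested dictionary, the counts per stage and every name below are invented for the example.

```python
import random


def multi_stage_sample(provinces, n_provinces, n_villages, n_households, seed=None):
    """Randomly pick provinces, then villages within them, then households."""
    rng = random.Random(seed)
    selected = []
    for province in rng.sample(sorted(provinces), n_provinces):
        villages = provinces[province]
        for village in rng.sample(sorted(villages), n_villages):
            households = villages[village]
            selected.extend(rng.sample(households, n_households))
    return selected


# Hypothetical country: 4 provinces, 5 villages each, 20 households per village.
provinces = {
    f"province_{p}": {
        f"village_{p}_{v}": [f"household_{p}_{v}_{h}" for h in range(20)]
        for v in range(5)
    }
    for p in range(4)
}
sample = multi_stage_sample(
    provinces, n_provinces=2, n_villages=3, n_households=4, seed=7
)

print(len(sample))
```

Only the selected provinces and villages need a household list, which is what makes this approach practical for dispersed populations.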
28
Sample size and generalisation
- Required sample size must be given in absolute numbers, not as a percentage
- Heuristic: from a sample size higher than 29 (n >= 30), it is possible to generalise across a group using the standard normal distribution (below 30: use a t-distribution)
- You can never generalise beyond your sampling frame
- Sampling errors
29
Heuristic
From a sample size higher than 29 (n >= 30), it is possible to generalise across a group using the standard normal distribution (below 30: use a t-distribution)
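
The rule of thumb can be written as a one-line check. This sketch only encodes the heuristic from the card; the function name and the 95% two-sided level used for the critical value are assumptions for illustration. The t-distribution is not in Python's standard library, so only the normal critical value is computed here.

```python
from statistics import NormalDist


def distribution_for(sample_size: int) -> str:
    """Apply the n >= 30 rule of thumb from the card."""
    if sample_size < 1:
        raise ValueError("sample_size must be positive")
    return "standard normal (z)" if sample_size >= 30 else "t-distribution"


# Critical value for a two-sided 95% interval under the standard normal.
# The 95% level is an assumption made for this example.
z_critical_95 = NormalDist().inv_cdf(0.975)

print(distribution_for(80), round(z_critical_95, 2))
print(distribution_for(12))
```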
30
Descriptive statistics
Summarise data to identify patterns and typical values
31
Central tendency
Definition: A statistic that identifies the center of a distribution of data
Purpose: Finds a single value that best represents all observations in the sample
Three types: Mean, median, mode, which are all different notions of "average"
Use in research: In papers to summarise data, often in tables of sample characteristics, to give a sense of typical values
32
Variation / Dispersion
Measures of spread (range, IQR, variance, standard deviation)
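
The four measures of spread can be computed with Python's standard library. A minimal sketch on made-up scores; it treats the data as a whole population (`pvariance`, `pstdev`), which is an assumption, and the quartiles use the default method of `statistics.quantiles`.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

value_range = max(scores) - min(scores)          # maximum minus minimum
q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
iqr = q3 - q1                                    # spread of the middle 50%
variance = statistics.pvariance(scores)          # mean squared distance from the mean
std_dev = statistics.pstdev(scores)              # square root of the variance

print(value_range, iqr, variance, std_dev)
```

For a sample rather than a population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.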
33
The Mean (arithmetic average)
Mean: Sum of all values divided by the number of values. Uses ALL data points in the calculation
Interpretation: The balance point of the data; very common
Type of data: Requires interval/ratio (only makes sense for quantitative data)
Challenge: Sensitive to outliers; one very high or low value can skew the mean significantly
34
Outlier
An unusually high or low value in the data set, which can skew the data (or the mean)
Researchers should comment on outliers, as they can be valuable data as well
35
The Median (Middle Value)
Median: The value that falls in the middle of the SORTED dataset (from smallest to largest), i.e. the 50th percentile
Type of data: Requires data that can be ordered or ranked, thus ordinal or interval/ratio
Benefit: Resistant to outliers, i.e. not affected by extremely high or low values. E.g. adding a millionaire to a sample of household incomes would barely change the median, but would raise the mean a lot
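
The millionaire example can be checked directly. The income figures below are invented for illustration.

```python
import statistics

incomes = [28_000, 31_000, 35_000, 40_000, 46_000]  # hypothetical household incomes
with_millionaire = incomes + [1_000_000]

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(with_millionaire)
median_before = statistics.median(incomes)
median_after = statistics.median(with_millionaire)

# The mean jumps by well over 100,000; the median moves by only 2,500.
print(mean_before, round(mean_after))
print(median_before, median_after)
```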
36
The Mode (Most Frequent)
Mode: The most frequently occurring value in the dataset (typetall)
Type of data: Applicable to ANY level of measurement; the only measure that works for nominal data
Usage: Highlights the most common category or value, useful for categorical data or to report a typical category
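
Because the mode only counts occurrences, it works on plain labels. A small sketch with made-up nominal data; `multimode` is shown for the case where two categories tie.

```python
from statistics import mode, multimode

hair_colours = ["brown", "blonde", "brown", "black", "red", "brown", "blonde"]
most_common = mode(hair_colours)  # the single most frequent category

# With a tie, multimode returns every most frequent value.
tied = multimode(["yes", "no", "yes", "no"])

print(most_common, tied)
```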
37
Choosing the right measure of center
Match to data level:
- Nominal --> Mode ONLY
- Ordinal --> Median (mode works)
- Interval/Ratio --> Mean usually, but median if distribution is SKEWED (and mode also works)
Distribution shape:
- Symmetric (normal-ish) --> mean = median = mode (all similar)
- Skewed (long tail, the bump shifts to one side) --> median is a better representation than the mean (mean gets pulled by the tail)
Outliers:
- If outliers are present, median is best
- If outliers are not present, mean is more powerful
In practice: both are important to report
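
The matching rules can be sketched as a small helper. This is one possible encoding, not a standard function: the name, the `skewed` flag and the string labels are assumptions, and ordinal values are assumed to be coded as sortable numbers (e.g. Likert answers 1 to 5).

```python
import statistics


def measure_of_center(values, level, skewed=False):
    """Return (measure name, value) following the card's matching rules."""
    if level == "nominal":
        return "mode", statistics.mode(values)
    if level == "ordinal":
        # median_low always returns an actual category from the data.
        return "median", statistics.median_low(values)
    if level in ("interval", "ratio"):
        if skewed:
            return "median", statistics.median(values)
        return "mean", statistics.mean(values)
    raise ValueError(f"unknown level of measurement: {level!r}")


print(measure_of_center(["red", "blue", "red"], "nominal"))
print(measure_of_center([1, 2, 2, 4, 5], "ordinal"))
print(measure_of_center([1, 2, 3, 100], "ratio", skewed=True))
```

Whether a distribution counts as skewed is left to the caller here; in practice that judgement comes from inspecting the data.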
38
Univariate analysis
Analysing a single variable at a time
39
Table of centrality and dispersion measures by type of scale (variable)
Check table
40
Unimodal distribution
Dataset with only one single peak or mode
41
Bell curve (unimodal distribution)
- Symmetrical distribution
- Mode = median = mean, located at the peak
- Normal distribution
42
NOT a bell curve
- Not symmetrical
- Mode, median and mean occur at distinct points
- Skewed data
43
Skewed distribution
An asymmetric distribution in which the mean is pulled away from the median and mode, towards the longer tail; this may be caused by outliers
44
Right skewed distribution
- The TAIL is to the right, and the bump is to the left
- POSITIVE SKEW (think right = positive!)
- Mean is GREATER THAN the median
- Mean AND median are GREATER THAN the mode (the mode sits on the left, lower side of the x-axis)
45
Left-skewed distribution
- The tail goes to the left, the bump is to the right
- NEGATIVELY SKEWED
- Mean is LESS THAN the median (think left = less)
- Mean AND median are LESS THAN the mode (the mode sits on the right, higher side of the x-axis)
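
The ordering of mean, median and mode under skew can be checked on small made-up datasets. The left-skewed data below is simply the right-skewed data mirrored, which is a construction chosen for this sketch.

```python
import statistics

# Hypothetical right-skewed data: bump on the left, long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image: bump on the right, long tail to the left.
left_skewed = [16 - x for x in right_skewed]


def center(values):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(values), statistics.median(values), statistics.mode(values)


right_mean, right_median, right_mode = center(right_skewed)
left_mean, left_median, left_mode = center(left_skewed)

print(right_mean > right_median > right_mode)  # positive skew
print(left_mean < left_median < left_mode)     # negative skew
```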
46
Centrality - limits
Measures of centrality alone do not show outliers or how spread out the data are, THEREFORE we also look at dispersion! (Frequencies, range, IQR, variance, standard deviation)
47
Frequency
Type of dispersion measure
Definition: Number of occurrences of each value in the data set
Use: Finding the most common and least common categories in a data set
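
Counting frequencies is a one-liner with `collections.Counter`. The survey answers are invented for the example.

```python
from collections import Counter

answers = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]
frequencies = Counter(answers)  # maps each value to its number of occurrences

most_common = frequencies.most_common(1)[0]   # (category, count) with the highest count
least_common = frequencies.most_common()[-1]  # (category, count) with the lowest count

print(dict(frequencies))
print(most_common, least_common)
```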
47
Range
Measure of dispersion
Definition: Difference between the maximum and minimum value of a data set (maximum value - minimum value = range)
Use: Finding the total spread of the data
- High range: large variability
- Small range: small variability
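
Two datasets with the same mean can have very different ranges, which is what the range is meant to reveal. The class scores below are made up for illustration.

```python
import statistics


def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)


class_a = [68, 70, 71, 72, 69]  # hypothetical scores, tightly clustered
class_b = [40, 95, 70, 55, 90]  # hypothetical scores, widely spread

print(statistics.mean(class_a), value_range(class_a))
print(statistics.mean(class_b), value_range(class_b))
```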