Lecture 5 - Data Analysis Methods Flashcards
(29 cards)
Descriptive statistics
Summarising and describing the basic features of the data
Inferential statistics
Aims to draw conclusions (inferences) from the data that can be generalised to a larger population
Nominal
Data with no inherent order or ranking
E.g.: Nationality, gender, religion
Ordinal
Data with an inherent rank or order
E.g.: Level of education
Scale / Continuous
Ordered data with a meaningful metric
E.g.: GDP, height, age
Mean
Average
(Used for scale data; sometimes reported for ordinal data, although the median is usually the safer choice there)
Median
Middle point
(Used for ordinal and scale)
Mode
Most common / frequent
(Used for nominal, ordinal and scale)
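A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module. The `ages` and `nationalities` lists are made-up illustration data, not taken from the lecture.

```python
import statistics

# Hypothetical scale data: ages of seven respondents.
ages = [21, 23, 23, 25, 30, 34, 61]

mean_age = statistics.mean(ages)      # sum of the values divided by N -> 31
median_age = statistics.median(ages)  # middle value of the sorted data -> 25
mode_age = statistics.mode(ages)      # most frequent value -> 23

# The mode is the only one of the three that works for nominal data.
nationalities = ["Dutch", "German", "Dutch", "French"]
mode_nationality = statistics.mode(nationalities)  # -> "Dutch"

print(mean_age, median_age, mode_age, mode_nationality)
```

Note how the single high value (61) pulls the mean above the median, which is why the median is often preferred for skewed data.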
Range
Highest value minus lowest value
Interquartile Range (IQR)
Q3 minus Q1
Standard deviation
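A measure of how far the data typically lie from the mean (the square root of the average squared deviation from the mean). The larger the standard deviation, the more spread out the data are.
(Used for scale data; sometimes reported for ordinal data)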
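The three measures of spread above can be sketched as follows, again with the standard `statistics` module. The `scores` list is hypothetical. Quartile conventions differ between textbooks and software, so the IQR computed here (library default method) may not match a hand calculation exactly.

```python
import statistics

# Hypothetical scale data.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
data_range = max(scores) - min(scores)

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # Interquartile range: Q3 minus Q1

# Population SD divides by N; sample SD divides by N - 1.
population_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print(data_range, iqr, population_sd, sample_sd)
```

Whether to report the population or the sample standard deviation depends on whether the data are the whole population or a sample from it; statistical software usually reports the sample version.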
Outlier
A data point that differs significantly from other observations
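One common way to make "differs significantly" concrete is the 1.5 × IQR rule: flag any value more than 1.5 IQRs below Q1 or above Q3. This rule is an assumption added here for illustration; the lecture does not name a specific criterion. The function name `iqr_outliers` and the data are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical ages; 61 sits far from the rest of the sample.
ages = [21, 23, 23, 25, 30, 34, 61]
flagged = iqr_outliers(ages)
print(flagged)
```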
Descriptive statistics / Univariate analysis
Allow us to summarise and display information about single variables (such as the number of observations N, mean, median, standard deviation and range).
Bivariate Analysis
Analysis of two variables to determine their relationship
Correlation coefficient
Measures the strength and direction of the linear relationship between two variables.
The correlation coefficient always lies between -1 and +1. Rules of thumb for interpreting its size in Social Science:
- (Very) weak correlation: between -0.2 & 0.2
- Medium correlation: between -0.4 & -0.2 or between 0.2 & 0.4
- (Very) strong correlation: < -0.4 or > 0.4
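A minimal sketch of computing Pearson's correlation coefficient by hand and labelling it with the thresholds above. The data and the function names `pearson_r` and `strength` are hypothetical. The thresholds leave the exact boundaries open, so treating a value of exactly 0.2 as weak and exactly 0.4 as medium is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def strength(r):
    """Label |r| using the rule-of-thumb thresholds from the card."""
    size = abs(r)
    if size <= 0.2:
        return "(very) weak"
    if size <= 0.4:
        return "medium"
    return "(very) strong"

# Hypothetical data: years of education and income (in thousands).
years_education = [8, 10, 12, 12, 14, 16, 18]
income = [18, 22, 25, 24, 31, 38, 45]

r = pearson_r(years_education, income)
print(round(r, 2), strength(r))
```

The sign of `r` gives the direction of the relationship; only its absolute value is used to judge strength.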
Statistical significance
Tells us whether an observed relationship is unlikely to be the result of chance alone
P-value
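Used to determine statistical significance. The p-value is a number between 0 and 1: the probability of observing a relationship at least as strong as the one in the data if the null hypothesis were true.
p≤0.05 = there is a statistically significant relationship. ⟶ Reject null hypothesis
p>0.05 = there is no statistically significant relationship. ⟶ Fail to reject null hypothesis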
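To illustrate where a p-value can come from, here is a sketch of a permutation test for a correlation: shuffle one variable many times and count how often chance alone produces a correlation at least as strong as the observed one. This is one of several ways to obtain a p-value and is not necessarily the method used in the lecture; statistical software normally reports p-values directly. The data, function names and the choice of 2000 permutations are all assumptions.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| >= the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

def decide(p, alpha=0.05):
    """Apply the decision rule from the card."""
    return "reject H0" if p <= alpha else "fail to reject H0"

# Hypothetical data with a clear positive relationship.
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
exam_score = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

p = permutation_p_value(hours_studied, exam_score)
print(p, decide(p))
```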
Multivariate analysis
Allows the simultaneous investigation of the relationship between more than two variables
Name the 4 types of variables
Types of variables:
- Independent variables
- Dependent variables
- Control variables
- Confounding variables
Independent variables
A variable in the relationship under analysis that is assumed to influence another variable.
Dependent variables
A variable in the relationship under analysis that is assumed to be influenced by one or more other variables.
Control variables
A control variable is anything that is held constant or limited in a research study. It’s a variable that is not of direct interest to the study’s objectives (but may have some impact).
- Examples of potential control variables: population size, level of corruption, natural resource wealth, etc.
Confounding variable
A variable that influences both the independent variable and dependent variable, causing a spurious association.
- Example of spurious association: In the summer more people eat ice cream, and more people drown. That does not mean that eating ice cream causes people to drown. Nice weather is the confounder: it makes people eat more ice cream and it makes more people swim, and when more people swim, more people are likely to drown.
- In a spurious correlation, two events are found to be (cor)related despite having no direct causal connection
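The ice cream example can be sketched as a small simulation. Temperature drives both simulated variables, so they correlate strongly with each other, but the correlation largely disappears once temperature is controlled for via a partial correlation. All numbers are simulated, the variable and function names are hypothetical, and "drownings" is treated as a continuous index for simplicity.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def partial_r(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(42)  # fixed seed for reproducibility

# The confounder: daily temperature.
temperature = [rng.gauss(20, 5) for _ in range(500)]

# Both outcomes depend on temperature plus independent noise;
# neither depends on the other.
ice_cream_sales = [t + rng.gauss(0, 2) for t in temperature]
drownings = [t + rng.gauss(0, 2) for t in temperature]

raw = pearson_r(ice_cream_sales, drownings)
controlled = partial_r(ice_cream_sales, drownings, temperature)
print(round(raw, 2), round(controlled, 2))
```

This is also the logic behind control variables: including the confounder in the analysis removes the part of the association that it alone produces.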
Hypothesis, Null hypothesis and Alternative hypothesis
- Hypothesis: A statement about a social phenomenon that can be tested empirically.
It describes in concrete terms what you expect will happen in your study.
1. Null hypothesis (H0): (While controlling for population size & GDP) there is no statistically significant relationship between intensity of conflict and the level of forced migration. [i.e. p>0.05]
2. Alternative hypothesis (H1): (While controlling for population size & GDP) there is a statistically significant relationship between intensity of conflict and the level of forced migration. [i.e. p≤0.05]