Topic 4 Flashcards
(14 cards)
Exploratory Data Analysis (EDA)
Visually and statistically summarising, exploring, and understanding the main characteristics of a dataset.
Categorical Variables
Variables that aren’t numerical and whose values fall into categories (e.g. gender)
Ordinal Variables
Categorical variables whose categories have a natural order (e.g. bedtime)
Contingency Table
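A table that summarises data for two categorical variables, with each cell counting the observations that fall into one combination of categories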
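A contingency table can be built by counting category pairs. The following is a minimal sketch using only the Python standard library; the function name `contingency_table` and the survey data are hypothetical, chosen purely for illustration.

```python
from collections import Counter

def contingency_table(pairs):
    """Count how often each (row_category, column_category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Hypothetical survey data: (year level, preferred study mode)
observations = [
    ("first", "online"), ("first", "in-person"), ("first", "online"),
    ("second", "in-person"), ("second", "in-person"), ("second", "online"),
]
table = contingency_table(observations)

for row, cells in table.items():
    print(row, cells)
```

Each row of the printed table is one category of the first variable, and each cell is a count for one category of the second.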
Modality
Number of peaks in a distribution
Skewness
Indicates whether a distribution's longer tail extends to the left or to the right. A left-skewed distribution has its long tail on the left, so its peak sits on the right.
What to do with extremely skewed data
Apply a transformation (e.g. a log transformation)
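As an illustration of how a log transformation can reduce right skew, here is a small sketch. The `incomes` data are made up, and `moment_skewness` is a hypothetical helper that uses the simple moment-based definition (the mean of cubed z-scores), which is only one of several skewness measures.

```python
import math
import statistics

def moment_skewness(values):
    """Moment-based skewness: the mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed data: a few very large values stretch the right tail
incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]
log_incomes = [math.log(v) for v in incomes]

raw_skew = moment_skewness(incomes)
log_skew = moment_skewness(log_incomes)
print(f"raw skewness: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

For this data the skewness stays positive after the transformation but is noticeably smaller. A log transformation only applies to strictly positive values.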
Associated/Dependent Variables
Variables that show some relationship with one another, so that knowing the value of one tells you something about the other
Statistical Inference
Reaching conclusions about the population based on an analysis of a sample
Inference
Drawing a conclusion about the full population from a sample
Stratified Sampling
Divide the population into distinct subgroups (strata), each made up of observations that are similar in some way, then take a random sample from each stratum
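Stratified sampling can be sketched in a few lines. This is an illustrative version only: the function `stratified_sample`, its parameters, and the student population are assumptions, and it draws the same number of units from every stratum rather than sampling proportionally.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Group units into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = min(per_stratum, len(members))  # never ask for more than exist
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 8 undergraduates and 4 postgraduates
students = [(f"s{i}", "undergrad" if i < 8 else "postgrad") for i in range(12)]
chosen = stratified_sample(students, lambda s: s[1], per_stratum=2)
print(chosen)
```

Every stratum is guaranteed to appear in the sample, which is the main point of stratifying.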
Cluster Sampling
Divide the population into clusters, take a random sample of clusters, and include all observations in each selected cluster
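Cluster sampling can be sketched the same way. The function `cluster_sample` and the school data below are hypothetical. Note the contrast with stratified sampling: here whole clusters are chosen at random, and every observation inside a chosen cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick clusters at random, then keep every observation in each one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    units = [unit for cid in chosen_ids for unit in clusters[cid]]
    return chosen_ids, units

# Hypothetical clusters: students grouped by school
schools = {
    "school_a": ["a1", "a2", "a3"],
    "school_b": ["b1", "b2"],
    "school_c": ["c1", "c2", "c3", "c4"],
    "school_d": ["d1"],
}
chosen_ids, units = cluster_sample(schools, n_clusters=2)
print(chosen_ids, units)
```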
Variance in Sample Stats
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean
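The sample variance, with its n - 1 denominator, can be checked numerically. In this short sketch the function name `sample_variance` and the data are made up for illustration, and the result is compared against `statistics.variance` from the Python standard library, which uses the same n - 1 denominator.

```python
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical data with mean 5; the squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)
print(s2, statistics.variance(data))
```

With eight observations the denominator is 7, so the result is 32 / 7.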