1.1.3 Flashcards
(24 cards)
How do we find the mode of a numeric variable?
dataframe %>%
count(variable) %>%
arrange(desc(n))
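A minimal sketch of the counting approach above, assuming the dplyr package is installed. The data frame `scores` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical data for illustration only
scores <- data.frame(variable = c(3, 5, 5, 7, 5, 3))

mode_table <- scores %>%
  count(variable) %>%   # one row per distinct value, with its frequency in n
  arrange(desc(n))      # most frequent value first

# The mode is the value in the first row of the sorted table
mode_value <- mode_table$variable[1]
mode_value
```

If two values tie for the highest count, both appear at the top of the table, so it is worth reading the first few rows rather than only the first.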
How can we use the median function?
median(dataframe$variable)
How can we use the mean function?
mean(dataframe$variable)
How can we make summary values for different variables?
dataframe %>%
summarise(
summary_value = sum(variable1),
summary_value2 = mean(variable2)
)
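As a rough illustration of several summary values computed in one call, the sketch below assumes dplyr is installed. The data frame `sales` and its numbers are hypothetical.

```r
library(dplyr)

# Hypothetical data for illustration only
sales <- data.frame(
  variable1 = c(10, 20, 30),
  variable2 = c(1, 2, 6)
)

result <- sales %>%
  summarise(
    summary_value  = sum(variable1),   # 10 + 20 + 30
    summary_value2 = mean(variable2)   # (1 + 2 + 6) / 3
  )
result
```

Each named argument becomes one column of the single-row result, and the arguments must be separated by commas.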
What are two ways we can find the interquartile range of numerical data?
IQR(dataframe$variable)
OR
dataframe %>%
summarise(
iqr_variable = IQR(variable)
)
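To see what IQR() is doing, the sketch below compares it with the difference between the third and first quartiles. The vector `x` is made up for illustration, and the quartiles use R's default quantile method.

```r
# Hypothetical data for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# First and third quartiles using R's default quantile method
quartiles <- quantile(x, probs = c(0.25, 0.75))

# The IQR is the distance between them
iqr_manual <- unname(quartiles[2] - quartiles[1])
iqr_builtin <- IQR(x)

c(manual = iqr_manual, builtin = iqr_builtin)
```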
What are deviations?
The signed distances from each value to the mean (negative for values below the mean, positive for values above it)
What is the formula for deviation?
Each deviation is $x_i - \bar{x}$, and together they sum to zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
What does the sum of deviations from the mean always equal?
Zero, because the positive deviations and the negative deviations cancel each other out exactly
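A small worked check of this cancellation, using a made-up vector `x` (an assumption for illustration):

```r
# Hypothetical data for illustration only
x <- c(2, 4, 6, 8, 10)

# Deviation of each value from the mean (the mean here is 6)
deviations <- x - mean(x)

sum_positive <- sum(deviations[deviations > 0])  # 2 + 4
sum_negative <- sum(deviations[deviations < 0])  # -4 + -2
total <- sum(deviations)

c(positive = sum_positive, negative = sum_negative, total = total)
```

With real data the total may come out as a tiny non-zero number because of floating-point rounding, so comparing with all.equal() is safer than ==.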
Why do we consider squared deviations?
Because squaring makes every deviation non-negative, so they no longer cancel out when added together
What is variance and what is it denoted by?
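Denoted by s² (s squared)
The average of the squared deviations (for a sample, the sum of squared deviations is divided by n - 1, which is what var() in R does)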
How can we calculate variance in R for numerical data?
var(dataframe$variable)
OR
dataframe %>%
summarise(
variance_variable = var(variable)
)
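The sketch below works through the variance by hand and compares it with var(). The vector `x` is made up for illustration. Note that var() divides by n - 1 rather than n.

```r
# Hypothetical data for illustration only
x <- c(2, 4, 6, 8, 10)

# Squared deviations from the mean: 16, 4, 0, 4, 16
squared_deviations <- (x - mean(x))^2

# Sample variance: sum of squared deviations divided by n - 1
variance_manual <- sum(squared_deviations) / (length(x) - 1)
variance_builtin <- var(x)

c(manual = variance_manual, builtin = variance_builtin)
```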
What is standard deviation?
The square root of the variance
Denoted by s
Rough estimate of the typical distance from a value to the mean
How can we get R to calculate the SD of a variable?
dataframe %>%
summarise(
sd_variable = sd(variable)
)
OR
sd(dataframe$variable)
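As a quick check that the standard deviation is the square root of the variance, here is a sketch using a made-up vector `x`:

```r
# Hypothetical data for illustration only
x <- c(2, 4, 6, 8, 10)

sd_builtin <- sd(x)
sd_from_variance <- sqrt(var(x))   # var(x) is 10 for this data

c(builtin = sd_builtin, from_variance = sd_from_variance)
```

Because the standard deviation is in the same units as the data, it is usually easier to interpret than the variance.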
What are boxplots useful for visualising?
The IQR (the box spans the first to the third quartile), along with the median and potential outliers
How can we create a boxplot for a numerical variable?
ggplot(data = dataframe, aes(x = variable)) +
geom_boxplot()
What does a histogram allow us to do and what does it show?
It lets us visualise numeric data, and it shows the frequency of values that fall within bins of equal width
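To make the idea of equal-width bins concrete, the sketch below counts values per bin with base R's hist() rather than ggplot2. The vector `x` and the bin edges are assumptions for illustration, and geom_histogram() may place its bin edges differently.

```r
# Hypothetical data for illustration only
x <- c(1, 2, 2, 3, 5, 6, 6, 6, 9)

# Bins of width 2: (0,2], (2,4], (4,6], (6,8], (8,10]
bins <- hist(x, breaks = seq(0, 10, by = 2), plot = FALSE)

# Number of values falling in each bin (the bar heights)
bins$counts
```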
How can we create a histogram for a numerical variable in R?
ggplot(data = dataframe, aes(x = variable)) +
geom_histogram(binwidth = x)
How are the values on the y-axis scaled in a density curve?
The total area under the curve is equal to 1
What is a density curve?
A curve reflecting the distribution of a variable
The area under the curve sums to 1
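The area property can be checked numerically. The sketch below uses base R's density() on a made-up vector `x` and approximates the area with a simple sum of rectangles, so the result is close to 1 rather than exactly 1.

```r
# Hypothetical data for illustration only
x <- c(2, 4, 4, 5, 6, 7, 7, 8, 10, 12)

# Kernel density estimate evaluated on an evenly spaced grid
d <- density(x)

# Approximate the area under the curve: height times grid spacing, summed
grid_spacing <- diff(d$x)[1]
area <- sum(d$y) * grid_spacing
area
```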
How can we make a density curve in R for a numeric variable?
ggplot(data = dataframe, aes(x = variable)) +
geom_density() +
xlim(min, max)
What is skewness?
A measure of asymmetry in a distribution
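One common way to put a number on skewness is the moment-based coefficient sketched below. The function name `skewness_moment` and the example vectors are hypothetical, and other definitions of skewness exist. Positive values suggest a longer right tail, negative values a longer left tail.

```r
# Hypothetical helper: moment-based skewness coefficient
skewness_moment <- function(x) {
  deviations <- x - mean(x)
  mean(deviations^3) / mean(deviations^2)^(3 / 2)
}

# Hypothetical data for illustration only
symmetric_values <- c(1, 2, 3, 4, 5)
right_skewed_values <- c(1, 2, 2, 3, 3, 3, 10)

skew_symmetric <- skewness_moment(symmetric_values)
skew_right <- skewness_moment(right_skewed_values)
```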
How can we add a vertical line to a ggplot?
geom_vline(xintercept = value)
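A short sketch of a vertical line in use, assuming ggplot2 is installed. The data frame `df` is made up, and marking the mean with a dashed line is just one possible choice.

```r
library(ggplot2)

# Hypothetical data for illustration only
df <- data.frame(variable = c(2, 4, 4, 5, 6, 7, 7, 8, 10, 12))

p <- ggplot(data = df, aes(x = variable)) +
  geom_density() +
  # xintercept sets where the line crosses the x-axis; here, the mean
  geom_vline(xintercept = mean(df$variable), linetype = "dashed")
```

Printing `p` draws the density curve with the dashed line at the mean.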
What does the filter() function allow us to do, and an example of how to use it?
Filter a dataframe down to the rows which meet a given condition. It will return all columns.
data %>%
filter(variable1 == value1)
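A minimal sketch of filter() on a made-up data frame, assuming dplyr is installed. The name `pets` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical data for illustration only
pets <- data.frame(
  variable1 = c("cat", "dog", "cat", "fish"),
  variable2 = c(4, 7, 2, 1)
)

# Keep only the rows where variable1 is "cat"; every column is kept
cats <- pets %>%
  filter(variable1 == "cat")
cats
```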
What does the select() function allow us to do, and an example of how to use it?
Choose certain columns in a dataframe. It will return all rows.
data %>%
select(variable1, variable2)
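A minimal sketch of select() on a made-up data frame, assuming dplyr is installed. The name `pets` and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical data for illustration only
pets <- data.frame(
  variable1 = c("cat", "dog", "cat", "fish"),
  variable2 = c(4, 7, 2, 1),
  variable3 = c(TRUE, FALSE, TRUE, TRUE)
)

# Keep only the two named columns; every row is kept
chosen <- pets %>%
  select(variable1, variable2)
chosen
```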