1.1.3 Flashcards

(24 cards)

1
Q

How do we find the mode of a numeric variable?

A

dataframe %>%
count(variable) %>%
arrange(desc(n))
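
As a small worked illustration of the pipeline above, the sketch below counts each distinct value and sorts the counts so the mode appears in the first row. The data frame `scores` and its column `mark` are hypothetical names made up for this example.

```r
library(dplyr)

# Hypothetical data: 'scores' and 'mark' are illustrative names only
scores <- tibble(mark = c(3, 5, 5, 7, 5, 3))

mode_table <- scores %>%
  count(mark) %>%      # one row per distinct value, with its frequency in n
  arrange(desc(n))     # most frequent value first

# The mode is the value in the first row of the sorted table
mode_value <- mode_table$mark[1]
```

If two values tie for the highest count, the first row shows only one of them, so it is worth checking the top few rows.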

2
Q

How can we use the median function?

A

median(dataframe$variable)

3
Q

How can we use the mean function?

A

mean(dataframe$variable)

4
Q

How can we make summary values for different variables?

A

dataframe %>%
summarise(
summary_value = sum(variable1),
summary_value2 = mean(variable2)
)
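
A minimal sketch of computing several summary values in one call with dplyr's `summarise()`. The data frame `sales` and its columns `units` and `price` are hypothetical and exist only for this example.

```r
library(dplyr)

# Hypothetical data for illustration
sales <- tibble(units = c(2, 4, 6), price = c(10, 20, 30))

totals <- sales %>%
  summarise(
    total_units = sum(units),   # one summary value per named argument
    mean_price  = mean(price)   # arguments are separated by commas
  )
```

The result is a one-row data frame with one column per summary value.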

5
Q

What are two ways we can find the interquartile range of numerical data?

A

IQR(dataframe$variable)
OR
dataframe %>%
summarise(
iqr_variable = IQR(variable)
)
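
To connect `IQR()` to its definition, the sketch below computes the difference between the third and first quartiles by hand and compares it with the built-in function. The vector `x` is made-up data.

```r
# Made-up data for illustration
x <- c(1, 3, 5, 7, 9, 11, 13, 15)

# First and third quartiles, using R's default quantile method
q <- quantile(x, probs = c(0.25, 0.75))

iqr_by_hand <- unname(q[2] - q[1])   # Q3 minus Q1
iqr_builtin <- IQR(x)                # same calculation, done by R
```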

6
Q

What are deviations?

A

Distances from each value to the mean

7
Q

What is the formula for deviation?

A

The deviation of a value is $x_i - \bar{x}$
Summed over all $n$ values: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

8
Q

What does the sum of deviations from the mean always equal?

A

Zero, because the positive deviations and the negative deviations have the same total size and cancel out
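
A quick check of this property in R, using a made-up vector `x` whose mean is 5:

```r
# Made-up data for illustration; the mean of x is 5
x <- c(2, 4, 4, 5, 10)

deviations <- x - mean(x)          # -3 -1 -1  0  5
sum_of_deviations <- sum(deviations)

# Positive and negative parts have the same total size
positive_total <- sum(deviations[deviations > 0])
negative_total <- sum(deviations[deviations < 0])
```

With non-integer data the sum may come out as a tiny number such as 1e-16 rather than exactly 0, because of floating-point rounding.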

9
Q

Why do we consider squared deviations?

A

Because squaring makes every deviation non-negative, so they no longer cancel out when summed

10
Q

What is variance and what is it denoted by?

A

Denoted by s² (s squared)
The average of the squared deviations (for a sample, the sum is divided by n - 1 rather than n)
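
The definition can be checked by hand against R's `var()`, which computes the sample variance and therefore divides by n - 1. The vector `x` is made-up data.

```r
# Made-up data for illustration; the mean of x is 5
x <- c(2, 4, 4, 5, 10)

squared_deviations <- (x - mean(x))^2    # 9 1 1 0 25

# Sample variance: sum of squared deviations divided by n - 1
variance_by_hand <- sum(squared_deviations) / (length(x) - 1)

variance_builtin <- var(x)
```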

11
Q

How can we calculate variance in R for numerical data?

A

var(dataframe$variable)
OR
dataframe %>%
summarise(
variance_variable = var(variable)
)

12
Q

What is standard deviation?

A

The square root of the variance
Denoted by s
Rough estimate of the typical distance from a value to the mean
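
A short sketch showing that `sd()` is the square root of `var()`, which puts the spread back in the same units as the data. The vector `heights_cm` is made-up data.

```r
# Made-up data for illustration (heights in centimetres)
heights_cm <- c(150, 160, 165, 170, 185)

variance_cm2 <- var(heights_cm)       # in squared centimetres
sd_by_hand   <- sqrt(variance_cm2)    # back in centimetres
sd_builtin   <- sd(heights_cm)
```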

13
Q

How can we get R to calculate the SD of a variable?

A

dataframe %>%
summarise(
sd_variable = sd(variable)
)
OR
sd(dataframe$variable)

14
Q

What are boxplots useful for visualising?

A

The IQR (the box spans the first to the third quartile), along with the median and any outliers

15
Q

How can we create a boxplot for a numerical variable?

A

ggplot(data = dataframe, aes(x = variable)) +
geom_boxplot()

16
Q

What does a histogram allow us to do and what does it show?

A

Visualise numeric data and show frequency of values which fall within bins of equal width

17
Q

How can we create a histogram for a numerical variable in R?

A

ggplot(data = dataframe, aes(x = variable)) +
geom_histogram(binwidth = x)

18
Q

How are the values on the y-axis scaled in a density curve?

A

The total area under the curve is equal to 1

19
Q

What is a density curve?

A

A curve reflecting the distribution of a variable
The area under the curve sums to 1
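
The area property can be checked numerically. The sketch below uses base R's `density()` on simulated data and approximates the area under the estimated curve with the trapezoid rule; the simulated vector `x` and the choice of 200 draws are assumptions made for this example.

```r
set.seed(1)
x <- rnorm(200)      # simulated data for illustration

d <- density(x)      # kernel density estimate on a grid of x values

# Trapezoid rule: width of each interval times the average height at its ends
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
```

The result is very close to 1 rather than exactly 1, since the curve is estimated on a finite grid.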

20
Q

How can we make a density curve in R for a numeric variable?

A

ggplot(data = dataframe, aes(x = variable)) +
geom_density() +
xlim(min, max)

21
Q

What is skewness?

A

A measure of asymmetry in a distribution
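
One way to put a number on asymmetry is the moment-based skewness coefficient, sketched below. This formula is an assumption chosen for illustration (it is one of several definitions and is not given in the cards), and the vector `x` is made-up data with a long right tail.

```r
# Made-up data with a long right tail
x <- c(1, 2, 2, 3, 3, 3, 10)

deviations <- x - mean(x)

# Moment-based skewness: third central moment over (second central moment)^(3/2)
skewness <- mean(deviations^3) / mean(deviations^2)^(3 / 2)

# In right-skewed data the mean is pulled above the median
mean_above_median <- mean(x) > median(x)
```

A positive value indicates a longer right tail, a negative value a longer left tail, and a value near zero a roughly symmetric distribution.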

22
Q

How can we add a vertical line to a ggplot?
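
One common way, sketched here as an assumption because no answer is recorded for this card, is to add a `geom_vline()` layer from ggplot2 and set its `xintercept`. The data frame `df` and its column `height` are hypothetical.

```r
library(ggplot2)

# Hypothetical data for illustration
df <- data.frame(height = c(150, 160, 165, 170, 185))

p <- ggplot(data = df, aes(x = height)) +
  geom_histogram(binwidth = 10) +
  geom_vline(xintercept = mean(df$height))   # vertical line at the mean
```

Printing `p` draws the histogram with the line on top; `xintercept` can be any value on the x-axis, such as a median or a cut-off.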

23
Q

What does the filter() function allow us to do, and what is an example of how to use it?

A

Filter a dataframe down to the rows which meet a given condition. It will return all columns.
data %>%
filter(variable1 == value1)
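
A small runnable illustration with dplyr. The data frame `pets` and its columns are hypothetical names made up for this example.

```r
library(dplyr)

# Hypothetical data for illustration
pets <- tibble(
  name    = c("Rex", "Tom", "Fido"),
  species = c("dog", "cat", "dog"),
  age     = c(3, 5, 7)
)

# Keep only the rows where species is "dog"; every column is kept
dogs <- pets %>%
  filter(species == "dog")
```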

24
Q

What does the select() function allow us to do, and what is an example of how to use it?

A

Choose certain columns in a dataframe. It will return all rows.
data %>%
select(variable1, variable2)
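
A small runnable illustration with dplyr. The data frame `pets` and its columns are hypothetical names made up for this example.

```r
library(dplyr)

# Hypothetical data for illustration
pets <- tibble(
  name    = c("Rex", "Tom", "Fido"),
  species = c("dog", "cat", "dog"),
  age     = c(3, 5, 7)
)

# Keep only the name and age columns; every row is kept
name_and_age <- pets %>%
  select(name, age)
```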