1.1.2 Flashcards
(20 cards)
What does the %>% symbol do?
Takes the output of the expression on the left-hand side (LHS) and passes it as the first argument to the function on the right-hand side (RHS)
What is an example of the %>% symbol in use?
library(tidyverse)
starwars2 %>%
summary()
Give 2 examples to show the difference between inside-out and left-to-right code
Inside-out
summary(as.factor(starwars2$homeworld))
Left-to-right
starwars2$homeworld %>%
as.factor() %>%
summary()
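The two styles above should give the same result. The starwars2 data frame is not defined in these cards, so this minimal sketch assumes the starwars data set that ships with dplyr instead.

```r
library(dplyr)  # provides %>% and the built-in starwars data (an assumption, see above)

# Inside-out: read from the innermost call outwards
inside_out <- summary(as.factor(starwars$homeworld))

# Left-to-right: each step feeds its output into the next
left_to_right <- starwars$homeworld %>%
  as.factor() %>%
  summary()

# Check that both styles produce the same object
identical(inside_out, left_to_right)
```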
What does summary() provide for continuous variables?
Numeric descriptions of the distribution of values in each variable (minimum, quartiles, median, mean and maximum)
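As a small illustration, summary() can be run on a single continuous variable. The height column of dplyr's built-in starwars data is an assumed stand-in for whatever data the course uses.

```r
library(dplyr)

# Summarise one continuous variable
height_summary <- starwars %>%
  pull(height) %>%
  summary()

height_summary         # printed summary statistics
names(height_summary)  # labels of the statistics that summary() reports
```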
What does the distribution of a variable show?
The frequency of different values
What is a frequency distribution?
An overview of all values in some variable and how many times they occur
How can we access the frequencies of different response levels of a variable in a dataframe?
dataframe %>%
count(variable_name)
What is the mode and what type of data is it used to measure central tendency for?
The most frequent value
Unordered categorical data
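R has no built-in function for the statistical mode, so one possible approach is to count the values and keep the most frequent one. This is only a sketch; the eye_color column of dplyr's starwars data is an assumed example variable.

```r
library(dplyr)

# Frequency of each eye colour
eye_counts <- starwars %>%
  count(eye_color)

# Keep the row(s) with the highest count; ties are all returned
eye_mode <- eye_counts %>%
  slice_max(n, n = 1)

eye_mode
```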
What is a relative frequency distribution, what does it show, and how can it be written?
Percentage of respondents in each category
Proportion of times each value occurs
Decimals, fractions, percents
What are two ways we can calculate the relative frequency distributions in a dataframe?
Make a frequency table (freq_table), then divide each count by the total
freq_table$n/sum(freq_table$n)
OR
freq_table <-
dataframe %>%
count(variable_name) %>%
mutate(
prop = n/sum(n)
)
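Both routes can be tried side by side. In this sketch the data frame (dplyr's starwars) and the variable (eye_color) are assumptions chosen for illustration.

```r
library(dplyr)

# Route 2: add the proportions as a new column with mutate()
freq_table <- starwars %>%
  count(eye_color) %>%
  mutate(
    prop = n / sum(n)
  )

# Route 1: divide the counts by their total using base R
base_prop <- freq_table$n / sum(freq_table$n)

# The two routes agree, and the proportions add up to 1
all.equal(freq_table$prop, base_prop)
sum(freq_table$prop)
```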
How can we plot values in a bar chart?
ggplot(data = dataframe, aes(x = entry, y = entry)) +
geom_col()
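A possible worked version of the bar chart: geom_col() draws bars whose heights come from the y variable, so it pairs naturally with the n column that count() creates. The starwars data and eye_color variable are assumed examples.

```r
library(dplyr)
library(ggplot2)

# Frequency table: one row per eye colour, with its count in column n
freq_table <- starwars %>%
  count(eye_color)

# x is the category, y is the count
bar_plot <- ggplot(data = freq_table, aes(x = eye_color, y = n)) +
  geom_col()

bar_plot
```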
How can we change the axis labels?
labs(title = "entry", x = "entry", y = "entry")
How can we make a scatterplot?
ggplot(data = dataframe, aes(x = entry, y = entry)) +
geom_point()
How can we change the limits of the axis?
ylim(min, max)
How can we remove the legend?
theme(legend.position = "none")
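The last four cards can be combined in one plot, with each layer added using +. This is a sketch only: the starwars data, the chosen columns, the label text and the axis limits are all assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

scatter <- ggplot(data = starwars, aes(x = height, y = mass, colour = gender)) +
  geom_point() +                                # scatterplot layer
  labs(
    title = "Height and mass of Star Wars characters",
    x = "Height",
    y = "Mass"
  ) +                                           # title and axis labels
  ylim(0, 200) +                                # y-axis limits; points outside are dropped with a warning
  theme(legend.position = "none")               # hide the colour legend

scatter
```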
How can we add the percentage and cumulative percentage of each response to a table?
dataframe %>%
count(entry) %>%
mutate(
percent = n/sum(n)*100,
cumulative_percent = cumsum(percent)
)
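A runnable version of this pattern might look as follows, again assuming dplyr's starwars data and the eye_color variable as stand-ins. Note the comma between the two arguments of mutate().

```r
library(dplyr)

pct_table <- starwars %>%
  count(eye_color) %>%
  mutate(
    percent = n / sum(n) * 100,            # share of each response
    cumulative_percent = cumsum(percent)   # running total of the shares
  )

pct_table
```

The last value of cumulative_percent should be 100, which is a quick check that the table is complete.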
What is the median?
The middle value when the values are sorted in order
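A short sketch with made-up numbers (not from any real data set) shows how median() behaves, including when there is an even number of values.

```r
# Made-up heights for illustration only
heights <- c(150, 160, 170, 180, 250)

median(heights)  # 170: the third of five sorted values
mean(heights)    # 182: pulled upwards by the one large value

# With an even number of values, the median is the average of the two middle ones
median(c(1, 2, 3, 4))  # 2.5
```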
What does count() do?
Counts the number of occurrences of each unique value in a variable
What does mutate() do?
Adds new variables/modifies existing variables in a dataframe
What do min() and max() do?
Return the minimum/maximum value of a variable
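One thing worth knowing when using min() and max() is how they treat missing values. The vector below is made up for illustration.

```r
# Made-up values with one missing entry
x <- c(4, NA, 9, 1)

min(x)                # NA: a missing value makes the result missing
min(x, na.rm = TRUE)  # 1: missing values are removed first
max(x, na.rm = TRUE)  # 9
```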