1.1.4 Flashcards
(16 cards)
What are explanatory variables?
Explanatory variable = independent variable = predictor variable = X
What are outcome variables?
Outcome variable = dependent variable = response variable = Y
What is the difference between the manipulation of variables in experimental and observational studies?
Experimental = explanatory variable is manipulated before response variable is measured
Observational = variables are observed as they naturally exist (not controlled)
What does the group_by() function do?
Groups the rows of the dataframe by the values of one or more variables
Subsequent functions (e.g. summarise()) are computed separately for each group
Why do we combine the group_by() and summarise() functions?
To reduce a variable into a summary value for each group in a grouping variable
How can we input group_by() and summarise() into R?
data %>%
  group_by(grouping_variable) %>%
  summarise(
    summary_value = …
  )
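Example (a minimal sketch, assuming the dplyr package is installed; the built-in mtcars dataset and its columns cyl and mpg are stand-ins chosen for illustration, not part of the original card):

```r
library(dplyr)

# mtcars ships with R; cyl (number of cylinders) acts as the grouping variable
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),  # one summary value per group
    n_cars   = n()         # number of rows in each group
  )

mpg_by_cyl
```

The result has one row per value of cyl, rather than one row per car.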
How do we map an aesthetic of a plot (e.g. the colour) to something in the data (e.g. a variable)?
ggplot(data = dataframe, aes(x = variable1, col = grouping_variable)) +
  geom_density()
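Example (a minimal sketch, assuming the ggplot2 package is installed; the built-in iris dataset stands in for "dataframe", with Sepal.Length as the numeric variable and Species as the grouping variable):

```r
library(ggplot2)

# Mapping colour inside aes() ties it to the data:
# each Species gets its own density curve and its own colour
p_density <- ggplot(data = iris, aes(x = Sepal.Length, colour = Species)) +
  geom_density()

p_density
```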
How can we split one plot up using facet_wrap to create separate graphs for each set of values/groups in a variable?
ggplot(data = dataframe, aes(x = variable)) +
  geom_histogram() +
  facet_wrap(~grouping_variable)
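Example (a minimal sketch, assuming ggplot2 is installed; iris and its columns are illustrative stand-ins, and bins = 20 is an arbitrary choice):

```r
library(ggplot2)

# One histogram panel per Species, all sharing the same x variable
p_facets <- ggplot(data = iris, aes(x = Sepal.Length)) +
  geom_histogram(bins = 20) +
  facet_wrap(~Species)

p_facets
```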
What is the most easily interpreted visualisation of the relationship between two numeric variables?
Scatterplot
What is covariance, and what can it express?
A measure of how two numeric variables vary together
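Its sign expresses the direction of the relationship (positive = the variables tend to increase together; negative = one tends to decrease as the other increases)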
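Example (a sketch of the sample covariance worked out by hand and checked against cov(); the vectors x and y are made-up toy data for illustration only):

```r
# Hypothetical toy data, not from any real dataset
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)

# Sample covariance: sum of the products of each pair's deviations
# from their means, divided by n - 1
cov_by_hand <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

cov_by_hand  # positive, so x and y tend to increase together
cov(x, y)    # the built-in function gives the same value
```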
What are two ways of using the cov() function to calculate covariance?
Use $ to pull out variables from the dataset e.g.
cov(dataframe$variableX, dataframe$variableY)
or
Pipe the dataframe with %>% and call cov() inside summarise()
data %>%
  summarise(
    mean_variableX = mean(variableX),
    mean_variableY = mean(variableY),
    cov_variableXY = cov(variableX, variableY)
  )
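Example (a minimal sketch showing that both approaches give the same covariance, assuming dplyr is installed; mtcars and its columns wt and mpg are illustrative stand-ins):

```r
library(dplyr)

# Way 1: pull the columns out with $
cov_dollar <- cov(mtcars$wt, mtcars$mpg)

# Way 2: pipe the dataframe and call cov() inside summarise()
cov_pipe <- mtcars %>%
  summarise(
    mean_wt    = mean(wt),
    mean_mpg   = mean(mpg),
    cov_wt_mpg = cov(wt, mpg)
  )

cov_dollar           # negative: heavier cars tend to have lower mpg
cov_pipe$cov_wt_mpg  # same value as cov_dollar
```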
How can we summarise one categorical variable using the table function?
table(dataframe$categorical_variable)
or
data %>%
  count(categorical_variable)
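Example (a minimal sketch, assuming dplyr is installed; mtcars$cyl is treated here as a categorical variable purely for illustration):

```r
library(dplyr)

# Way 1: base R table() on a single column returns a named vector of counts
cyl_table <- table(mtcars$cyl)

# Way 2: dplyr count() returns a dataframe with the counts in a column named n
cyl_counts <- mtcars %>%
  count(cyl)

cyl_table
cyl_counts
```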
What is a two-way table?
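A table that cross-tabulates two categorical variables: the categories of one form the rows, the categories of the other form the columns, and each cell holds the count for that combination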
How can we create a two-way table for two categorical variables in R?
table(dataframe$variable1, dataframe$variable2)
How can we create a proportion table for two categorical variables in R?
dataframe %>%
  select(variableX, variableY) %>%
  table() %>%
  prop.table()
For the final step, use one of:
prop.table() for proportions of the overall total
prop.table(margin = 1) for proportions within each row
prop.table(margin = 2) for proportions within each column
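Example (a minimal sketch of the three variants, assuming dplyr is installed; mtcars with cyl and am treated as categorical variables is an illustrative stand-in):

```r
library(dplyr)

# Two-way table of counts: cyl in the rows, am in the columns
counts <- mtcars %>%
  select(cyl, am) %>%
  table()

prop_total <- prop.table(counts)              # all cells together sum to 1
prop_row   <- prop.table(counts, margin = 1)  # each row sums to 1
prop_col   <- prop.table(counts, margin = 2)  # each column sums to 1

prop_row
```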
How can we make a mosaic plot to visualise a contingency table of two categorical variables in R?
dataframe %>%
  select(variableX, variableY) %>%
  table() %>%
  plot()
To plot row or column proportions instead of counts, insert prop.table(margin = 1) %>% or prop.table(margin = 2) %>% between table() and plot()
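Example (a minimal sketch, assuming dplyr is installed; mtcars with cyl and am treated as categorical variables is an illustrative stand-in, and the title text is made up):

```r
library(dplyr)

# Build the contingency table first so it can be inspected or reused
cyl_am <- mtcars %>%
  select(cyl, am) %>%
  table()

# Calling plot() on a two-way table draws a mosaic plot
cyl_am %>%
  plot(main = "Cylinders by transmission type")
```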