r_dplyr Flashcards

Question 1

Q

What is dplyr and what functions does it include?

Answer

A

dplyr is a grammar of data manipulation that provides a consistent set of verbs.
Includes:
- mutate()
- select()
- filter()
- summarise()
- arrange()
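
As a minimal sketch of how these verbs chain together, the example below runs all five on a small made-up data frame. The `people` data and its column names are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not part of the card deck
people <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Di"),
  sex    = c("Female", "Male", "Male", "Female"),
  height = c(64, 70, 68, 66)   # inches
)

result <- people %>%
  mutate(height_cm = height * 2.54) %>%  # mutate(): add a new column
  select(name, sex, height_cm) %>%       # select(): keep chosen columns
  filter(sex == "Male") %>%              # filter(): keep matching rows
  arrange(desc(height_cm))               # arrange(): sort rows

summary_tbl <- result %>%
  summarise(average = mean(height_cm))   # summarise(): collapse to one row
```

Each verb takes a data frame and returns a data frame, which is why the steps can be piped one into the next.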

Question 2

Q

Store within the variable ‘s’:

mean(height) for “Male”
sd(height) for “Male”

Answer

A

s <- heights %>%
  filter(sex == "Male") %>%
  summarize(average = mean(height), standard_deviation = sd(height))

s

Question 3

Q

Given the following, print out the “average” and “standard_deviation”:

s <- heights %>%
  filter(sex == "Male") %>%
  summarize(average = mean(height), standard_deviation = sd(height))

Answer

A

s$average
s$standard_deviation
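
The `$` accessor works because `summarize()` returns a data frame rather than a bare number. A short sketch of that behaviour follows, using a hypothetical `heights_toy` data frame in place of the real `heights` data, and showing `pull()` as a pipe-friendly alternative to `$`.

```r
library(dplyr)

# Hypothetical stand-in for the heights dataset
heights_toy <- data.frame(
  sex    = c("Male", "Male", "Female"),
  height = c(70, 68, 64)
)

s <- heights_toy %>%
  filter(sex == "Male") %>%
  summarize(average = mean(height), standard_deviation = sd(height))

is.data.frame(s)            # the summary is a one-row data frame
avg <- s %>% pull(average)  # pull() extracts one column as a plain vector
```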

Question 4

Q

For the murders dataset, calculate the us_murder_rate (total murders divided by total population, expressed per 100,000 people)

Answer

A

us_murder_rate <- murders %>%
  summarize(rate = sum(total) / sum(population) * 100000)

Question 5

Q

Calculate the mean and sd of “height” within the heights dataset, grouping by “sex”

Answer

A

heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height))

Question 6

Q

Calculate the median_rate for the “murder_rate” variable within the murders dataset, grouped by “region”

Answer

A

murders %>%
  group_by(region) %>%
  summarize(median_rate = median(murder_rate))

Question 7

Q

Order the “murders” dataset by “population”, and then print the first five rows

Answer

A

murders %>% arrange(population) %>% head(5)

Question 8

Q

Order the “murders” dataset by “murder_rate” in descending order, and then print the first five rows

Answer

A

murders %>% arrange(desc(murder_rate)) %>% head(5)

Question 9

Q

Order the “murders” dataset by “region” first and then by “murder_rate” second, and then print the first five rows

Answer

A

murders %>% arrange(region, murder_rate) %>% head(5)

Question 10

Q

Show the top 10 murder rates within the “murders” dataset (not ordered)

Answer

A

murders %>% top_n(10, murder_rate)
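
`top_n()` keeps the highest-ranked rows but leaves them in their original order. Since dplyr 1.0.0 it has been superseded by `slice_max()`, which also returns the rows sorted from highest to lowest. A minimal sketch on a hypothetical `rates` data frame (the data are invented for illustration):

```r
library(dplyr)

# Hypothetical data, for illustration only
rates <- data.frame(
  state       = c("A", "B", "C", "D"),
  murder_rate = c(2.1, 5.4, 0.9, 3.3)
)

# Keep the 2 rows with the largest murder_rate, sorted descending
top2 <- rates %>% slice_max(murder_rate, n = 2)
```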

Question 11

Q

Show the top 10 murder rates from the “murders” dataset in descending order

Answer

A

murders %>% arrange(desc(murder_rate)) %>% top_n(10, murder_rate)

Question 12

Q

Filter the NHANES dataset so that:

assign this new data frame to the object “tab”
only “ 20-29” year old “females” are included
save the average and standard deviation of systolic blood pressure (BPSysAve) as average and standard_deviation
return the average as a numeric value (not an object)

Answer

A

tab <- NHANES %>%
  filter(AgeDecade == " 20-29", Gender == "female") %>%
  summarize(average = mean(BPSysAve, na.rm = TRUE),
            standard_deviation = sd(BPSysAve, na.rm = TRUE)) %>%
  .$average

Question 13

Q

Filter the NHANES dataset so that:

assign this new data frame to the object “tab”
only “ 20-29” year old “females” are included
save the min and max of systolic blood pressure (BPSysAve) as “min” and “max”

Answer

A

NHANES %>%
  filter(AgeDecade == " 20-29" & Gender == "female") %>%
  summarize(min = min(BPSysAve, na.rm = TRUE), max = max(BPSysAve, na.rm = TRUE))

Question 14

Q

Filter the NHANES dataset so that:

Use the functions filter, group_by, summarize, and the pipe %>% to compute the average and standard deviation of systolic blood pressure for females for each age group separately.
Within summarize, save the average and standard deviation of systolic blood pressure (BPSysAve) as average and standard_deviation.

Answer

A

NHANES %>%
  filter(Gender == "female") %>%
  group_by(AgeDecade) %>%
  summarize(average = mean(BPSysAve, na.rm = TRUE), standard_deviation = sd(BPSysAve, na.rm = TRUE))

Question 15

Q

Filter the NHANES dataset so that:

Compute the average and standard deviation for each value of Race1 for males in the age decade 40-49.
Order the resulting table from lowest to highest average systolic blood pressure.
Use the functions filter, group_by, summarize, arrange, and the pipe %>% to do this in one line of code.
Within summarize, save the average and standard deviation of systolic blood pressure as average and standard_deviation.

Answer

A

NHANES %>%
  filter(Gender == "male", AgeDecade == " 40-49") %>%
  group_by(Race1) %>%
  summarize(average = mean(BPSysAve, na.rm = TRUE), standard_deviation = sd(BPSysAve, na.rm = TRUE)) %>%
  arrange(average)

(15 cards)