r_dplyr Flashcards

1
Q

What is dplyr and what functions does it include?

A
  • a grammar of data manipulation, providing a consistent set of verbs
  • Includes:
    • mutate()
    • select()
    • filter()
    • summarise()
    • arrange()
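A minimal sketch of how these verbs chain together with the pipe. The `people` data frame and its values are hypothetical, invented only for illustration:

```r
library(dplyr)

# Hypothetical toy data (not from any real dataset)
people <- data.frame(
  name   = c("Ann", "Bob", "Cat", "Dan"),
  sex    = c("Female", "Male", "Female", "Male"),
  height = c(64, 70, 66, 72),            # inches
  stringsAsFactors = FALSE
)

tall_first <- people %>%
  mutate(height_cm = height * 2.54) %>%  # mutate(): add a column
  filter(sex == "Male") %>%              # filter(): keep matching rows
  select(name, height_cm) %>%            # select(): keep chosen columns
  arrange(desc(height_cm))               # arrange(): sort rows

overall <- people %>%
  summarise(average = mean(height))      # summarise(): collapse to one row
```

Each verb takes a data frame as its first argument and returns a data frame, which is what lets the pipe pass results from one step to the next.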
2
Q

Store within the variable ‘s’:

  • mean(height) for “Male”
  • sd(height) for “Male”
A

s <- heights %>%
  filter(sex == "Male") %>%
  summarize(average = mean(height), standard_deviation = sd(height))

s

3
Q

Given the following, print out the “average” and “standard_deviation”:

s <- heights %>%
  filter(sex == "Male") %>%
  summarize(average = mean(height), standard_deviation = sd(height))

A
  • s$average
  • s$standard_deviation
4
Q

For the murders dataset, calculate us_murder_rate (total murders divided by total population, expressed per 100,000 people)

A

us_murder_rate <- murders %>%
  summarize(rate = sum(total) / sum(population) * 100000)

5
Q

Calculate the mean and sd of “height” within the heights dataset, grouping by “sex”

A

heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height))

6
Q

Calculate the median_rate for the “murder_rate” variable within the murders dataset, grouped by “region”

A

murders %>%
  group_by(region) %>%
  summarize(median_rate = median(murder_rate))

7
Q

Order the “murders” dataset by “population”, and then print the first five rows

A

murders %>% arrange(population) %>% head(5)

8
Q

Order the “murders” dataset by “murder_rate” in descending order, and then print the first five rows

A

murders %>% arrange(desc(murder_rate)) %>% head(5)

9
Q

Order the “murders” dataset by “region” first and then by “murder_rate” second, and then print the first five rows

A

murders %>% arrange(region, murder_rate) %>% head(5)
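A small sketch of how arrange() handles two sort keys: rows are ordered by the first column, and the second column only breaks ties within it. The `toy` data frame is hypothetical and stands in for murders:

```r
library(dplyr)

# Hypothetical stand-in for the murders dataset
toy <- data.frame(
  state       = c("A", "B", "C", "D"),
  region      = c("West", "South", "West", "South"),
  murder_rate = c(3.0, 5.5, 1.2, 2.4),
  stringsAsFactors = FALSE
)

# Sort by region first, then by murder_rate within each region
by_region <- toy %>% arrange(region, murder_rate)
by_region$state
```

Here region is a character column, so it sorts alphabetically. If region is stored as a factor, the sort follows the factor's level order instead.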

10
Q

Show the top 10 murder rates within the “murders” dataset (not ordered)

A

murders %>% top_n(10, murder_rate)

11
Q

Show the top 10 murder rates from the “murders” dataset in descending order

A

murders %>% arrange(desc(murder_rate)) %>% top_n(10, murder_rate)
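As a side note, top_n() is superseded in dplyr 1.0.0 and later; slice_max() picks the rows with the largest values and returns them already sorted in descending order. A sketch on a hypothetical `toy` data frame, assuming dplyr 1.0.0 or newer is installed:

```r
library(dplyr)

# Hypothetical stand-in for the murders dataset
toy <- data.frame(
  state       = c("A", "B", "C", "D", "E"),
  murder_rate = c(3.0, 5.5, 1.2, 2.4, 4.1),
  stringsAsFactors = FALSE
)

# Three highest rates, largest first, with no separate arrange() step
top3 <- toy %>% slice_max(murder_rate, n = 3)
top3$state
```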

12
Q

Using the NHANES dataset:

  • keep only females in the “ 20-29” age decade
  • save the average and standard deviation of systolic blood pressure (BPSysAve) as average and standard_deviation
  • return the average as a numeric value (not a data frame)
  • assign the result to the object “tab”
A

tab <- NHANES %>%
  filter(AgeDecade == " 20-29", Gender == "female") %>%
  summarize(average = mean(BPSysAve, na.rm = TRUE),
            standard_deviation = sd(BPSysAve, na.rm = TRUE)) %>%
  .$average
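A short sketch of the difference between the summary table and a bare number. summarize() always returns a data frame; `.$average` or dplyr's pull() extracts the column as a numeric vector. The `bp` data frame is hypothetical:

```r
library(dplyr)

# Hypothetical blood pressure readings, one of them missing
bp <- data.frame(BPSysAve = c(110, NA, 120, 130))

as_table    <- bp %>% summarize(average = mean(BPSysAve, na.rm = TRUE))
as_number   <- as_table %>% pull(average)  # numeric vector of length 1
same_number <- as_table %>% .$average      # equivalent with the %>% pipe

is.data.frame(as_table)
is.numeric(as_number)
```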

13
Q

Using the NHANES dataset:

  • keep only females in the “ 20-29” age decade
  • save the min and max of systolic blood pressure (BPSysAve) as “min” and “max”
  • assign the resulting data frame to the object “tab”
A

tab <- NHANES %>%
  filter(AgeDecade == " 20-29" & Gender == "female") %>%
  summarize(min = min(BPSysAve, na.rm = TRUE), max = max(BPSysAve, na.rm = TRUE))
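Why na.rm = TRUE matters here: in base R, a single NA makes min(), max(), mean() and sd() return NA unless missing values are dropped. A minimal sketch with hypothetical readings:

```r
# Hypothetical readings with one missing value
bp <- c(110, NA, 120, 130)

with_na <- max(bp)                      # NA, because one value is missing

range_known <- c(
  min = min(bp, na.rm = TRUE),          # ignore the NA
  max = max(bp, na.rm = TRUE)
)
range_known
```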

14
Q

Using the NHANES dataset:

  • Use the functions filter, group_by, summarize, and the pipe %>% to compute the average and standard deviation of systolic blood pressure for females for each age group separately.
  • Within summarize, save the average and standard deviation of systolic blood pressure (BPSysAve) as average and standard_deviation.
A

NHANES %>%
  filter(Gender == "female") %>%
  group_by(AgeDecade) %>%
  summarize(average = mean(BPSysAve, na.rm = TRUE), standard_deviation = sd(BPSysAve, na.rm = TRUE))

15
Q

Using the NHANES dataset:

  • Compute the average and standard deviation for each value of Race1 for males in the age decade 40-49.
  • Order the resulting table from lowest to highest average systolic blood pressure.
  • Use the functions filter, group_by, summarize, arrange, and the pipe %>% to do this in one line of code.
  • Within summarize, save the average and standard deviation of systolic blood pressure as average and standard_deviation.
A

NHANES %>%
  filter(Gender == "male", AgeDecade == " 40-49") %>%
  group_by(Race1) %>%
  summarize(average = mean(BPSysAve, na.rm = TRUE), standard_deviation = sd(BPSysAve, na.rm = TRUE)) %>%
  arrange(average)
