r_data_wrangling Flashcards

1
Q

What library is used for data wrangling in R?

A

dplyr
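A minimal sketch of loading dplyr before working through these cards. It assumes the murders data frame used throughout comes from the dslabs package. The cards never name its source, so treat that as an assumption.

```r
# dplyr provides the verbs used on these cards: mutate(), filter(), select(), arrange()
library(dplyr)
# Assumption: the murders data frame comes from the dslabs package
library(dslabs)
data(murders)

# Show each column with its type and first few values
glimpse(murders)
```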

2
Q

Create a new variable “rate” in the dataset “murders” that stores the murder rate per 100,000 people

A
  • murders <- mutate(murders, rate = total / population * 100000)
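The rate formula divides the murder count by the population and then scales the result to 100,000 people. The sketch below assumes dslabs supplies murders. It also checks the arithmetic on made-up numbers.

```r
library(dplyr)
library(dslabs)  # assumption: source of the murders data frame
data(murders)

# Murders per 100,000 people: divide the count by the population, then scale up
murders <- mutate(murders, rate = total / population * 100000)

# The same formula on made-up numbers:
# 10 murders in a population of 2,000,000 gives 0.5 per 100,000
10 / 2000000 * 100000
```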
3
Q

Display only the states where the murder rate (“rate”) is less than or equal to 0.71

A
  • filter(murders, rate <= 0.71)
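filter() keeps only the rows where the condition is TRUE. The sketch below assumes dslabs as the data source and adds arrange() as an optional extra step to sort the matching states by rate.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)
murders <- mutate(murders, rate = total / population * 100000)

# Keep only the rows where the condition is TRUE
low_rate <- filter(murders, rate <= 0.71)

# Optional: sort the result so the lowest rate appears first
arrange(low_rate, rate)
```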
4
Q

Create a new data frame named “new_table” from the original “murders” dataset that contains only the following variables:

  • state
  • region
  • rate
A
  • new_table <- select(murders, state, region, rate)
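select() returns only the listed columns, in the order you list them, and keeps every row. The sketch below shows this, assuming murders comes from dslabs.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)
murders <- mutate(murders, rate = total / population * 100000)

new_table <- select(murders, state, region, rate)
names(new_table)  # "state" "region" "rate"

# Columns come back in the order they are listed
reordered <- select(murders, rate, state, region)
names(reordered)  # "rate" "state" "region"
```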
5
Q

Using the pipe operator (%>%), do the following:

  1. Select the following variables:
    1. state
    2. region
    3. rate
  2. Display only the states with a rate less than or equal to 0.71
A
  • murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
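The pipe passes its left-hand side as the first argument of the next call, so x %>% f(y) runs as f(x, y). The sketch below assumes dslabs as the data source and checks that the piped and nested forms give the same result.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)
murders <- mutate(murders, rate = total / population * 100000)

# x %>% f(y) is evaluated as f(x, y): the left side becomes the first argument
piped <- murders %>% select(state, region, rate) %>% filter(rate <= 0.71)

# The same computation written as nested calls, read from the inside out
nested <- filter(select(murders, state, region, rate), rate <= 0.71)

identical(piped, nested)  # TRUE
```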
6
Q

What argument must be passed to data.frame() so that string columns are stored as characters rather than factors?

A
  • stringsAsFactors = FALSE (since R 4.0.0 this is the default)
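A small base-R comparison of the two settings. The column names and values are made up for illustration.

```r
# Toy data frame with made-up values; strings stay as characters
df_chr <- data.frame(name = c("a", "b"), value = 1:2, stringsAsFactors = FALSE)
class(df_chr$name)  # "character"

# With TRUE, string columns are converted to factors
df_fct <- data.frame(name = c("a", "b"), value = 1:2, stringsAsFactors = TRUE)
class(df_fct$name)  # "factor"
```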
7
Q

Use the function mutate to add a column named rate to murders that holds the murder rate per 100,000 people.

A
  • murders <- mutate(murders, rate = total / population * 100000)
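The answer reassigns to murders because mutate() returns a modified copy and does not change the original data frame. The sketch below shows the difference, assuming murders comes from dslabs.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)

# mutate() returns a modified copy; without assignment, murders is unchanged
mutate(murders, rate = total / population * 100000)
"rate" %in% names(murders)  # FALSE

# Reassigning stores the new column
murders <- mutate(murders, rate = total / population * 100000)
"rate" %in% names(murders)  # TRUE
```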
8
Q

Redefine murders to include a column named rank that ranks the states by rate, from highest (rank 1) to lowest.

A
  • # Redefine murders to include a column named rank with the ranks of rate from highest to lowest
  • murders <- mutate(murders, rank = rank(-rate))
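rank() gives rank 1 to the smallest value, so negating rate makes the highest rate rank 1. The sketch below uses made-up values. It also shows dplyr's min_rank(desc(x)) as an alternative that is not part of these cards. By default base rank() gives tied values their average rank, while min_rank() gives them the lowest shared rank.

```r
library(dplyr)

# Made-up rates for three hypothetical states
x <- c(2.5, 7.1, 0.3)

rank(x)            # 2 3 1: the smallest value gets rank 1
rank(-x)           # 2 1 3: negating flips the order, so the largest value gets rank 1
min_rank(desc(x))  # 2 1 3: dplyr alternative; ties share the lowest rank
```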
9
Q

Use select to show only the state names and abbreviations from murders

A
  • # Use “select” to show only the state names and abbreviations from murders
  • select(murders, state, abb)
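select() also accepts a range of adjacent columns written as first:last. The sketch assumes dslabs supplies murders and that state and abb are its first two columns, side by side.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)

# Name the columns explicitly...
abbs <- select(murders, state, abb)
# ...or select the contiguous range from state to abb
abbs_range <- select(murders, state:abb)

identical(abbs, abbs_range)  # TRUE
```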
10
Q

Filter to show the top 5 states with the highest murder rates (using the rank column created earlier)

A
  • filter(murders, rank <= 5)
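This filter only works after the rate and rank columns exist. The sketch below creates both columns first, assuming dslabs supplies murders. It also shows slice_max() (available in dplyr 1.0.0 and later) as an alternative that is not part of these cards.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)

# filter(murders, rank <= 5) needs the rate and rank columns to exist first
murders <- mutate(murders, rate = total / population * 100000, rank = rank(-rate))

top5 <- filter(murders, rank <= 5)
arrange(top5, rank)  # highest rate first

# Alternative (not from the cards): slice_max() picks the 5 largest rates directly
top5_alt <- slice_max(murders, rate, n = 5)
setequal(top5$state, top5_alt$state)  # TRUE when there are no ties
```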
11
Q

Create a new data frame called no_south that removes states from the South region.

How many states are in this category?

A
  1. # Use filter to create a new data frame no_south
    • no_south <- filter(murders, region != "South")
  2. # Use nrow() to calculate the number of rows
    • nrow(no_south)
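A quick check that the != filter removed every South row before counting what remains. The sketch assumes murders comes from dslabs.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)

no_south <- filter(murders, region != "South")

# Confirm that no South rows remain, then count the rest
"South" %in% no_south$region  # FALSE
nrow(no_south)
```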
12
Q

Get the unique values of the “region” variable in the “murders” dataset

A
  • unique(murders$region)
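In the murders data, region is a factor. unique() lists the values that actually appear, levels() lists every defined category, and table() counts the states in each one. The sketch assumes dslabs supplies murders. It also shows dplyr's distinct(), which returns the unique values as a one-column data frame.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)

unique(murders$region)  # distinct values, in order of first appearance (a factor)
levels(murders$region)  # every level defined for the factor
table(murders$region)   # number of states in each region

# dplyr equivalent that returns a one-column data frame
distinct(murders, region)
```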
13
Q
  1. Create a new data frame called murders_nw with only the states from the Northeast and the West.
  2. How many states are in this category?
A
  1. # Create a new data frame called murders_nw with only the states from the Northeast and the West
    • murders_nw <- filter(murders, region %in% c("Northeast", "West"))
  2. # Number of states (rows) in this category
    • nrow(murders_nw)
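%in% asks of each element "is it anywhere in this set?". That is why the answer uses %in% rather than ==, because == compares element by element and recycles the shorter vector. The sketch below uses a made-up vector of region names.

```r
# Made-up vector of region names
regions <- c("West", "Northeast", "South", "West")

# Membership test: TRUE wherever the element appears in the set
regions %in% c("Northeast", "West")  # TRUE TRUE FALSE TRUE

# Element-wise comparison with recycling: usually not what you want here
regions == c("Northeast", "West")    # FALSE FALSE FALSE TRUE
```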
14
Q
  1. Add a murder rate column and a rank column as done before.
  2. Create a table, call it “my_states”, that satisfies both conditions: the state is in the Northeast or West, and its murder rate is less than 1.
  3. Use select on “my_states” to show only the state name, the rate, and the rank.
A
  1. # Add the rate and rank columns
    • murders <- mutate(murders, rate = total / population * 100000, rank = rank(-rate))
  2. # Create a table, call it my_states, that satisfies both conditions
    • my_states <- filter(murders, region %in% c("Northeast", "West") & rate < 1)
  3. # Use select to show only the state name, the murder rate and the rank
    • select(my_states, state, rate, rank)
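In filter(), separating conditions with commas is equivalent to joining them with &. The sketch below checks this on murders, assumed to come from dslabs.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)
murders <- mutate(murders, rate = total / population * 100000, rank = rank(-rate))

with_and <- filter(murders, region %in% c("Northeast", "West") & rate < 1)
with_comma <- filter(murders, region %in% c("Northeast", "West"), rate < 1)

identical(with_and, with_comma)  # TRUE
```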
15
Q

In one line:

  1. Filter the “murders” dataset to show only the states that:
    1. are in the Northeast or West, and
    2. have a murder rate less than 1.
  2. Select only the state name, the rate, and the rank
A
  1. # Show the result, including only the state, rate, and rank columns, all in one line
    • filter(murders, region %in% c("Northeast", "West") & rate < 1) %>% select(state, rate, rank)
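The order of the steps matters. filter() has to run before select(), because select() drops region and the filter needs that column. The sketch below assumes dslabs supplies murders and that the rate and rank columns were added earlier. It shows that reversing the steps raises an error.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)
murders <- mutate(murders, rate = total / population * 100000, rank = rank(-rate))

result <- filter(murders, region %in% c("Northeast", "West") & rate < 1) %>%
  select(state, rate, rank)

# Reversing the steps fails: select() drops region before filter() can use it
reversed <- try(
  murders %>% select(state, rate, rank) %>% filter(region %in% c("Northeast", "West")),
  silent = TRUE
)
inherits(reversed, "try-error")  # TRUE
```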
16
Q

Use just one line to create a new data frame called my_states that adds the murder rate and rank columns, keeps only the states in the Northeast or West with a murder rate below 1, and contains only the state, rate, and rank columns. The line should have four components separated by three %>%:

  1. The original dataset murders
  2. A call to mutate to add the murder rate and the rank.
  3. A call to filter to keep only the states from the Northeast or West that have a murder rate below 1
  4. A call to select that keeps only the columns with the state name, the murder rate, and the rank.
A
  1. # Create a new data frame called my_states (with the specifications in the instructions)
    • my_states <- murders %>%
        mutate(rate = total / population * 100000, rank = rank(-rate)) %>%
        filter(region %in% c("Northeast", "West") & rate < 1) %>%
        select(state, rate, rank)
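As a variation not covered by these cards, R 4.1.0 and later include a native pipe, |>, which works the same way for this pipeline. The sketch assumes dslabs supplies murders and checks that both versions produce the same data frame.

```r
library(dplyr)
library(dslabs)  # assumption: source of murders
data(murders)

# magrittr pipe, as in the answer above
my_states <- murders %>%
  mutate(rate = total / population * 100000, rank = rank(-rate)) %>%
  filter(region %in% c("Northeast", "West") & rate < 1) %>%
  select(state, rate, rank)

# Native pipe (R 4.1.0 or later), same steps
my_states_native <- murders |>
  mutate(rate = total / population * 100000, rank = rank(-rate)) |>
  filter(region %in% c("Northeast", "West") & rate < 1) |>
  select(state, rate, rank)

identical(my_states, my_states_native)  # TRUE
```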