r_data_wrangling Flashcards

Question 1

Q

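What library is used for data wrangling in R?

Answer

A

dplyr (part of the tidyverse); it provides the mutate, filter, and select functions and the %>% pipe used in the cards below.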

Question 2

Q

Create a new variable “rate” that stores the murder rate within the dataset “murders”

Answer

A

murders <- mutate(murders, rate = total / population * 100000)

Question 3

Q

Display only the states where the murder rate (“rate”) is less than or equal to 0.71

Answer

A

filter(murders, rate <= 0.71)

Question 4

Q

Create a new dataset named “new_table” from the original “murders” dataset, that consists of only the following variables:

state
region
rate

Answer

A

new_table <- select(murders, state, region, rate)

Question 5

Q

Using “piping”, do the following:

Select the following variables:
1. state
2. region
3. rate
Then display only those states with a rate less than or equal to 0.71

Answer

A

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
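Note: the pipe passes the result on its left as the first argument of the function on its right, so the piped answer should match the nested call. A minimal sketch, assuming dplyr is installed and using a made-up data frame (toy and its values are hypothetical, not the real murders data):

```r
library(dplyr)

# Hypothetical stand-in for murders with a precomputed rate column
toy <- data.frame(
  state = c("A", "B", "C"),
  region = c("Northeast", "South", "West"),
  rate = c(0.5, 3.0, 0.71),
  stringsAsFactors = FALSE
)

# Piped version: read left to right
piped <- toy %>% select(state, region, rate) %>% filter(rate <= 0.71)

# Nested version: same steps, read inside out
nested <- filter(select(toy, state, region, rate), rate <= 0.71)

# Both keep states A and C
print(piped)
print(nested)
```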

Question 6

Q

What argument must be passed at the end of a data.frame() call so that strings are stored as characters rather than factors?

Answer

A

stringsAsFactors = FALSE
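Note: since R 4.0.0 the default is already FALSE, so the argument mainly matters on older versions or for code that must behave the same everywhere. A small sketch with made-up columns (the names and values are illustrative only):

```r
# Strings kept as character vectors
states <- data.frame(
  name = c("Vermont", "Hawaii"),
  region = c("Northeast", "West"),
  stringsAsFactors = FALSE
)
class(states$name)      # "character"

# Strings converted to factors
as_factors <- data.frame(
  name = c("Vermont", "Hawaii"),
  stringsAsFactors = TRUE
)
class(as_factors$name)  # "factor"
```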

Question 7

Q

Use the function mutate to add a murders column named rate with the per 100,000 murder rate.

Answer

A

murders <- mutate(murders, rate = total / population * 100000)

Question 8

Q

Redefine murders to include a column named rank with the ranks of rate from highest to lowest

Answer

A

# Redefine murders to include a column named rank with the ranks of rate from highest to lowest
murders <- mutate(murders, rank = rank(-rate))
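Note: rank() assigns 1 to the smallest value, so negating the input flips the order and gives 1 to the largest. A quick illustration on a made-up vector of rates:

```r
# Hypothetical rates, not taken from the murders data
rate <- c(2.5, 0.3, 5.1, 1.0)

rank(rate)   # 3 1 4 2  (1 = lowest rate)
rank(-rate)  # 2 4 1 3  (1 = highest rate)
```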

Question 9

Q

Use select to only show state names and abbreviations from murders

Answer

A

# Use “select” to only show state names and abbreviations from murders
select(murders, state, abb)

Question 10

Q

Filter to show the top 5 states with the highest murder rates

Answer

A

filter(murders, rank <= 5)

Question 11

Q

Create a new data frame called no_south that removes states from the South region.

How many states are in this category?

Answer

A

# Use filter to create a new data frame no_south
no_south <- filter(murders, region != "South")
# Use nrow() to calculate the number of rows
nrow(no_south)

Question 12

Q

Get unique values within the “region” variable of the “murders” dataset

Answer

A

unique(murders$region)

Question 13

Q

Create a new data frame called murders_nw with only the states from the Northeast and the West.
How many states are in this category?

Answer

A

# Create a new data frame called murders_nw with only the states from the Northeast and the West
murders_nw <- filter(murders, region %in% c("Northeast", "West"))
# Number of states (rows) in this category
nrow(murders_nw)
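Note: the answer uses %in% rather than == because == compares element by element and recycles the shorter vector, which silently gives the wrong rows. A sketch on a made-up region vector:

```r
# Hypothetical region values
region <- c("West", "Northeast", "South", "West")

# Membership test: TRUE wherever the value is in the set
in_set <- region %in% c("Northeast", "West")
in_set      # TRUE TRUE FALSE TRUE

# Elementwise comparison with recycling: position 1 vs "Northeast",
# position 2 vs "West", position 3 vs "Northeast", position 4 vs "West"
recycled <- region == c("Northeast", "West")
recycled    # FALSE FALSE FALSE TRUE
```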

Question 14

Q

Add a murder rate column and a rank column as done before
Create a table, call it “my_states”, that satisfies both the conditions: it is in the Northeast or West and the murder rate is less than 1.
Use select on “my_states” to show only the state name, the rate and the rank

Answer

A

# Add the rate and rank columns
murders <- mutate(murders, rate = total / population * 100000, rank = rank(-rate))
# Create a table, call it my_states, that satisfies both the conditions
my_states <- filter(murders, region %in% c("Northeast", "West") & rate < 1)
# Use select to show only the state name, the murder rate and the rank
select(my_states, state, rate, rank)

Question 15

Q

In one line:

Filter the “murders” dataset to show:
1. states in the Northeast or West
2. and the murder rate is less than 1.
Select only the state name, the rate and the rank

Answer

A

# Show the result and only include the state, rate, and rank columns, all in one line
filter(murders, region %in% c("Northeast", "West") & rate < 1) %>% select(state, rate, rank)

Question 16

Q

Use just one line to create a new data frame called my_states that has a murder rate column and a rank column, considers only states in the Northeast or West with a murder rate lower than 1, and contains only the state, rate, and rank columns. The line should have four components separated by three %>%:

1. The original dataset murders.
2. A call to mutate to add the murder rate and the rank.
3. A call to filter to keep only the states from the Northeast or West that have a murder rate below 1.
4. A call to select that keeps only the columns with the state name, the murder rate and the rank.

Answer

A

# Create new data frame called my_states (with specifications in the instructions)
my_states <- murders %>%
  mutate(rate = total / population * 100000, rank = rank(-rate)) %>%
  filter(region %in% c("Northeast", "West") & rate < 1) %>%
  select(state, rate, rank)
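Note: to try the pipeline without the real data, it can be run on a small made-up data frame. This is only a sketch: toy, my_toy_states, and every value in them are hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical stand-in with the columns the pipeline needs
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  region = c("Northeast", "South", "West", "West"),
  population = c(1000000, 2000000, 500000, 4000000),
  total = c(5, 60, 2, 100),
  stringsAsFactors = FALSE
)

my_toy_states <- toy %>%
  # rate per 100,000 people; rank 1 = highest rate across all four states
  mutate(rate = total / population * 100000, rank = rank(-rate)) %>%
  # keep Northeast or West states with a rate below 1
  filter(region %in% c("Northeast", "West") & rate < 1) %>%
  select(state, rate, rank)

# Leaves states A and C, ranked 3 and 4 of the original four
print(my_toy_states)
```

Because mutate runs before filter, the rank is computed over all states, not just the ones that survive the filter.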

(16 cards)