r_data_wrangling Flashcards
What library is used for data wrangling in R?
- dplyr
Create a new variable “rate” that stores the murder rate within the dataset “murders”
- murders <- mutate(murders, rate=total/population*100000)
Display only the states where the murder rate (“rate”) is less than or equal to 0.71
- filter(murders,rate <= 0.71)
Create a new dataset named “new_table” from the original “murders” dataset, that consists of only the following variables:
- state
- region
- rate
- new_table <- select(murders, state, region, rate)
Using “piping”, do the following:
- Select the following variables:
- state
- region
- rate
- display only those states with a rate less than or equal to .71
- murders %>% select(state,region,rate) %>% filter(rate <= 0.71)
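As a rough sketch of what the pipe does, the following compares the piped form with the equivalent nested call. The `toy` data frame and its values are made up for illustration and are not the real murders data.

```r
library(dplyr)

# Made-up rows for illustration only; not the real murders data
toy <- data.frame(
  state = c("A", "B", "C"),
  region = c("West", "South", "Northeast"),
  rate = c(0.5, 3.2, 0.71),
  stringsAsFactors = FALSE
)

# Piped form: each step receives the result of the previous step
piped <- toy %>% select(state, region, rate) %>% filter(rate <= 0.71)

# Equivalent nested form, read from the inside out
nested <- filter(select(toy, state, region, rate), rate <= 0.71)

# Both forms should give the same data frame
identical(piped, nested)
```

The piped version reads left to right in the order the steps are applied, which is usually easier to follow than the nested call.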
What argument must be passed to the data frame creation function (data.frame) so that strings are stored as characters rather than factors?
- stringsAsFactors = FALSE (this is already the default since R 4.0.0)
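A minimal sketch of the difference this argument makes, using a made-up one-column data frame (the names `df_chr` and `df_fct` are hypothetical):

```r
# Made-up values for illustration
df_chr <- data.frame(name = c("a", "b"), stringsAsFactors = FALSE)
df_fct <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)

class(df_chr$name)  # "character"
class(df_fct$name)  # "factor"
```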
Use the function mutate to add a murders column named rate with the per 100,000 murder rate.
- murders <- mutate(murders, rate=total/population*100000)
Redefine murders to include a column named rank with the ranks of rate from highest to lowest
- # Redefine murders to include a column named rank with the ranks of rate from highest to lowest
- murders <- mutate(murders, rank = rank(-rate))
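To see why the rate is negated, here is a small sketch on a made-up vector. By default rank() gives rank 1 to the smallest value, so negating the values makes the largest one rank first.

```r
# Made-up rates for illustration
rate <- c(2.5, 0.3, 5.1)

asc <- rank(rate)    # 2 1 3: smallest value gets rank 1
desc <- rank(-rate)  # 2 3 1: largest value gets rank 1
```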
Use select to only show state names and abbreviations from murders
- # Use “select” to only show state names and abbreviations from murders
- select(murders,state,abb)
Filter to show the top 5 states with the highest murder rates
- filter(murders, rank <= 5)
Create a new data frame called no_south that removes states from the South region.
How many states are in this category?
- # Use filter to create a new data frame no_south
- no_south <- filter(murders, region != "South")
- # Use nrow() to calculate the number of rows
- nrow(no_south)
Get unique values within the “region” variable of the “murders” dataset
- unique(murders$region)
- Create a new data frame called murders_nw with only the states from the Northeast and the West.
- How many states are in this category?
- # Create a new data frame called murders_nw with only the states from the northeast and the west
- murders_nw <- filter(murders, region %in% c("Northeast", "West"))
- # Number of states (rows) in this category
- nrow(murders_nw)
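As a sketch of how %in% behaves, the following applies it to a plain character vector (the vector is made up for illustration). It tests each element for membership in the set, which is equivalent to joining several == comparisons with |.

```r
# Made-up vector for illustration
region <- c("South", "West", "Northeast", "North Central")

in_set <- region %in% c("Northeast", "West")           # FALSE TRUE TRUE FALSE
by_hand <- region == "Northeast" | region == "West"    # same result
```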
- Add a murder rate column and a rank column as done before
- Create a table, call it “my_states”, that satisfies both the conditions: it is in the Northeast or West and the murder rate is less than 1.
- Use select on “my_states” to show only the state name, the rate and the rank
- # Add the rate and rank columns
- murders <- mutate(murders, rate = total / population * 100000, rank = rank(-rate))
- # Create a table, call it my_states, that satisfies both the conditions
- my_states <- filter(murders, region %in% c("Northeast", "West") & rate < 1)
- # Use select to show only the state name, the murder rate and the rank
- select(my_states, state, rate, rank)
In one line:
- Filter the “murders” dataset to show:
- states in the Northeast or West
- and the murder rate is less than 1.
- Select only the state name, the rate and the rank
- # show the result and only include the state, rate, and rank columns, all in one line
- filter(murders, region %in% c("Northeast", "West") & rate < 1) %>% select(state, rate, rank)
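As an aside, dplyr's filter() also accepts several conditions separated by commas and combines them with a logical AND. The sketch below checks this on a made-up data frame (`toy` and its values are invented for illustration):

```r
library(dplyr)

# Made-up rows for illustration only; not the real murders data
toy <- data.frame(
  state = c("A", "B", "C"),
  region = c("West", "South", "Northeast"),
  rate = c(0.5, 0.4, 2.0),
  stringsAsFactors = FALSE
)

# Conditions joined explicitly with &
with_and <- filter(toy, region %in% c("Northeast", "West") & rate < 1)

# Conditions passed as separate arguments
with_comma <- filter(toy, region %in% c("Northeast", "West"), rate < 1)

# Both forms should keep the same rows
identical(with_and, with_comma)
```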