r_basics Flashcards
(23 cards)
Load the ‘nycflights’ data frame?
data(nycflights)
View the names of the variables for the ‘nycflights’ data frame?
names(nycflights)
View the names of variables AND data types for the ‘nycflights’ data frame?
str(nycflights)
What are the two ways to assign the carrier variable for the nycflights data frame to a variable ‘a’?
a <- nycflights$carrier
a <- nycflights[["carrier"]]
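A quick way to confirm that both forms return the same vector. This is only a sketch: `flights_demo` and its values are made up here to stand in for nycflights.

```r
# Hypothetical stand-in for nycflights; the values are invented for illustration.
flights_demo <- data.frame(
  carrier = c("UA", "DL", "AA"),
  dep_delay = c(5, -2, 30),
  stringsAsFactors = FALSE
)

a <- flights_demo$carrier        # extract the column with $
b <- flights_demo[["carrier"]]   # extract the same column with [[ ]]

identical(a, b)                  # both forms give the same character vector
```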
Assign a day of the week (“Monday” - “Friday”) to each element in the following vector:
- poker_vector <- c(140, -50, 20, -120, 240)
- names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
Given the following vector:
- poker_vector <- c(140, -50, 20, -120, 240)
Assign the middle three values to a new variable “poker_midweek”
poker_midweek <- poker_vector[c(2,3,4)]
Given the following vector:
- roulette_vector <- c(-24, -50, 100, -350, 10)
- Assign to “roulette_selection_vector” the roulette results from Tuesday up to Friday (values 2 - 5)
roulette_selection_vector <- roulette_vector[2:5]
Given the following:
- poker_vector <- c(140, -50, 20, -120, 240)
- days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
- names(poker_vector) <- days_vector
Select the first three elements in poker_vector by using their names: “Monday”, “Tuesday”, and “Wednesday”.
Assign the result of the selection to poker_start.
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
Get the length of the variable (column) carrier in the nycflights data frame
carriers <- nycflights$carrier
length(carriers)
Check if the variables ‘a’ and ‘b’ are identical
identical(a,b)
In a nested way, determine the number of regions defined by this dataset and contained in murders$region.
length(levels(murders$region))
Use the table function in one line of code to create a table showing the number of states per region in the murders data set.
table(murders$region)
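The two cards above both depend on region being a factor. A small sketch with a made-up factor (`region_demo` is hypothetical; the real murders$region has its own values) shows what each call returns:

```r
# Hypothetical region factor, invented for illustration.
region_demo <- factor(c("South", "West", "West", "Northeast", "South", "South"))

n_regions <- length(levels(region_demo))  # number of distinct regions
region_counts <- table(region_demo)       # number of entries per region

n_regions
region_counts
```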
View the first five lines of the data frame?
head(nycflights, 5)
What are seven core functions (verbs) of the ‘dplyr’ package?
- filter()
- arrange()
- select()
- distinct()
- mutate()
- summarise()
- sample_n()
Use ‘ggplot’ function to plot the ‘dep_delay’ variable from the ‘nycflights’ data frame with a bin width of 150
ggplot(data = nycflights, aes(x = dep_delay)) + geom_histogram(binwidth = 150)
Use ‘ggplot’ function to plot the ‘dep_delay’ variable for just the ‘dest’ variable with a value of ‘RDU’ from the ‘nycflights’ data frame with a bin width of 150
rdu_flights <- nycflights %>% filter(dest == "RDU")
ggplot(data = rdu_flights, aes(x = dep_delay)) + geom_histogram(binwidth = 150)
Get the mean and standard deviation for the ‘dep_delay’ variable from the ‘rdu_flights’ data set
rdu_flights %>% summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())
Calculate the median and interquartile range for the arr_delays of flights in the sfo_feb_flights data frame, grouped by carrier.
sfo_feb_flights %>% group_by(carrier) %>% summarise(med_dd = median(arr_delay), iqr_dd = IQR(arr_delay), n = n())
Which month has the highest average departure delay from an NYC airport from the nycflights data frame?
nycflights %>% group_by(month) %>% summarise(mean_dd = mean(dep_delay)) %>% arrange(desc(mean_dd))
Which NYC airport from the nycflights data frame has the best on-time departure percentage?
nycflights <- nycflights %>% mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))
nycflights %>% group_by(origin) %>% summarise(ot_dep_rate = sum(dep_type == "on time") / n()) %>% arrange(desc(ot_dep_rate))
What is the tail number of the plane with the fastest avg_speed from the nycflights data frame?
nycflights <- nycflights %>% mutate(air_time_hr = air_time / 60)
nycflights <- nycflights %>% mutate(avg_speed = distance / air_time_hr)
nycflights %>% group_by(tailnum) %>% arrange(desc(avg_speed)) %>% select(avg_speed, tailnum)
What percent of flights that were “delayed” departing ended up arriving at their destination “on time” from the nycflights data frame?
nycflights <- nycflights %>% mutate(arr_type = ifelse(arr_delay <= 0, "on time", "delayed"))
nycflights <- nycflights %>% mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))
nycflights %>% filter(dep_type == "delayed") %>% summarise(ot_arr_rate = sum(arr_type == "on time") / n())
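The choice of denominator matters for this card: dividing by all flights gives the share of all flights that departed late and still arrived on time, while the question asks for a share of delayed departures only. A base-R sketch with made-up delay values (five hypothetical flights, same thresholds as the card) shows the difference:

```r
# Made-up delays in minutes for five hypothetical flights.
dep_delay <- c(0, 10, 20, 3, 45)
arr_delay <- c(-5, -2, 15, 1, 0)

dep_type <- ifelse(dep_delay < 5, "on time", "delayed")
arr_type <- ifelse(arr_delay <= 0, "on time", "delayed")

delayed_dep <- dep_type == "delayed"

# Share of delayed departures that still arrived on time (2 of 3 here).
rate_among_delayed <- mean(arr_type[delayed_dep] == "on time")

# Share of ALL flights that departed delayed and arrived on time (2 of 5 here).
rate_among_all <- mean(delayed_dep & arr_type == "on time")
```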
Given the following:
- poker_vector <- c(140, -50, 20, -120, 240)
- days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
- Check which elements in poker_vector are positive (i.e. > 0) and assign this to selection_vector.
- Use selection_vector in square brackets to assign the amounts that you won on the profitable days to the variable poker_winning_days.
- selection_vector <- poker_vector[c(1:5)] > 0
- OR
- selection_vector <- poker_vector > 0
- poker_winning_days <- poker_vector[selection_vector]
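Putting the steps of this card together as one runnable sketch. Attaching days_vector as names is an extra step assumed here (borrowed from the earlier card) so the result shows which days were profitable:

```r
poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector   # assumption: name the elements by day

# Logical vector: TRUE where the day's result is positive.
selection_vector <- poker_vector > 0

# Indexing with a logical vector keeps only the TRUE positions.
poker_winning_days <- poker_vector[selection_vector]

poker_winning_days   # Monday = 140, Wednesday = 20, Friday = 240
```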