r_basics Flashcards

Question 1

Q

Load the ‘nycflights’ data frame?

Answer

A

data(nycflights)

Question 2

Q

View the names of the variables for the ‘nycflights’ data frame?

Answer

A

names(nycflights)

Question 3

Q

View the names of variables AND data types for the ‘nycflights’ data frame?

Answer

A

str(nycflights)

Question 4

Q

What are the two ways to assign the carrier variable for the nycflights data frame to a variable ‘a’?

Answer

A

a <- nycflights$carrier

a <- nycflights[["carrier"]]
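
A minimal sketch of the two extraction styles, run on a small made-up data frame so it works without the real data set. The name `toy_flights` and its values are hypothetical stand-ins, not the actual nycflights data.

```r
# Hypothetical stand-in for nycflights; the values are invented for illustration.
toy_flights <- data.frame(
  carrier = c("AA", "UA", "DL"),
  dep_delay = c(5, -2, 30),
  stringsAsFactors = FALSE
)

a <- toy_flights$carrier        # extract the column with $
b <- toy_flights[["carrier"]]   # extract the same column with [[ ]]

identical(a, b)  # TRUE: both forms return the same character vector
length(a)        # 3: one element per row
```

The `[[ ]]` form is handy when the column name is stored in another variable, since `$` only accepts a literal name.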

Question 5

Q

Assign a day of the week (“Monday” - “Friday”) to each element in the following vector:

poker_vector <- c(140, -50, 20, -120, 240)

Answer

A

names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

Question 6

Q

Given the following vector:

poker_vector <- c(140, -50, 20, -120, 240)

Assign the middle three values to a new variable “poker_midweek”

Answer

A

poker_midweek <- poker_vector[c(2,3,4)]

Question 7

Q

Given the following vector:

roulette_vector <- c(-24, -50, 100, -350, 10)

Assign to “roulette_selection_vector” the roulette results from Tuesday up to Friday (values 2 - 5)

Answer

A

roulette_selection_vector <- roulette_vector[2:5]

Question 8

Q

Given the following:

poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector

Select the first three elements in poker_vector by using their names: “Monday”, “Tuesday”, and “Wednesday”.

Assign the result of the selection to poker_start.

Answer

A

poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
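
The selection styles used in the last few cards (by position, by range, and by name) can be compared side by side. This is a small base-R sketch; the variable names `by_position`, `by_range`, and `by_name` are made up for the illustration.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

by_position <- poker_vector[c(2, 3, 4)]   # explicit positions
by_range    <- poker_vector[2:4]          # the same positions written as a range
by_name     <- poker_vector[c("Tuesday", "Wednesday", "Thursday")]  # element names

identical(by_position, by_range)  # TRUE
identical(by_position, by_name)   # TRUE
```

All three return the same named vector, so the choice is mostly about readability: names make the intent clearest when the vector is labelled.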

Question 9

Q

Get the length of the variable (column) ‘carrier’ in the nycflights data frame

Answer

A

carrier <- nycflights$carrier

length(carrier)

Question 10

Q

Check if the variables ‘a’ and ‘b’ are identical

Answer

A

identical(a,b)

Question 11

Q

In a nested way, determine the number of regions defined by this dataset and contained in murders$region.

Answer

A

length(levels(murders$region))

Question 12

Q

Use the table function in one line of code to create a table showing the number of states per region in the murders data set.

Answer

A

table(murders$region)
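
To see what `levels()` and `table()` return without loading the murders data, here is a sketch on a small factor. The `region` values below are invented stand-ins for murders$region, not the real data.

```r
# Hypothetical region labels standing in for murders$region.
region <- factor(c("South", "West", "West", "Northeast", "South", "South"))

levels(region)          # the distinct categories, sorted alphabetically
length(levels(region))  # 3, the nested form used in Question 11
nlevels(region)         # 3, a base-R shortcut for the same count
table(region)           # counts per level: Northeast 1, South 3, West 2
```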

Question 13

Q

View the first five lines of the data frame?

Answer

A

head(nycflights, 5)

Question 14

Q

What are seven commonly used functions from the ‘dplyr’ package?

Answer

A

filter()
arrange()
select()
distinct()
mutate()
summarise()
sample_n()

Question 15

Q

Use the ‘ggplot’ function to plot a histogram of the ‘dep_delay’ variable from the ‘nycflights’ data frame with a bin width of 150

Answer

A

ggplot(data = nycflights, aes(x = dep_delay)) + geom_histogram(binwidth = 150)

Question 16

Q

Use the ‘ggplot’ function to plot a histogram of the ‘dep_delay’ variable, for only the flights whose ‘dest’ value is ‘RDU’, from the ‘nycflights’ data frame with a bin width of 150

Answer

A

rdu_flights <- nycflights %>% filter(dest == "RDU")

ggplot(data = rdu_flights, aes(x = dep_delay)) + geom_histogram(binwidth = 150)

Question 17

Q

Get the mean and standard deviation for the ‘dep_delay’ variable from the ‘rdu_flights’ data set

Answer

A

rdu_flights %>% summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())

Question 18

Q

Calculate the median and interquartile range for arr_delays of flights in the sfo_feb_flights data frame, grouped by carrier.

Answer

A

sfo_feb_flights %>% group_by(carrier) %>% summarise(med_dd = median(arr_delay), iqr_dd = IQR(arr_delay), n = n())

Question 19

Q

Which month has the highest average departure delay from an NYC airport from the nycflights data frame?

Answer

A

nycflights %>% group_by(month) %>% summarise(mean_dd = mean(dep_delay)) %>% arrange(desc(mean_dd))
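
The group, summarise, and sort pattern in this answer can be tried on a tiny data frame. This sketch assumes the dplyr package is installed; `toy_flights`, `monthly_delay`, and the delay values are hypothetical, not taken from nycflights.

```r
library(dplyr)

# Hypothetical stand-in for nycflights; the values are invented for illustration.
toy_flights <- data.frame(
  month = c(1, 1, 2, 2, 3),
  dep_delay = c(10, 20, 5, 15, 40)
)

monthly_delay <- toy_flights %>%
  group_by(month) %>%                        # one group per month
  summarise(mean_dd = mean(dep_delay)) %>%   # collapse each group to its mean
  arrange(desc(mean_dd))                     # largest mean first

monthly_delay$month[1]  # 3: the month with the highest mean delay in this toy data
```

Because the result is sorted in descending order, the first row answers the "which month is highest" question directly.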

Question 20

Q

Which NYC airport from the nycflights data frame has the best on-time departure percentage?

Answer

A

nycflights <- nycflights %>% mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))

nycflights %>% group_by(origin) %>% summarise(ot_dep_rate = sum(dep_type == "on time") / n()) %>% arrange(desc(ot_dep_rate))

Question 21

Q

What is the tail number of the plane with the fastest avg_speed from the nycflights data frame?

Answer

A

nycflights <- nycflights %>% mutate(air_time_hr = air_time / 60)

nycflights <- nycflights %>% mutate(avg_speed = distance / air_time_hr)

nycflights %>% group_by(tailnum) %>% arrange(desc(avg_speed)) %>% select(avg_speed, tailnum)

Question 22

Q

What percent of flights that were “delayed” departing ended up arriving at their destination “on time” from the nycflights data frame?

Answer

A

nycflights <- nycflights %>% mutate(arr_type = ifelse(arr_delay <= 0, "on time", "delayed"))

nycflights <- nycflights %>% mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))

nycflights %>% filter(dep_type == "delayed") %>% summarise(ot_arr_rate = sum(arr_type == "on time") / n())

Question 23

Q

Given the following:

poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

Check which elements in poker_vector are positive (i.e. > 0) and assign this to selection_vector.
Use selection_vector in square brackets to assign the amounts that you won on the profitable days to the variable poker_winning_days.

Answer

A

selection_vector <- poker_vector[c(1:5)] > 0

OR

selection_vector <- poker_vector > 0

poker_winning_days <- poker_vector[selection_vector]
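
A short run-through of the logical selection, showing the intermediate values. Attaching the day names to the vector is an extra step added here so the output is easier to read; the card itself only defines days_vector.

```r
poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector  # added for readable output

selection_vector <- poker_vector > 0
selection_vector    # TRUE FALSE TRUE FALSE TRUE, labelled by day

poker_winning_days <- poker_vector[selection_vector]
poker_winning_days  # Monday 140, Wednesday 20, Friday 240
```

R keeps only the elements where the logical vector is TRUE, so no explicit loop or index lookup is needed.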

(23 cards)