r_basics Flashcards

1
Q

Load the ‘nycflights’ data frame?

A

data(nycflights)

2
Q

View the names of the variables for the ‘nycflights’ data frame?

A

names(nycflights)

3
Q

View the names of variables AND data types for the ‘nycflights’ data frame?

A

str(nycflights)

4
Q

What are the two ways to assign the carrier variable for the nycflights data frame to a variable ‘a’?

A

a <- nycflights$carrier

a <- nycflights[["carrier"]]
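
As a rough check that the two forms return the same thing, here is a minimal sketch on a small made-up data frame. The name `flights_demo` and its values are hypothetical stand-ins, not the real nycflights data.

```r
# Hypothetical stand-in for nycflights; the values are made up.
flights_demo <- data.frame(
  carrier = c("UA", "AA", "B6"),
  dep_delay = c(2, -3, 15),
  stringsAsFactors = FALSE
)

a <- flights_demo$carrier       # extract the column with $
b <- flights_demo[["carrier"]]  # extract the same column with [[ ]]

same <- identical(a, b)         # TRUE when both results match exactly
print(same)
```

The `[[ ]]` form is the one to reach for when the column name is stored in another variable, since `$` takes the name literally.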

5
Q

Assign a day of the week (“Monday” - “Friday”) to each element in the following vector:

  • poker_vector <- c(140, -50, 20, -120, 240)
A
  • names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
6
Q

Given the following vector:

  • poker_vector <- c(140, -50, 20, -120, 240)

Assign the middle three values to a new variable “poker_midweek”

A

poker_midweek <- poker_vector[c(2,3,4)]

7
Q

Given the following vector:

  • roulette_vector <- c(-24, -50, 100, -350, 10)
  • Assign to “roulette_selection_vector” the roulette results from Tuesday up to Friday (values 2 - 5)
A

roulette_selection_vector <- roulette_vector[2:5]

8
Q

Given the following:

  • poker_vector <- c(140, -50, 20, -120, 240)
  • days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
  • names(poker_vector) <- days_vector

Select the first three elements in poker_vector by using their names: “Monday”, “Tuesday”, and “Wednesday”.

Assign the result of the selection to poker_start.

A

poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
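
To see that selecting by name gives the same result as selecting by position, the card's setup can be run end to end. This is only a sketch; the variable names `by_name` and `by_position` are introduced here for illustration.

```r
poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector

# Select by element name
by_name <- poker_vector[c("Monday", "Tuesday", "Wednesday")]

# Select the same elements by position; the names are carried along
by_position <- poker_vector[1:3]

print(by_name)
print(identical(by_name, by_position))
```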

9
Q

Get the length of the variable (column) ‘carrier’ in the nycflights data frame

A

carrier <- nycflights$carrier

length(carrier)

10
Q

Check if the variables ‘a’ and ‘b’ are identical

A

identical(a,b)

11
Q

In a nested way, determine the number of regions defined by this dataset and contained in murders$region.

A

length(levels(murders$region))

12
Q

Use the table function in one line of code to create a table showing the number of states per region in the murders data set.

A

table(murders$region)
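
The difference between counting the categories and counting the rows in each category can be sketched on a small made-up factor. The `region` vector below is a hypothetical stand-in for murders$region, which is assumed to be a factor; `levels()` returns NULL for a plain character vector.

```r
# Hypothetical stand-in for murders$region; the values are made up.
region <- factor(c("South", "West", "West", "Northeast", "South", "South"))

# Nested call: levels() lists the categories, length() counts them
n_regions <- length(levels(region))

# table() counts how many entries fall in each category
states_per_region <- table(region)

print(n_regions)
print(states_per_region)
```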

13
Q

View the first five lines of the data frame?

A

head(nycflights, 5)

14
Q

Name seven commonly used functions from the ‘dplyr’ package.

A
  • filter()
  • arrange()
  • select()
  • distinct()
  • mutate()
  • summarise()
  • sample_n()
15
Q

Use ‘ggplot’ function to plot the ‘dep_delay’ variable from the ‘nycflights’ data frame with a bin width of 150

A

ggplot(data = nycflights, aes(x = dep_delay)) + geom_histogram(binwidth = 150)

16
Q

Use ‘ggplot’ function to plot the ‘dep_delay’ variable for just the ‘dest’ variable with a value of ‘RDU’ from the ‘nycflights’ data frame with a bin width of 150

A

rdu_flights <- nycflights %>% filter(dest == "RDU")

ggplot(data = rdu_flights, aes(x = dep_delay)) + geom_histogram(binwidth = 150)

17
Q

Get the mean and standard deviation for the ‘dep_delay’ variable from the ‘rdu_flights’ data set

A

rdu_flights %>% summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())

18
Q

Calculate the median and interquartile range of ‘arr_delay’ for flights in the sfo_feb_flights data frame, grouped by carrier.

A

sfo_feb_flights %>% group_by(carrier) %>% summarise(med_dd = median(arr_delay), iqr_dd = IQR(arr_delay), n = n())

19
Q

Which month has the highest average departure delay from an NYC airport from the nycflights data frame?

A

nycflights %>% group_by(month) %>% summarise(mean_dd = mean(dep_delay)) %>% arrange(desc(mean_dd))

20
Q

Which NYC airport from the nycflights data frame has the best on-time departure percentage?

A

nycflights <- nycflights %>% mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))

nycflights %>% group_by(origin) %>% summarise(ot_dep_rate = sum(dep_type == "on time") / n()) %>% arrange(desc(ot_dep_rate))

21
Q

What is the tail number of the plane with the fastest avg_speed from the nycflights data frame?

A

nycflights <- nycflights %>% mutate(air_time_hr = air_time / 60)

nycflights <- nycflights %>% mutate(avg_speed = distance / air_time_hr)

nycflights %>% group_by(tailnum) %>% arrange(desc(avg_speed)) %>% select(avg_speed, tailnum)

22
Q

What percent of flights that were “delayed” departing ended up arriving at their destination “on time” from the nycflights data frame?

A

nycflights <- nycflights %>% mutate(arr_type = ifelse(arr_delay <= 0, "on time", "delayed"))

nycflights <- nycflights %>% mutate(dep_type = ifelse(dep_delay < 5, "on time", "delayed"))

nycflights %>% filter(dep_type == "delayed") %>% summarise(ot_arr_rate = sum(arr_type == "on time") / n())

23
Q

Given the following:

  • poker_vector <- c(140, -50, 20, -120, 240)
  • days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
  1. Check which elements in poker_vector are positive (i.e. > 0) and assign this to selection_vector.
  2. Use selection_vector in square brackets to assign the amounts that you won on the profitable days to the variable poker_winning_days.
A
  1. selection_vector <- poker_vector[c(1:5)] > 0
    • OR
    • selection_vector <- poker_vector > 0
  2. poker_winning_days <- poker_vector[selection_vector]
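
The two steps can be run together to see what the logical vector looks like and which elements it keeps. This is a minimal sketch; naming the elements with `days_vector` is an added step, assumed here only so the output shows which days were profitable.

```r
poker_vector <- c(140, -50, 20, -120, 240)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector  # added for readable output

# Step 1: element-wise comparison gives one TRUE/FALSE per day
selection_vector <- poker_vector > 0

# Step 2: a logical index keeps only the elements marked TRUE
poker_winning_days <- poker_vector[selection_vector]

print(selection_vector)
print(poker_winning_days)
```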