r_sorting Flashcards
1
Q
How do you do the following:
- create a vector composed of (31, 4, 15, 92, 65)
- put the vector in ascending order
A
- create vector: x <- c(31, 4, 15, 92, 65)
- sort: sort(x)
2
Q
How do you get the order of indexes (from smallest value to largest value) of a vector?
A
- order(x)
- Example:
> x
[1] 31 4 15 92 65
> order(x)
[1] 2 3 1 5 4
- The “order” function returns the indexes that put the values in order from smallest to largest
3
Q
How do you do the following:
- Store the order of the variable “total” from the “murders” data frame within the variable “index”
- Display the “abb” variable from the “murders” data frame by “index”
A
- index <- order(murders$total)
- murders$abb[index]
4
Q
How do you do the following:
- Display the largest value within the “total” variable/column from the “murders” data set
- Display the index (and store in the variable “i_max”) of the largest value within the “total” variable/column from the “murders” data set
- Display the “state” name of the value at the index stored in “i_max”
A
- max(murders$total)
- i_max <- which.max(murders$total)
- murders$state[i_max]
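The same pattern can be tried without the dslabs package. A minimal sketch, assuming a small made-up data frame named toy that stands in for murders (its name and values are hypothetical, not real data):

```r
# Hypothetical stand-in for the murders data frame (values are made up)
toy <- data.frame(
  state = c("State A", "State B", "State C"),
  total = c(12, 40, 7),
  stringsAsFactors = FALSE
)

max(toy$total)                   # largest value in the column: 40
i_max <- which.max(toy$total)    # index of that largest value: 2
toy$state[i_max]                 # state at that index: "State B"
```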
5
Q
Given the vector (31, 4, 15, 92, 65), answer the following:
- What is the “sort” of the vector?
- What is the “order” of the vector?
- What is the “rank” of the vector?
A
- original: 31, 4, 15, 92, 65
- sort: 4, 15, 31, 65, 92
- order: 2, 3, 1, 5, 4
- These are the indexes of the original (unsorted) vector, listed in the order that sorts it
- rank: 3, 1, 2, 5, 4
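How the three results relate can be checked directly in base R. This is a minimal sketch using the same vector; the variable names sorted, o, and r are chosen here for illustration:

```r
x <- c(31, 4, 15, 92, 65)

sorted <- sort(x)   # values in ascending order
o <- order(x)       # indexes of x that put it in ascending order
r <- rank(x)        # position each original entry takes once sorted

# Indexing x by its order gives the sorted vector
identical(x[o], sorted)    # TRUE

# Indexing the sorted vector by rank recovers the original vector
identical(sorted[r], x)    # TRUE
```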
6
Q
Using the “murders” dataset, do the following:
- Use the $ operator to access the population size data and store it in the object pop.
- Then use the sort function to redefine pop so that it is sorted.
- Finally use the [ operator to report the smallest population size.
A
- pop <- murders$population
- pop <- sort(pop)
- pop[1]
7
Q
Using the “murders” data set, do the following:
- Now instead of the smallest population size, let’s find out the row number, in the data frame murders, of the state with the smallest population size.
- This time we need to use order() instead of sort().
- Remember that the entries in the vector murders$population follow the order of the rows of murders.
A
- # Access population from the dataset and store it in pop
- pop <- murders$population
- # Use the order command to get the indexes that order pop and store them in object o
- o <- order(pop)
- # Find the index number of the entry with the smallest population size
- o[1]
8
Q
Using the “murders” data set, do the following:
- Write one line of code that gives the index of the lowest population entry. Use the which.min command.
A
- which.min(murders$population)
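Taking the first element of order() and calling which.min() give the same index. A minimal sketch on a plain vector (the vector x is an illustration, not part of the murders data):

```r
x <- c(31, 4, 15, 92, 65)

order(x)[1]      # index of the smallest value via order: 2
which.min(x)     # same index in one call: 2

# The value at that index is the minimum
x[which.min(x)] == min(x)    # TRUE
```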
9
Q
Using the “murders” data set, do the following:
- Find the index of the smallest state using which.min(murders$population).
- Define a variable states to hold the state names from the murders data frame.
- Combine these to find the state name for the smallest state
A
- # Define the variable i to be the index of the smallest state
- i <- which.min(murders$population)
- # Define variable states to hold the states
- states <- murders$state
- # Use the index you just defined to find the state with the smallest population
- states[i]
10
Q
Using the “murders” data set, do the following:
- Define a variable states to be the state names from murders
- Use rank(murders$population) to determine the population size rank (from smallest to biggest) of each state.
- Save these ranks in an object called ranks.
- Create a data frame with state names and their respective ranks. Call the data frame my_df.
A
- # Define a variable states to be the state names
- states <- murders$state
- # Define a variable ranks to determine the population size ranks
- ranks <- rank(murders$population)
- # Create a data frame my_df with the state name and its rank
- my_df <- data.frame(name = states, ranks = ranks)
- my_df
11
Q
Using the “murders” data set, do the following:
- Create variables states and ranks to store the state names and ranks by population size respectively.
- Create an object ind that stores the indexes needed to order the population values, using the order command. For example, we could define ind <- order(murders$population)
- Create a data frame with both variables following the correct order. Use the bracket operator [ to re-order each column in the data frame. For example, states[ind] orders the state names by population size.
- The columns of the data frame must be in the specific order: state, rank.
A
- # Define a variable states to be the state names from the murders data frame
- states <- murders$state
- # Define a variable ranks to determine the population size ranks
- ranks <- rank(murders$population)
- # Define a variable ind to store the indexes needed to order the population values
- ind <- order(murders$population)
- # Create a data frame my_df with the state name and its rank and ordered from least populous to most
- my_df <- data.frame(state=states[ind], rank=ranks[ind])
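The re-ordering step can be checked on a small example. A minimal sketch, assuming a made-up data frame named toy in place of murders (the name and all values are hypothetical):

```r
# Hypothetical stand-in for the murders data frame (values are made up)
toy <- data.frame(
  state = c("State A", "State B", "State C", "State D"),
  population = c(500, 120, 900, 300),
  stringsAsFactors = FALSE
)

states <- toy$state
ranks <- rank(toy$population)   # 3, 1, 4, 2
ind <- order(toy$population)    # 2, 4, 1, 3

# Re-order both columns with the same index vector
my_df <- data.frame(state = states[ind], rank = ranks[ind],
                    stringsAsFactors = FALSE)
my_df
```

Because both columns are indexed by ind, the rank column of my_df runs 1, 2, 3, 4 from the least to the most populous row.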
12
Q
Do the following:
- Load the “dslabs” package
- Load the “na_example” dataset
- Check the structure of the “na_example” dataset
- Find the mean of the na_example dataset
- The is.na function returns a logical vector that tells us which entries are NA. Assign the logical vector that is returned by is.na(na_example) to an object called ind.
- Determine how many NAs na_example has, using the sum command.
- Write one line of code to compute the average, but only for the entries that are not NA, making use of the ! operator before ind.
A
- # Using new dataset
- library(dslabs)
- data(na_example)
- # Checking the structure
- str(na_example)
- # Find the mean of the entire dataset (returns NA because some entries are NA)
- mean(na_example)
- # Use is.na to create a logical index ind that tells which entries are NA
- ind <- is.na(na_example)
- # Determine how many NAs na_example has by applying the sum function to ind
- sum(ind)
- # Compute the average, for entries of na_example that are not NA
- mean(na_example[!ind])
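The same steps can be tried on a short vector where the results are easy to verify by hand. A minimal sketch, assuming a made-up vector v (hypothetical, not part of dslabs); it also shows the na.rm argument of mean as an alternative to logical indexing:

```r
# Hypothetical vector with two missing entries
v <- c(2, NA, 4, NA, 6)

ind <- is.na(v)          # FALSE TRUE FALSE TRUE FALSE
sum(ind)                 # TRUE counts as 1, so this is the number of NAs: 2

mean(v)                  # NA, because missing values propagate
mean(v[!ind])            # mean of the non-missing entries: 4
mean(v, na.rm = TRUE)    # same result using the na.rm argument: 4
```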