R Flashcards

(68 cards)

1
Q

console

A

where to write R functions and code; doesn’t save the code you wrote

2
Q

environment

A

where you see the objects you’ve created

3
Q

order of operations in R

A

PEMDAS (parentheses, exponents, multiplication and division, addition and subtraction)

4
Q

print()

A

prints the value stored in an object

5
Q

rules for naming identifiers in R

A

must start with letter or period; if it starts with a period, can’t be followed by a digit

reserved words can’t be used as identifiers

6
Q

function

A

piece of code that performs a specific task

7
Q

arguments should be listed within…

A

parentheses

8
Q

data types in R

A

numeric: double or integer (L after the number)
string: character
logical: TRUE or FALSE

9
Q

typeof()

A

displays the data type of the argument passed
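
A minimal sketch of typeof() on the data types from the previous card. The values are made up for illustration.

```r
# Illustrative values only; typeof() returns the type name as a string
types <- c(
  typeof(3.5),      # "double"
  typeof(3L),       # "integer": the L suffix marks an integer
  typeof("three"),  # "character"
  typeof(TRUE)      # "logical"
)
print(types)
```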

10
Q

alphabetical string comparison

how are they compared?

A

dictionary order, ignoring case at first
if two strings tie when case is ignored, lowercase < uppercase
if there is a number, digit < letter

summary: digits < letters, and for the same letter, lowercase < uppercase (the exact order depends on the locale)

11
Q

why is TRUE + FALSE = 1?

A

TRUE is coerced to 1
FALSE is coerced to 0
therefore 1 + 0 = 1

12
Q

implicit coercion

A

R converts data types to be able to accomplish commands
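
A short sketch of implicit coercion in arithmetic, assuming nothing beyond base R. The variable names are illustrative.

```r
# Logical values are coerced to numbers when arithmetic needs them
total <- TRUE + FALSE                 # TRUE becomes 1, FALSE becomes 0
n_true <- sum(c(TRUE, FALSE, TRUE))   # summing a logical vector counts the TRUEs
print(total)   # 1
print(n_true)  # 2
```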

13
Q

AND, OR, NOT

what are the symbols used?

A

&, |, !

14
Q

when can you break a line in R?

A

after a comma, after an operator such as &, and after the pipe %>%

15
Q

atomic data

A

object that holds a single value

16
Q

vector

can a vector have different data types? what if it’s NA?

A

object that holds multiple values of the same data type; like a column/row array

always the same data type, even if it’s NA

17
Q

creating a vector

A

vectorname <- c(element1, element2, element3)

18
Q
  1. TRUE&TRUE
  2. FALSE&TRUE
  3. TRUE&FALSE
A
  1. TRUE
  2. FALSE
  3. FALSE
19
Q

creating a new vector that is numbers added to an old vector

A

newvector <- oldvector + 2
OR newvector <- oldvector + c(2, 2, 1, 1)
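
Both forms can be tried on a small vector. This is a sketch with made-up starting values.

```r
oldvector <- c(10, 20, 30, 40)            # hypothetical starting values
plus_two <- oldvector + 2                 # 2 is added to every element
plus_each <- oldvector + c(2, 2, 1, 1)    # added position by position
print(plus_two)   # 12 22 32 42
print(plus_each)  # 12 22 31 41
```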

20
Q

length(x)

x is an object

A

outputs the number of elements in x

21
Q

subsetting

how are indices numbered?

A

retrieving specific elements from a vector using indices

numbered starting from 1

22
Q
  1. retrieve an element from a vector with a given index
  2. retrieve a range of indices
  3. retrieve specific elements with specific indices
  4. retrieve all but index 3
  5. retrieve all but index 2 and 3
A
  1. vectorname[index]
  2. vectorname[index:index]
  3. vectorname[c(1, 2, 4)]
  4. vectorname[-3]
  5. vectorname[-c(2,3)]
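
The five patterns above, sketched on a hypothetical five-element vector:

```r
letters5 <- c("a", "b", "c", "d", "e")  # hypothetical vector
one   <- letters5[2]            # single index: "b"
rng   <- letters5[2:4]          # range of indices: "b" "c" "d"
picks <- letters5[c(1, 2, 4)]   # specific indices: "a" "b" "d"
no3   <- letters5[-3]           # all but index 3: "a" "b" "d" "e"
no23  <- letters5[-c(2, 3)]     # all but indices 2 and 3: "a" "d" "e"
```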
23
Q

adding new elements to an existing vector

A

vectorname[3:4] <- c("newvalue1", "newvalue2")
assigns to positions that do not yet hold values in the existing vector
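
A small sketch of growing a vector by assignment. The vector v and its values are illustrative. The last step shows that skipping a position leaves an NA in the gap.

```r
v <- c("x", "y")                        # hypothetical two-element vector
v[3:4] <- c("newvalue1", "newvalue2")   # positions 3 and 4 did not exist yet
len_after <- length(v)                  # 4
v[6] <- "z"                             # skipping position 5 fills it with NA
print(v)
```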

24
Q

which(x)

what is x?

A

gives indices of TRUEs; output is a vector of position numbers

used to identify particular observations that satisfy the condition specified

x is a logical vector

25
vectors: subsetting the entries that satisfy a condition
column[which(condition)] | ex. cities[which(population > 100000)]
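
The cities example can be run as follows. The city names and populations are made up for illustration.

```r
cities <- c("Avon", "Brook", "Cedar")        # hypothetical names
population <- c(50000, 250000, 120000)       # hypothetical counts, same order
big_idx <- which(population > 100000)        # positions of the TRUEs: 2 3
big_cities <- cities[big_idx]                # "Brook" "Cedar"
print(big_cities)
```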
26
max, min, range, sum, mean, sd, var, sqrt, sort
max value, min value, min and max values, sum, average, sd, variance, square root, puts elements in ascending order
27
coercion in vector creation | what is the order?
if the elements of a vector are given in different data types, R coerces them all into 1 data type, so typeof(vector) = the highest-ordered data type present | order, lowest to highest: logical < integer < double < character
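
A sketch of coercion when mixed types are combined with c(). The values are illustrative, and each result is the highest-ordered type present, with character ranking highest.

```r
t1 <- typeof(c(TRUE, 2L))            # logical + integer   -> "integer"
t2 <- typeof(c(TRUE, 2L, 3.5))       # adding a double     -> "double"
t3 <- typeof(c(TRUE, 2L, 3.5, "a"))  # adding a character  -> "character"
t4 <- typeof(c(1.5, NA))             # NA takes the type of the vector: "double"
print(c(t1, t2, t3, t4))
```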
28
how to install a package/library
install.packages("package_name"); might need to add dependencies = TRUE, e.g. for tidyverse | R will download the package from CRAN
29
how to load a package/library | > means completed
library(package_name)
30
what is a data frame? | what is a tibble?
data frame: R object that stores a collection of observations for 1 or more variables; a table | tibble: allows a collection of vectors to be combined into a data frame
31
accessing a value from a data frame
df[[row, column]] outputs an atomic value | df[row, column] outputs another data frame (when df is a tibble)
32
accessing a column of a data frame
new_object <- df$column
33
extract specific items from a column that satisfy a condition in another column
df$column1[which(df$column2 {logical})]
34
modify a single value in a data frame
df$column[which(df$column == identifier)] <- new_value
35
calling the items in a column that satisfy a condition, as a vector
df$column[df$column {logical}] outputs a vector of the values in the column that satisfy the condition | df$column {logical} on its own outputs a logical vector of TRUE/FALSE
36
creating a new column in an existing df | calculation column?
df$new_column <- c(values) | calculation column: df$new_column <- df$column1 / df$column2
37
what functions need na.rm? | does this modify the data
aggregate functions like mean, sum, sd, var, etc | doesn't modify the data, only removes the NAs from the calculation
38
how to identify the NAs in a column | how to find the number of NAs in a column?
is.na(df$column) outputs logical vector | sum(is.na(df$column))
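
A minimal sketch of finding and counting NAs, and of what na.rm = TRUE does in an aggregate function. The data frame and its score column are hypothetical.

```r
df <- data.frame(score = c(4, NA, 7, NA, 10))  # hypothetical data
na_flags <- is.na(df$score)                    # FALSE TRUE FALSE TRUE FALSE
na_count <- sum(na_flags)                      # 2
avg_raw <- mean(df$score)                      # NA, because NAs propagate
avg_clean <- mean(df$score, na.rm = TRUE)      # 7; df itself is unchanged
```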
39
loading an RDS file
use the GUI to load the data; it runs readRDS() for you
40
select() | syntax
extracts the columns specified | new_obj <- select(df, variables); View(new_obj) | need a new obj so that the result doesn't display in the interface
41
filter() | syntax; what happens to NAs?
keeps rows/obs where the conditions specified are satisfied; only TRUEs are kept | new_obj <- filter(df, conditions); View(new_obj) | conditions can be combined with & (and) or | (or); rows where the conditions evaluate to NA are dropped
42
combining select and filter with and without pipe
with pipe: df %>% filter(conditions) %>% select(columns) | without pipe: select(filter(df, column == condition), columns)
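
A sketch comparing the piped and nested forms, assuming the dplyr package is installed. The tibble and its columns are made up for illustration.

```r
library(dplyr)

# Hypothetical data
people <- tibble(
  name = c("Ann", "Bo", "Cy"),
  age  = c(31, 17, 45),
  city = c("Oslo", "Rome", "Lima")
)

# With the pipe: read left to right
with_pipe <- people %>% filter(age >= 18) %>% select(name, city)

# Without the pipe: filter() is nested inside select()
without_pipe <- select(filter(people, age >= 18), name, city)

print(with_pipe)
```

Both objects hold the same two rows (Ann and Cy) and the same two columns.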
43
mutate() | syntax
adds new columns to a df | df <- mutate(df, new_variable = function/operation) | assign to the original df so that the result is saved and doesn't just run in the console
44
add a new column to obs that meet 2 conditions (and remove NAs), and select 2 columns
df %>% mutate(new_column = agg_func(column, na.rm = TRUE)) %>% filter(condition1 & condition2) %>% select(column1, column2) | order: df %>% mutate %>% filter %>% select; na.rm = TRUE goes inside the aggregate function, not directly in mutate
45
summarise() | syntax
used to create an aggregate statistic over obs; used with mean, median, sd, n(), n_distinct(), etc. | summarise(df, new_var = agg_func(existing_var)) outputs a df | needs existing variables!
46
n() and n_distinct()
n() counts the number of rows passed into mutate or summarise | n_distinct(x) counts the number of unique values in x | often use filter %>% summarise(n()) to find the number of rows that satisfy the filter
47
how to filter out NAs | 2 methods
1. filter(df, !is.na(column)) | 2. filter(df, column > 0), since a logical comparison filters out obs that are NA (only suitable when every valid value satisfies the comparison)
48
group_by
takes an existing data frame and converts it to a grouped data frame, where subsequent operations are performed group by group | often followed by mutate or summarise
49
when to use group_by + mutate vs group_by + summarize? | difference in how many columns are kept
mutate: for an atomic value function, adding an extra column and assigning values to that column, or an if_else; retains all the columns | summarize: performs an aggregate operation over each group and displays one aggregate result per group; removes all columns except those specified in group_by and the new one for the aggregate statistic
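
A sketch of the difference, assuming dplyr is installed. The sales tibble and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data: 5 rows in 2 groups
sales <- tibble(
  region = c("N", "N", "S", "S", "S"),
  amount = c(10, 20, 5, 15, 40)
)

# group_by + mutate: every row is kept and a new column is added
with_mutate <- sales %>%
  group_by(region) %>%
  mutate(region_mean = mean(amount)) %>%
  ungroup()

# group_by + summarise: one row per group, only the grouping column
# and the new aggregate columns remain
with_summarise <- sales %>%
  group_by(region) %>%
  summarise(region_mean = mean(amount), n_rows = n())

dim(with_mutate)     # 5 rows, 3 columns
dim(with_summarise)  # 2 rows, 3 columns
```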
50
how to use group_by with n()
counts the rows in each group separately and summarizes them | df %>% group_by(groupvar) %>% summarize(newvar = n())
51
finding how many instances appear 2x in the dataset
df %>% filter(conditions) %>% group_by(groupvariable) %>% summarize(noftimes = n()) %>% group_by(noftimes) %>% summarize(nofvariables = n()) %>% filter(noftimes == 2) | nofvariables is then the number of instances that appear 2x
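
A worked sketch of the double group_by pattern, assuming dplyr is installed and that the goal is to count how many ids appear exactly twice. The visits tibble is made up. After the second summarise, noftimes holds the number of appearances and nofvariables holds how many ids have that number, so the final filter is applied to noftimes.

```r
library(dplyr)

# Hypothetical data: a, c and e appear twice, b once, d three times
visits <- tibble(id = c("a", "a", "b", "c", "c", "d", "d", "d", "e", "e"))

twice <- visits %>%
  group_by(id) %>%
  summarise(noftimes = n()) %>%        # appearances per id
  group_by(noftimes) %>%
  summarise(nofvariables = n()) %>%    # ids per appearance count
  filter(noftimes == 2)                # keep the "appears twice" row

print(twice)  # one row: noftimes = 2, nofvariables = 3
```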
52
when to use == vs =
== for logical comparison (filter, if_else) | = for assignment
53
if_else | what can it be combined with?
makes a conditional assignment based on the logical comparison provided | used with mutate and summarise, with or without group_by
54
when to add na.rm?
every aggregate function (sum, mean, var, min, max, etc)
55
arrange() | multiple columns? what is it used with?
orders the rows of the data frame by the variables specified; if multiple columns are specified, the first column is used until there is a tie, which the second column then breaks | used with select()
56
duplicated() | syntax; how to find number of duplicates in a df?
duplicated(x), where x is the df; returns a logical vector where TRUE = a duplicate of an earlier row | sum(duplicated(df)) finds the number of duplicates
57
how to find a duplicated entry in a df and return its value
df %>% filter(duplicated(df)) returns the duplicated rows in the console | View(df %>% filter(column1 == identifier, ...)) returns a shortened df with the duplicated entries
58
how to find the location of a duplicated entry
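which(duplicated(df)) returns the row numbers of the duplicated entries; which() needs the logical vector from duplicated(), not a filtered df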
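
Since which() expects a logical vector, one way to locate duplicated rows is to pass it the output of duplicated() directly. This is a base R sketch on a hypothetical data frame, and the last line shows one possible way to keep only the first copy of each row.

```r
# Hypothetical data: row 3 repeats row 2
df <- data.frame(id = c(1, 2, 2, 3), name = c("a", "b", "b", "c"))

dup_flags <- duplicated(df)    # FALSE FALSE TRUE FALSE
dup_rows <- which(dup_flags)   # row number of the duplicate: 3
n_dups <- sum(dup_flags)       # 1
deduped <- df[!dup_flags, ]    # keeps the first copy of each row
```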
59
how to take a duplicate out of the dataset
1. find the duplicate: tempdf <- df %>% filter(duplicated(df)); 2. df <- df %>% filter(column1 != identifier1 | column2 != identifier2) (note: this drops every row that matches both identifiers, including the original copy)
60
inner_join(x,y) | how are columns matched? what if the column names aren't the same?
joins 2 dfs and returns all rows from x that have matching values in y, plus all columns from x and y | columns are matched on shared column names, and all combinations of matching rows are returned | if the column names differ, use the by-argument: by = c("xname" = "yname")
61
left_join(x,y) | syntax; what happens to unmatched entries?
new_df <- left_join(x, y) | returns all rows from x, whether or not they have matching values in y, and all columns from x and y | unmatched entries in x get NA in the columns that come from y
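
A sketch contrasting inner_join and left_join, assuming dplyr is installed. The tibbles x and y and their columns are hypothetical.

```r
library(dplyr)

x <- tibble(id = c(1, 2, 3), name = c("Ann", "Bo", "Cy"))   # hypothetical
y <- tibble(id = c(1, 3, 4), score = c(90, 75, 60))         # hypothetical

inner <- inner_join(x, y, by = "id")  # only ids found in both: 1 and 3
left  <- left_join(x, y, by = "id")   # every row of x: 1, 2 and 3

print(left)  # id 2 has no match in y, so its score is NA
```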
62
inner_join, left_join: one-to-one matching vs non-one-to-one matching | how to make non-one-to-one matching work?
one-to-one: unique identifiers; each row in x matches at most 1 row in y | non-one-to-one: no common column name, and a row in x is matched with multiple rows in y | by-argument: left_join(x, y, by = c("xcolumn" = "ycolumn"))
63
multiple columns as matching variables to merge datasets | how to use the by-argument?
if the names are the same, the columns are matched automatically | if not, use by = c("x1" = "y1", "x2" = "y2")
64
as.character(), is.character() | what types of objects does it work on?
as.character() converts a numeric obj into a character obj; works on atomic values or vectors (all elements get converted) | is.character() checks whether the obj is a character
65
as.numeric(), is.numeric() | what types of objects does it work on?
as.numeric() converts characters into numeric; works on atomic values or vectors | is.numeric() checks whether the obj is numeric
66
change an existing df column from character to number
df <- df %>% mutate(column = as.numeric(column)) %>% select(columns)
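
A sketch of the conversion on a hypothetical tibble, assuming dplyr is installed. Text that cannot be read as a number becomes NA, and as.numeric() warns about it, so the warning is silenced here for the example.

```r
library(dplyr)

# Hypothetical data: amount is stored as text
orders <- tibble(id = c("a", "b", "c"), amount = c("10", "2.5", "n/a"))

orders <- orders %>%
  mutate(amount = suppressWarnings(as.numeric(amount)))  # "n/a" becomes NA

is.numeric(orders$amount)  # TRUE
```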
67
export an RDS | syntax
saveRDS(data_object, "file name")
68
CSV import and export | difference between CSV and RDS?
import: read_csv("path/url"), or use the GUI (same as RDS) | export: library(readr), then write_csv(data_frame, "file name") | CSV is plain text that lists rows with attributes separated by commas and can be read by most languages and tools; RDS is specific to R and retains data types, while CSV does not
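
A round-trip sketch of the type difference. It uses base R's write.csv() and read.csv() instead of the readr functions on the card so that it runs without extra packages. The data frame and the temporary file paths are illustrative.

```r
# Hypothetical data with a Date column
df <- data.frame(
  day = as.Date(c("2024-01-01", "2024-01-02")),
  n = c(1L, 2L)
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

saveRDS(df, rds_path)                        # R-specific binary format
write.csv(df, csv_path, row.names = FALSE)   # plain text, comma separated

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path)

class(from_rds$day)  # "Date": the type survives the round trip
class(from_csv$day)  # no longer "Date": the column comes back as text
```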