R Flashcards

(110 cards)

1
Q

argument

A

(r) information that a function needs in order to run

2
Q

variable

A

representation of a value in R that can be stored for use later during programming (can also be called OBJECT)

3
Q

vector

A

a group of data elements of the same type stored in a sequence in R

4
Q

Pipe

A

a tool in R for expressing a sequence of multiple operations, represented with “%>%”; takes the output of one statement and makes it the input of the next statement

5
Q

The 4 primary types of atomic vectors

A

logical (TRUE, FALSE), character (words), integer (1L, 2L, 3L), double (2.5, 4.561)
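
As a quick check of the four types on this card, the sketch below builds one small vector of each type and asks R for its type with typeof(). The variable names are only illustrative.

```r
# One small vector for each of the four types named on the card.
lgl <- c(TRUE, FALSE)
chr <- c("a", "b")
int <- c(1L, 2L, 3L)
dbl <- c(2.5, 4.561)

vector_types <- c(
  logical   = typeof(lgl),
  character = typeof(chr),
  integer   = typeof(int),
  double    = typeof(dbl)
)
print(vector_types)

# A number typed without the L suffix is stored as a double.
plain_number_type <- typeof(1)
print(plain_number_type)
```

The L suffix is what makes 1L an integer; without it, R stores whole numbers as doubles.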

6
Q

create a data frame

A

data.frame(x = c(1, 2, 3), y = c(1.4, 5.4, 10.4))

7
Q

create a new folder

A

dir.create("destination_folder")

8
Q

create a file

A

file.create("new_word_file.docx")

9
Q

copy a file

A

file.copy("new_text_file.txt", "destination_folder")
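
The three file cards above (create a folder, create a file, copy a file) can be tried together. This is a minimal sketch that works inside a temporary directory so it does not touch a real project; the folder name flashcard_demo is an assumption made for the example.

```r
# Work inside a throwaway directory (the "flashcard_demo" name is hypothetical).
work_dir <- file.path(tempdir(), "flashcard_demo")
dir.create(work_dir, showWarnings = FALSE)

# Create the destination folder.
dest_dir <- file.path(work_dir, "destination_folder")
dir.create(dest_dir, showWarnings = FALSE)

# Create an empty file.
new_file <- file.path(work_dir, "new_text_file.txt")
file.create(new_file)

# Copy the file into the existing folder.
file.copy(new_file, dest_dir)

copy_exists <- file.exists(file.path(dest_dir, "new_text_file.txt"))
print(copy_exists)
```

When the second argument of file.copy() is an existing directory, the file is copied into it under the same name.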

10
Q

OR operator

A

| or ||

11
Q

NOT operator

A

!

12
Q

common function to preview data (1st 6 rows)

A

head()

13
Q

functions that return a high-level summary of each column in your data, arranged horizontally

A

str() and glimpse()

14
Q

function for returning a list of column names from dataset

A

colnames()

15
Q

renaming a column

A

rename(diamonds, carat_new = carat, cut_new = cut)

16
Q

summarizing your data

A

summarize(diamonds, mean_carat = mean(carat))

17
Q

separates plots by a characteristic

A

+ facet_wrap(~cut)

18
Q

code for a scatter plot of the diamonds dataset with carat on the x axis, price on the y axis, points colored by cut, and a separate plot for each cut

A

ggplot(data = diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  facet_wrap(~cut)

19
Q

packages (R)

A

units of reproducible R code

20
Q

vignette

A

documentation that acts as a guide to an R package

browseVignettes()

21
Q

filter by vitamin c dose 0.5

A

filtered_tg <- filter(ToothGrowth, dose == 0.5)

22
Q

sort by tooth length (after a filter)

A

arrange(filtered_tg, len)
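
The filter and sort cards can be run end to end. This sketch assumes that filtered_tg comes from R's built-in ToothGrowth dataset, whose dose column holds the vitamin C dose and whose len column holds tooth length, and that dplyr is installed. The names sorted_tg and piped_tg are only illustrative.

```r
library(dplyr)

# ToothGrowth ships with R in the datasets package.
filtered_tg <- filter(ToothGrowth, dose == 0.5)

# Sort the filtered rows by tooth length, shortest first.
sorted_tg <- arrange(filtered_tg, len)

# The same two steps written as a single pipeline.
piped_tg <- ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

head(piped_tg)
```

The piped version avoids the intermediate filtered_tg variable; both approaches give the same rows in the same order.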

23
Q

Pipe operator shortcut

A

Ctrl + Shift + M (Cmd + Shift + M on a Mac)

24
Q

convert a date-time to a date

A

as_date() (in the lubridate package)

25
data frame
collection of columns
26
tibbles
data frames in the tidyverse; they never change the data type of your inputs (for example, a number is never converted to a string)
27
how to add a column to a dataframe
mutate(dataframe, column_new = column*100)
28
install tidyverse
install.packages("tidyverse")
29
after you're done installing tidyverse, what is the next step?
load it: library(tidyverse)
30
Tibbles
print only the first 10 rows of a dataset by default; never change the names of your variables or the data types of your inputs; part of the tidyverse
31
how to read a csv file
read_csv()
32
import "hotel_bookings.csv" into R and save it as a data frame titled 'bookings_df'
bookings_df <- read_csv("hotel_bookings.csv")
33
create another (smaller) data frame from an existing data frame (for example, with the "adr" and "adults" columns of the bookings_df data frame)
new_df <- select(bookings_df, adr, adults)
34
add a column to the dataframe: total = adr/adults
mutate(new_df, total = adr / adults)
35
skimr package
makes summarizing data really easy, lets you skim through it more quickly
36
janitor package
has functions for cleaning data
37
functions to get summaries of our dataframes
skim_without_charts(), glimpse(), head(), str(), select()
38
packages that simplify data cleaning tasks
skimr and janitor
39
select()
specifies certain columns or excludes columns
40
if you want all the columns in the penguins dataset EXCEPT the species column
penguins %>% select(-species)
41
rename a column (in penguins dataset)
penguins %>% rename(island_new = island)
42
make all columns uppercase (or lowercase)
rename_with(penguins, toupper) (or tolower)
43
clean_names()
ensures only characters, numbers and underscores in the names
44
%%
returns remainder after division
45
%/%
integer division: returns the quotient rounded down to a whole number (5 %/% 2 = 2)
46
4 kinds of operators
arithmetic, relational, logical, assignment
47
^
exponent
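The arithmetic operator cards (%%, %/% and ^) can be checked with a few lines of base R. The variable names below are only illustrative.

```r
x <- 5
y <- 2

remainder <- x %% y    # remainder after division
quotient  <- x %/% y   # integer division
power     <- x ^ y     # exponent

print(c(remainder = remainder, quotient = quotient, power = power))

# The quotient and remainder recombine into the original value.
recombined <- y * quotient + remainder
print(recombined == x)
```

With x = 5 and y = 2, the remainder is 1, the quotient is 2, and the power is 25.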
48
equal to
==
49
not equal to
!=
50
&&
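logical AND for single values: each side must be one TRUE or FALSE. Older versions of R compared only the first element of each vector; since R 4.3.0, passing a longer vector is an error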
51
!
logical NOT
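A short base R sketch of the logical operators covered by the cards: the element-wise forms work on whole vectors, while && and || take one value on each side. The vectors and the variable a are made up for the example.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

or_result  <- x | y   # element-wise OR
and_result <- x & y   # element-wise AND
not_result <- !x      # element-wise NOT

print(or_result)
print(and_result)
print(not_result)

# && and || expect a single TRUE or FALSE on each side,
# which is why they are the usual choice inside if () conditions.
a <- 7
in_range <- a > 5 && a < 10
print(in_range)
```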
52
arrange()
chooses what variable you want to sort by
53
sort by bill length (penguins) in descending order
penguins %>% arrange(-bill_length_mm)
54
create a dataframe
assigning a name to something
55
view dataframe
View()
56
putting similar values together in a column
group_by()
57
leave the missing values out
drop_na( )
58
Get averages (or max values) of bill length per island penguins
penguins %>% group_by(island) %>% summarize(mean_bill_length_mm = mean(bill_length_mm)) (or replace mean with max)
59
get max and mean bill length for each species by island.
penguins %>% group_by(species, island) %>% summarize(max_bl = max(bill_length_mm), mean_bl = mean(bill_length_mm))
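The group_by() and summarize() pattern can be tried without installing palmerpenguins. This sketch assumes dplyr is installed and uses a small made-up data frame (toy_penguins, with invented values) in place of the real dataset. Missing values are removed first, because mean() and max() return NA when a group contains an NA.

```r
library(dplyr)

# Toy data standing in for the penguins dataset; the values are made up.
toy_penguins <- data.frame(
  island = c("Biscoe", "Biscoe", "Dream", "Dream", "Dream"),
  bill_length_mm = c(40, 50, 36, NA, 42)
)

island_summary <- toy_penguins %>%
  filter(!is.na(bill_length_mm)) %>%  # same effect here as drop_na()
  group_by(island) %>%
  summarize(
    mean_bl = mean(bill_length_mm),
    max_bl  = max(bill_length_mm)
  )

print(island_summary)
```

The result has one row per island, with one column for each summary named inside summarize().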
60
only view Adelie penguins
penguins %>% filter(species == "Adelie")
61
data cleaning packages
install.packages(c("tidyverse", "skimr", "janitor"))
62
import and save the csv file "hotel_bookings.csv" as a dataframe
bookings_df <- read_csv("hotel_bookings.csv")
63
view only certain columns from a dataframe
trimmed_df <- dataframe %>% select(column1, column2)
64
cleaning functions
1. rename (renames columns): dataframe %>% rename(column_new = column)
2. unite (combines columns): dataframe %>% unite(column1_2, c("column1", "column2"), sep = " ")
3. mutate (adds a column): dataframe %>% mutate(guests = babies + children + adults)
4. summarize (computes summary statistics): dataframe %>% summarize(newcolumn = mean(column), newcolumn1 = sum(column1))
65
transform data with these functions
separate(), unite(), mutate()
66
separate( ) syntax
separate(dataframe, column, into = c("newcolumn1", "newcolumn2"), sep = " ")
67
unite( ) syntax
unite(dataframe, "newcolumn", column1, column2, sep = " ")
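The separate() and unite() cards mirror each other, which a round trip makes visible. This sketch assumes tidyr is installed; the people data frame and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical data frame with one combined name column.
people <- data.frame(
  name = c("Ada Lovelace", "Alan Turing"),
  stringsAsFactors = FALSE
)

# Split one column into two at the space.
split_names <- separate(people, name, into = c("first", "last"), sep = " ")
print(split_names)

# Join the two columns back into one, separated by a space.
rejoined <- unite(split_names, "name", first, last, sep = " ")
print(rejoined)
```

Note that into takes quoted new column names, while unite() takes the new column name first and then the existing columns to combine.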
68
mutate( ) syntax
dataframe %>% mutate(new_column = column / 1000, new_column2 = column2 / 1000)
69
Convert data from wide to long or long to wide
pivot_longer( ), pivot_wider( )
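A minimal sketch of reshaping in both directions, assuming tidyr is installed. The wide data frame, its q1 and q2 columns, and the names quarter and sales are all invented for the example.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per quarter.
wide <- data.frame(id = c(1, 2), q1 = c(10, 20), q2 = c(30, 40))

# Wide to long: column names go into "quarter", values into "sales".
long <- pivot_longer(wide, cols = c(q1, q2),
                     names_to = "quarter", values_to = "sales")
print(long)

# Long back to wide: one column per distinct value of "quarter".
wide_again <- pivot_wider(long, names_from = quarter, values_from = sales)
print(wide_again)
```

In pivot_longer() the new column names are quoted strings because they do not exist yet; in pivot_wider() they refer to existing columns and are unquoted.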
70
makes sure column names are unique and consistent
clean_names( )
71
bias function (package, syntax)
SimDesign package, bias(actual, predicted)
72
sort hotel_bookings columns by lead time (most to least)
arrange(hotel_bookings, desc(lead_time))
73
how to find max & min lead time in hotel_bookings
max(hotel_bookings$lead_time) and min(hotel_bookings$lead_time)
74
average lead time in hotel_bookings
mean(hotel_bookings$lead_time)
75
Filter syntax into a "new_hotel_dataframe"
new_hotel_dataframe <- filter(dataframe, condition)
76
find min/max/mean lead times at the two hotels, call it "hotel_summary"
hotel_summary <- hotel_bookings %>% group_by(hotel) %>% summarise(average_lead_time = mean(lead_time), min_lead_time = min(lead_time), max_lead_time = max(lead_time))
77
functions that let you change your data
arrange( ), group_by( ), filter( )
78
making columns lower (or upper)case
rename_with(dataframe, tolower)
79
core concepts in ggplot2
aesthetics, geoms, facets, labels, and annotations
80
view palmerpenguins dataset
install.packages("palmerpenguins")
library("palmerpenguins")
data(penguins)
View(penguins)
81
two different geoms
geom_point and geom_bar
82
geom_point argument for flipper length as xaxis, and body mass g as yaxis
ggplot(data=penguins) + geom_point(mapping = aes(x=flipper_length_mm, y=body_mass_g))
83
geom
a geometric object used to represent your data (points, bars, lines and more)
84
aesthetic
a visual property of an object in your plot (position, color, shape or size)
85
mapping
matching up a specific variable in your dataset with a specific aesthetic
86
3 steps to plot a graph
1. start with ggplot function and choose a dataset 2. add a geom_ function to display your data 3. map the variables you want to plot in the arguments of the aes( ) function
87
what other aesthetics can you add to variables
x,y, color, shape, size, alpha (transparency)
88
this geom shows general trends in data
geom_smooth
89
this aesthetic breaks out geom_smooth into pieces
linetype = species (mapped inside aes())
90
this geom creates a little noise around each point
geom_jitter
91
When using geom_bar, the color aesthetic will...
only put outlines of the color around the bars, the "fill" aesthetic will fill in the color
92
data smoothing for plots with fewer than 1000 points
ggplot(data, aes(x = , y = )) + geom_point() + geom_smooth(method = "loess")
93
data smoothing for plots with more than 1000 points
ggplot(data, aes(x = , y = )) + geom_point() + geom_smooth(method = "gam", ...)
94
facets
let you display smaller groups, or subsets, of your data
95
2 types of facets
facet_wrap, facet_grid
96
facet_wrap(~species)
lets us create a separate plot for each species
97
allows you to facet your plot with two variables;
facet_grid(): splits the plot vertically by the first variable and horizontally by the second variable
98
~
tilde symbol
99
what rotates text 45 degrees to make it easier to read?
theme(axis.text.x = element_text(angle = 45))
100
how to add a label
labs(title="Palmer Penguins", subtitle="3 Species", caption = "collected by Dr.")
101
text INSIDE the grid of the plot
annotate function
102
"annotate" function syntax with font, size and tilt
annotate("text", x=50, y=50, label= "The largest", fontface="bold", size=4.5, angle=25)
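The label, annotation and theme cards can be combined on a single plot. This is a sketch that assumes ggplot2 is installed and uses its built-in diamonds dataset; the title, caption, annotation text and coordinates are made up for illustration.

```r
library(ggplot2)

p <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  # Text outside the plotting area: title, subtitle and caption.
  labs(
    title = "Diamond price by carat",
    subtitle = "ggplot2 diamonds dataset",
    caption = "Illustrative example"
  ) +
  # Text inside the plotting area, placed at data coordinates.
  annotate("text", x = 4, y = 2500, label = "Few large stones",
           fontface = "bold", size = 4.5, angle = 25) +
  # Rotate the x axis text by 45 degrees.
  theme(axis.text.x = element_text(angle = 45))

# print(p) draws the plot in an interactive session.
```

labs() places text around the plot, while annotate() places it inside the grid at the x and y position given in data units.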
103
how to save a plot (2 ways)
1. the Export option in the RStudio Plots tab 2. ggsave("---.png")
104
find earliest year in hotel_bookings
min(hotel_bookings$arrival_date_year)
105
paste0
joins strings without a separator, for example inside labs(): subtitle = paste0("Data from: ", mindate, " to ", maxdate)
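A small base R sketch of how paste0() builds such a subtitle string. The year values here are hypothetical stand-ins; in the cards they would come from min() and max() of hotel_bookings$arrival_date_year.

```r
# Hypothetical values standing in for the earliest and latest years.
mindate <- 2015
maxdate <- 2017

# paste0() joins its arguments with no separator, so spaces
# must be written inside the quoted pieces.
subtitle_text <- paste0("Data from: ", mindate, " to ", maxdate)
print(subtitle_text)
```

With these values the result is the string "Data from: 2015 to 2017".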
106
ggsave syntax
ggsave("---.png", width=7, height=7)
107
R Markdown
file format for making dynamic documents with R
108
Markdown
a syntax for formatting plain text files
109
R Notebook
lets users run your code and show the graphs and charts that visualize the code
110
HTML
The set of markup symbols or codes used to create a webpage