Class 2 Notes Flashcards

(36 cards)

1
Q

RStudio Software

A

The Console is where commands are executed.
However, commands typed directly into the Console are not saved. It is generally better to write your commands in the Source window, highlight the code you want to run, and press Ctrl+Enter to execute it.
The Environment panel lists the objects you have created, such as datasets. You will also use the Files, Plots, Packages, Help, and Viewer panes a lot.

2
Q

Setting the Working Directory

A

Clear the environment: rm(list = ls(all.names = TRUE))
Set the working directory: setwd("C:/Quant")

3
Q

Create a new project

A

File -> New Project: choose directory

3
Q

To enter a vector for country

A

country = c("france", "france", "france", "france", "france", "france")

3
Q

Vector for year

A

year = c(2000,2005,2010,2015,2020,2025) or year2 <- seq(2000, 2025, 5)
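
As a quick check, the two ways of writing the year vector can be compared. This is only a sketch; the name same_years is introduced here for illustration and is not part of the original notes.

```r
# Build the year vector by hand and with seq()
year  <- c(2000, 2005, 2010, 2015, 2020, 2025)
year2 <- seq(2000, 2025, 5)   # from 2000 to 2025 in steps of 5

# same_years is a hypothetical helper name: TRUE if every element matches
same_years <- length(year) == length(year2) && all(year == year2)
print(same_years)
```

seq() is less error-prone than typing the values out when the spacing is regular.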

4
Q

Inspect vector

A

print(vector_name), e.g. print(country)

4
Q

Remove vector or data frame

A

rm(low_high)

5
Q

Check the class (type) of each vector we have created

A

class(country) # character
class(year) # numeric
class(poverty_levels) # factor

6
Q

Turn vectors into data frame

A

df = data.frame(country,year,poverty_rate,poverty_levels)
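
The steps so far can be put together in one short sketch. The poverty_rate values below are made up for illustration (they are not real data), and the rule used to build poverty_levels (rates above 14 are "high") is an assumption, since the notes do not say how that factor was created.

```r
# Hypothetical values, for illustration only (not real poverty data)
country <- rep("france", 6)
year <- seq(2000, 2025, 5)
poverty_rate <- c(13.5, 13.1, 14.0, 14.2, 14.6, 14.4)

# Assumed rule: label rates above 14 as "high", everything else as "low"
poverty_levels <- factor(ifelse(poverty_rate > 14, "high", "low"))

# Combine the four vectors into one data frame
df <- data.frame(country, year, poverty_rate, poverty_levels)

class(df$country)         # character (in R 4.0 or later)
class(df$year)            # numeric
class(df$poverty_levels)  # factor
dim(df)                   # rows (observations), then columns (variables)
```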

7
Q

Run a library

A

library(package_name), e.g. library(tidyverse). R is case-sensitive, so Library() will not work.

8
Q

Import dataframe

A

library(rio) # import() comes from the rio package
df = import("data/france_poverty.csv") # csv
df = import("data/france_poverty.Rdata") # R data file
df = import("data/france_poverty.dta") # Stata data file
df = import("data/france_poverty.xlsx", which = 1) # Excel, first sheet

9
Q

Inspect Data

A

View(df) # open the whole data frame in a viewer tab
head(df) # get a quick view of the first few observations
dim(df) # check the dimensions: rows, then columns (here 6 observations, 4 variables)
length(df$poverty_rate) # count the number of observations (missing values included)
unique(df$poverty_rate) # list unique values
table(df$country) # count how often each value of a variable occurs

10
Q

Summary statistics

A

summary(df)

11
Q

Handling missing values (NA) when computing a statistic

A

Add na.rm = TRUE as the last argument of the command, e.g. mean(df$poverty_rate, na.rm = TRUE)
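
A minimal sketch of what na.rm = TRUE changes, using a made-up vector x with one missing value (the names x, m_all, and m_clean are illustrative only):

```r
# Hypothetical values with one missing observation
x <- c(2, 4, NA, 6)

m_all   <- mean(x)                # NA: the missing value propagates
m_clean <- mean(x, na.rm = TRUE)  # NA is dropped first, so this is the mean of 2, 4, 6

print(m_all)
print(m_clean)
```

The same argument works for sd() and var().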

12
Q

Mean

A

mean(df$poverty_rate). Apply it to a single numeric column: mean(df) on a whole data frame returns NA with a warning.

13
Q

Standard Deviation

A

sd(df$poverty_rate). Apply it to a single numeric column: sd(df) on a whole data frame raises an error.

14
Q

Variance

A

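var(df$poverty_rate). Apply it to a single numeric column: var(df) on an all-numeric data frame returns a covariance matrix, not a single variance.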

15
Q

Calculate mean

A

sample mean = $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{2 + 4 + 6}{3} = \frac{12}{3} = 4$

16
Q

Calculate sample variance

A

sample variance = $\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$

17
Q

Calculate sample standard deviation

A

sample standard deviation = $\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$
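
The three formulas can be checked by hand in R with the same numbers used for the sample mean (2, 4, 6). The variable names x_bar, s2, and s are chosen here for illustration.

```r
x <- c(2, 4, 6)
n <- length(x)

x_bar <- sum(x) / n                  # (2 + 4 + 6) / 3 = 4
s2 <- sum((x - x_bar)^2) / (n - 1)   # (4 + 0 + 4) / 2 = 4
s  <- sqrt(s2)                       # sqrt(4) = 2

# The built-in functions use the same N - 1 formulas
print(c(x_bar, mean(x)))
print(c(s2, var(x)))
print(c(s, sd(x)))
```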

18
Q

Drop NA

A

library(tidyverse)
df <- df %>%
  drop_na(poverty_rate) # keep only rows where poverty_rate is not missing

19
Q

Piping

A

The pipe operator, %>%, takes the result of the code on its left and passes it as the first argument to the function on its right, so each step builds on the one before it.
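
A small sketch of the same calculation written with nested calls and with the pipe. It loads magrittr, the package that defines %>% (library(tidyverse) also makes the operator available); the vector and the names nested and piped are made up for illustration.

```r
library(magrittr)  # provides %>%

x <- c(2, 4, 6)

# Nested version: read from the inside out
nested <- round(sqrt(mean(x)), 2)

# Piped version: read from left to right
piped <- x %>% mean() %>% sqrt() %>% round(2)

print(nested)
print(piped)
```

Both lines compute the same value; the piped form lists the steps in the order they happen.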

20
Q

Nice table in R

A

library(modelsummary)
datasummary_skim(df)
or, to return the table as a data frame:
datasummary_skim(df, output = "data.frame")
or, to choose the statistics yourself:
datasummary(poverty_rate + year ~
    NUnique + PercentMissing + Mean + SD + Var +
    Min + Median + Max,
  data = df)

21
Q

Every ggplot2 has

A

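ggplot(data = , aes(x = , y = )) +
  geom_*()
That is, a ggplot() call that names the data and the aesthetic mapping, plus at least one geom layer such as geom_line() or geom_bar().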

22
Line graph
ggplot(data = df, aes(x=year, y=poverty_rate)) + geom_line()
23
Fancy Line Graph
ggplot(data = df, aes(x = year, y = poverty_rate)) +
  geom_line() +
  labs(x = "Year", y = "Poverty Rate", title = "Poverty Rate in France") +
  theme(plot.title = element_text(hjust = 0.5))
24
Bar Graph
ggplot(df, aes(x = year, y = poverty_rate)) +
  geom_bar(stat = "identity") +
  labs(x = "Year", y = "Poverty Rate", title = "Poverty Rate in France") +
  theme(plot.title = element_text(hjust = 0.5))
25
Stat identity
Compared with the line graph, the only major change is adding stat = "identity" inside the parentheses of geom_bar(). That specification is necessary because, by default, geom_bar() counts the number of rows at each x value instead of plotting the y values you supply.
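
The difference can be seen by inspecting what each version of the plot computes. This is a sketch with a small made-up data frame called toy (the values are not real data); layer_data() is used only to look at the numbers behind each bar.

```r
library(ggplot2)

# Hypothetical data: one row per year
toy <- data.frame(year = c(2000, 2005, 2010),
                  poverty_rate = c(13.5, 13.1, 14.0))

# Default: geom_bar() counts rows per x value, so every bar has height 1 here
p_count <- ggplot(toy, aes(x = year)) + geom_bar()

# stat = "identity": bar heights are taken directly from poverty_rate
p_identity <- ggplot(toy, aes(x = year, y = poverty_rate)) +
  geom_bar(stat = "identity")

count_heights    <- layer_data(p_count)$count
identity_heights <- layer_data(p_identity)$y
print(count_heights)
print(identity_heights)
```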
26
Drop a variable (column)
df <- df %>% dplyr::select(-c(poverty_levels))
27
Look at data set
print(df)
28
Put Data into wide format
wide = df %>% pivot_wider(names_from = "year", values_from = c("poverty_rate"))
29
Put data into long format
long <- wide %>%
  pivot_longer(cols = c(`2000`, `2005`, `2010`, `2015`, `2020`, `2025`),
               names_to = "year", values_to = "poverty_rate")
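
A round trip from long to wide and back can be sketched as follows. The data frame toy and its values are made up for illustration, and cols = -country (every column except country) is a choice made here so the year columns do not have to be typed out one by one.

```r
library(tidyr)

# Hypothetical long-format data: one row per country-year
toy <- data.frame(country = "france",
                  year = seq(2000, 2025, 5),
                  poverty_rate = c(13.5, 13.1, 14.0, 14.2, 14.6, 14.4))

# Wide: one row per country, one column per year
wide <- pivot_wider(toy, names_from = "year", values_from = "poverty_rate")

# Long again: gather every column except country back into year / poverty_rate
long <- pivot_longer(wide, cols = -country,
                     names_to = "year", values_to = "poverty_rate")

# Column names are text, so year comes back as character; convert it to numeric
long$year <- as.numeric(long$year)

print(dim(wide))
print(dim(long))
```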
30
Summarize the mean of a variable
collapse <- df %>% summarize(mean_poverty_rate = mean(poverty_rate))
print(collapse)
31
Summarize by group
collapsed_group <- df %>% group_by(country) %>% summarize(mean_poverty_rate = mean(poverty_rate)) %>% ungroup()
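
With only one country in the data, the grouped and ungrouped summaries give the same number. A sketch with a made-up two-country data frame (toy, with invented values) shows what group_by() adds:

```r
library(dplyr)

# Hypothetical data with two countries
toy <- data.frame(country = c("france", "france", "germany", "germany"),
                  poverty_rate = c(13, 15, 14, 16))

# One mean per country instead of one overall mean
collapsed_group <- toy %>%
  group_by(country) %>%
  summarize(mean_poverty_rate = mean(poverty_rate)) %>%
  ungroup()

print(collapsed_group)
```

The result has one row per country, with the mean taken within each group.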
32
ifelse()
ifelse(test, yes, no) checks a condition (e.g. x == 1) for each element and returns one value where the condition is true and another value where it is false.
33
ifelse() example
df$decade2020s = ifelse(df$year >= 2020, 1, 0)
print(df)
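
The same indicator can be checked on a plain vector without a data frame. This sketch reuses the year values from the earlier card:

```r
year <- seq(2000, 2025, 5)

# 1 for years from 2020 onward, 0 otherwise; evaluated element by element
decade2020s <- ifelse(year >= 2020, 1, 0)

print(data.frame(year, decade2020s))
```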