Class 2 Notes Flashcards
(36 cards)
RStudio Software
The Console is where commands are executed.
However, commands typed directly into the Console are not saved. It is generally better to write them in the Source window, highlight the code you want to run, and press Ctrl+Enter to execute it.
The Environment pane lists the objects you have created, such as datasets. You will also use the Files, Plots, Packages, Help, and Viewer panes a lot.
Setting the Working Directory
Clear environment: rm(list=ls(all=TRUE))
Set the working directory: setwd("C:/Quant")
Create a new project
File -> New Project: choose directory
To enter a vector for country
country = c("france", "france", "france", "france", "france", "france")
Vector for year
year = c(2000,2005,2010,2015,2020,2025) or year2 <- seq(2000, 2025, 5)
Inspect vector
print(vector_name), for example print(country)
Remove vector or data frame
rm(low_high)
To summarize, we have created vectors of different classes/types, which we can check with class()
class(country) - character
class(year) - numeric
class(poverty_levels) - factor
Turn vectors into data frame
df = data.frame(country,year,poverty_rate,poverty_levels)
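As a minimal sketch of the steps above, the vectors can be built and combined as follows. The poverty_rate values and the cut-off used for poverty_levels are hypothetical placeholders, since the notes do not list them.

```r
# Vectors from the notes
country <- rep("france", 6)      # same as typing "france" six times
year <- seq(2000, 2025, 5)       # 2000, 2005, ..., 2025

# Hypothetical values, for illustration only (not from the notes)
poverty_rate <- c(13.5, 13.1, 14.0, 14.2, 14.6, 14.4)

# Hypothetical rule: label rates of 14 or more as "high"
poverty_levels <- factor(ifelse(poverty_rate >= 14, "high", "low"),
                         levels = c("low", "high"))

# Combine the four vectors into one data frame
df <- data.frame(country, year, poverty_rate, poverty_levels)

class(df$year)            # "numeric"
class(df$poverty_levels)  # "factor"
dim(df)                   # 6 rows (observations), 4 columns (variables)
```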
Load a library (package)
library(package_name) # R is case-sensitive, so Library() will not work
Import dataframe
library(rio) # import() is assumed here to come from the rio package
df = import("data/france_poverty.csv") # csv
df = import("data/france_poverty.Rdata") # R data file
df = import("data/france_poverty.dta") # Stata data file
df = import("data/france_poverty.xlsx", which=1) # Excel, first sheet
Inspect Data
View(df) # look at the whole thing
head(df) # get a quick view of the first few observations
dim(df) # check the dimensions: rows, then columns (here 6 observations, 4 variables)
length(df$poverty_rate) # count the number of observations
unique(df$poverty_rate) # list unique values
table(df$country) # frequency table: how often each value occurs
Summary statistics
summary(df)
Handling missing values (NA) in summary functions
Add na.rm=TRUE as the last argument of the command, e.g. mean(df$poverty_rate, na.rm=TRUE)
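A short sketch of what na.rm=TRUE changes. The vector x is made up for illustration.

```r
# Illustrative vector with one missing value
x <- c(10, NA, 20, 30)

mean(x)                 # NA: the missing value propagates into the result
mean(x, na.rm = TRUE)   # 20: computed from the non-missing values 10, 20, 30
sum(is.na(x))           # 1: number of missing values in x
```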
Mean
mean(df$poverty_rate) (apply it to a numeric column; mean(df) on a whole data frame returns NA with a warning)
Standard Deviation
sd(df$poverty_rate) (apply it to a numeric column; sd(df) on a whole data frame gives an error)
Variance
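var(df$poverty_rate) (apply it to a numeric column; var(df) only works if every column is numeric, and then it returns a covariance matrix)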
Calculate mean
sample mean = $\bar{x} = \frac{\sum_i x_i}{N} = \frac{2 + 4 + 6}{3} = \frac{12}{3} = 4$
Calculate sample variance
sample variance = $\frac{\sum_i (x_i - \bar{x})^2}{N - 1}$
Calculate sample standard deviation
sample standard deviation = $\sqrt{\frac{\sum_i (x_i - \bar{x})^2}{N - 1}}$
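The three formulas can be checked by hand in R against the built-in functions. This sketch uses the values 2, 4, 6 from the mean card. The variable names are chosen here for illustration.

```r
x <- c(2, 4, 6)
N <- length(x)

x_bar <- sum(x) / N                   # 12 / 3 = 4
s2 <- sum((x - x_bar)^2) / (N - 1)    # (4 + 0 + 4) / 2 = 4
s <- sqrt(s2)                         # sqrt(4) = 2

# The built-in functions use the same N - 1 denominator
all.equal(x_bar, mean(x))   # TRUE
all.equal(s2, var(x))       # TRUE
all.equal(s, sd(x))         # TRUE
```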
Drop NA
library(tidyverse)
df <- df %>%
  drop_na(poverty_rate) # keep only rows where poverty_rate is not NA
Piping
The pipe operator, %>%, passes the result of the expression on its left to the function on its right as that function's first argument, so x %>% f(y) is the same as f(x, y). This lets you chain steps line by line.
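A small sketch comparing nested calls with the piped version. It loads magrittr, the package that defines %>%; the operator is also available after library(tidyverse). The vector x is made up for illustration.

```r
library(magrittr)   # defines %>%

x <- c(2, 4, 6)

# Without the pipe: read from the inside out
nested <- round(sqrt(mean(x)), 1)

# With the pipe: read from top to bottom
piped <- x %>%
  mean() %>%
  sqrt() %>%
  round(1)

identical(nested, piped)   # TRUE: both forms compute the same value
```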
Nice table in R
library(modelsummary)
datasummary_skim(df) or
library(modelsummary)
datasummary_skim(df, output = "data.frame") or
library(modelsummary)
datasummary(poverty_rate + year ~
              NUnique + PercentMissing + Mean + SD + Var +
              Min + Median + Max,
            data = df)
Every ggplot2 plot has
ggplot(data = , aes(x = , y = )) +
and at least one geom_*() layer, such as geom_point() or geom_line()
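A minimal sketch of that template filled in. The data frame df_example and its values are hypothetical, and the choice of geom_line() plus geom_point() is only one option.

```r
library(ggplot2)

# Hypothetical data, for illustration only
df_example <- data.frame(
  year = seq(2000, 2025, 5),
  poverty_rate = c(13.5, 13.1, 14.0, 14.2, 14.6, 14.4)
)

# Data and aesthetics first, then one or more geom layers added with +
p <- ggplot(data = df_example, aes(x = year, y = poverty_rate)) +
  geom_line() +
  geom_point()

print(p)   # draws the plot in the Plots pane
```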