Class 2 Notes Flashcards by Allison B

RStudio Software

The Console is where your commands are executed.
However, commands typed directly into the Console are not saved. It is therefore generally better to write your commands in the Source window, highlight the code that you want to run, and press Ctrl + Enter to execute it.
The Environment pane is where you will look at your datasets and other objects. You will also use the Files, Plots, Packages, Help, and Viewer panes a lot.

Setting the Working Directory

Clear the environment: rm(list = ls(all.names = TRUE))
Set the working directory: setwd("C:/Quant")

Create a new project

File -> New Project: choose directory

To enter a vector for country

country = c("france", "france", "france", "france", "france", "france")

Vector for year

year = c(2000,2005,2010,2015,2020,2025) or year2 <- seq(2000, 2025, 5)
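
As a quick check, the two ways of building the year vector should give the same values. A minimal sketch in base R (the name same_values is only for illustration):

```r
# Build the year vector two ways: typing each value, and with seq()
year  <- c(2000, 2005, 2010, 2015, 2020, 2025)
year2 <- seq(2000, 2025, 5)   # from 2000 to 2025 in steps of 5

# all.equal() compares the values of the two numeric vectors
same_values <- isTRUE(all.equal(year, year2))
print(same_values)
```

seq() is less error-prone than typing every value when the steps are regular.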

Inspect vector

print(vector_name), e.g. print(country)

Remove vector or data frame

rm(low_high)

To summarize, we have created vectors of different classes/types, which we can check with class()

class(country) # "character"
class(year) # "numeric"
class(poverty_levels) # "factor"
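
The cards do not show how poverty_rate and poverty_levels were built, so the following is only a sketch of one way a numeric vector and a factor might be created. The numbers and the "low"/"high" labels are hypothetical, chosen purely for illustration.

```r
# Hypothetical poverty rates (illustrative numbers only, not real data)
poverty_rate <- c(13.5, 13.1, 14.0, 14.2, 13.8, 14.5)

# Character labels turned into a factor with an explicit level order
# (the labels "low" and "high" are an assumption for this sketch)
poverty_levels <- factor(
  c("low", "low", "high", "high", "low", "high"),
  levels = c("low", "high")
)

class(poverty_rate)     # "numeric"
class(poverty_levels)   # "factor"
levels(poverty_levels)  # "low" "high"
```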

Turn vectors into data frame

df = data.frame(country,year,poverty_rate,poverty_levels)

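Load a library

library(package_name), e.g. library(tidyverse). R is case-sensitive, so it must be library(), not Library().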

Import dataframe

library(rio) # import() comes from the rio package
df = import("data/france_poverty.csv") # csv
df = import("data/france_poverty.Rdata") # R data file
df = import("data/france_poverty.dta") # Stata data file
df = import("data/france_poverty.xlsx", which = 1) # Excel (first sheet)

Inspect Data

View(df) # look at the whole data frame
head(df) # get a quick view of the first few observations
dim(df) # check the dimensions: rows, then columns (6 observations, 4 variables)
length(df$poverty_rate) # count the number of observations
unique(df$poverty_rate) # list unique values
table(df$country) # frequency table of a variable's values

Summary statistics

summary(df)

Handling missing values (NA)

Add na.rm = TRUE as the last argument of the command, e.g. mean(df$poverty_rate, na.rm = TRUE)
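
A small sketch of why na.rm = TRUE matters, using a hypothetical vector x with one missing value:

```r
# Hypothetical vector with one missing value
x <- c(2, 4, NA, 6)

mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # 4, the NA is dropped before averaging
sd(x, na.rm = TRUE)     # 2
```

Without na.rm = TRUE, a single NA makes the whole result NA.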

Mean

mean(df$poverty_rate)

mean() needs a numeric vector. Calling mean(df) on a whole data frame returns NA with a warning.

Standard Deviation

sd(df$poverty_rate)

sd() needs a numeric vector. Calling sd(df) on a whole data frame gives an error.

Variance

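var(df$poverty_rate)

var() gives a single variance only for a numeric vector. Calling var(df) on a whole data frame does not return one number (it attempts a covariance matrix of all columns).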

Calculate mean

sample mean = $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{2 + 4 + 6}{3} = \frac{12}{3} = 4$

Calculate sample variance

sample variance = $\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$

Calculate sample standard deviation

sample standard deviation = $\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$
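
The three formulas can be checked by hand in R. This sketch reuses the numbers 2, 4, 6 from the sample mean card; the variable names x_bar, s2, and s are chosen here for illustration.

```r
x <- c(2, 4, 6)   # same numbers as in the sample mean card
N <- length(x)

x_bar <- sum(x) / N                     # (2 + 4 + 6) / 3 = 4
s2    <- sum((x - x_bar)^2) / (N - 1)   # (4 + 0 + 4) / 2 = 4
s     <- sqrt(s2)                       # sqrt(4) = 2

# R's built-in var() and sd() use the same N - 1 denominator
all.equal(c(x_bar, s2, s), c(mean(x), var(x), sd(x)))   # TRUE
```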

Drop NA

library(tidyverse)
df <- df %>%
  drop_na(poverty_rate) # drop rows where poverty_rate is NA

Piping

The pipe operator, %>%, tells R to pass the result of the code on its left as the first argument of the function on its right, so each line of code becomes the input for the next line.
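
A minimal sketch of what piping does, comparing a nested call with the equivalent piped chain. It loads magrittr directly for %>%, which is an assumption made to keep the example small; the operator is also available after library(tidyverse).

```r
library(magrittr)   # provides %>% (also available after library(tidyverse))

x <- c(2, 4, 6)

nested <- round(sqrt(mean(x)), 1)                # read inside-out
piped  <- x %>% mean() %>% sqrt() %>% round(1)   # read left to right

identical(nested, piped)   # TRUE
```

Both versions compute the same value; the piped form lists the steps in the order they happen.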

Nice table in R

library(modelsummary)
datasummary_skim(df)

or

datasummary_skim(df, output = "data.frame")

or

datasummary(poverty_rate + year ~
              NUnique + PercentMissing + Mean + SD + Var +
              Min + Median + Max,
            data = df)

Every ggplot2 plot has

a ggplot(data = , aes(x = , y = )) call, followed by + and a geom_*() layer such as geom_line() or geom_bar()

Line graph

ggplot(data = df, aes(x=year, y=poverty_rate)) + geom_line()

Fancy Line Graph

ggplot(data = df, aes(x = year, y = poverty_rate)) +
  geom_line() +
  labs(x = "Year", y = "Poverty Rate", title = "Poverty Rate in France") +
  theme(plot.title = element_text(hjust = 0.5))

Bar Graph

ggplot(df, aes(x = year, y = poverty_rate)) +
  geom_bar(stat = "identity") +
  labs(x = "Year", y = "Poverty Rate", title = "Poverty Rate in France") +
  theme(plot.title = element_text(hjust = 0.5))

Stat identity

Compared with the line graph, the only major change is using geom_bar() with stat = "identity" inside its parentheses. The stat = "identity" argument is necessary because, by default, geom_bar() assumes that you want a count of the number of rows for each x value rather than the y values themselves.
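
The difference between the default count and stat = "identity" can be sketched with a small hypothetical data frame. The name df_demo and its numbers are illustrative only.

```r
library(ggplot2)

# Hypothetical data: one row per year (illustrative numbers only)
df_demo <- data.frame(
  year = c(2000, 2005, 2010),
  poverty_rate = c(13.5, 13.1, 14.0)
)

# Default stat = "count": bar height is the number of rows per year (1 each)
p_count <- ggplot(df_demo, aes(x = year)) + geom_bar()

# stat = "identity": bar height is the poverty_rate value itself
p_identity <- ggplot(df_demo, aes(x = year, y = poverty_rate)) +
  geom_bar(stat = "identity")

# geom_col() is ggplot2's shorthand for geom_bar(stat = "identity")
p_col <- ggplot(df_demo, aes(x = year, y = poverty_rate)) + geom_col()
```

Printing p_count, p_identity, or p_col draws the corresponding plot.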

Drop a variable (column)

df <- df %>% dplyr::select(-c(poverty_levels))

Look at data set

print(df)

Put Data into wide format

wide = df %>% pivot_wider(names_from = "year", values_from = c("poverty_rate"))

Put data into long format

long <- wide %>%
  pivot_longer(cols = c(`2000`, `2005`, `2010`, `2015`, `2020`, `2025`),
               names_to = "year", values_to = "poverty_rate")
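
A round trip from long to wide and back can be sketched with a small hypothetical data frame. The names long_demo, wide_demo, and back_to_long and the numbers are illustrative only, and cols = -country (every column except country) is used here instead of listing each year.

```r
library(tidyr)   # pivot_wider(), pivot_longer(); also makes %>% available

# Hypothetical long data (illustrative numbers only)
long_demo <- data.frame(
  country = "france",
  year = c(2000, 2005, 2010),
  poverty_rate = c(13.5, 13.1, 14.0)
)

wide_demo <- long_demo %>%
  pivot_wider(names_from = "year", values_from = "poverty_rate")

back_to_long <- wide_demo %>%
  pivot_longer(cols = -country, names_to = "year", values_to = "poverty_rate")

# Column names are text, so year comes back as character; convert it
back_to_long$year <- as.numeric(back_to_long$year)

dim(wide_demo)      # 1 row, 4 columns: country, 2000, 2005, 2010
dim(back_to_long)   # 3 rows, 3 columns
```

Converting year back to numeric matters if it will be used on a plot axis or in arithmetic.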

Summarize the mean of a variable

collapse <- df %>% summarize(mean_poverty_rate = mean(poverty_rate))
print(collapse)

Summarize by group

collapsed_group <- df %>%
  group_by(country) %>%
  summarize(mean_poverty_rate = mean(poverty_rate)) %>%
  ungroup()
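
Since the data in these cards contain only France, grouping is easier to see with a second country. The sketch below uses a hypothetical two-country data frame; df_demo, collapsed_demo, and the numbers are illustrative only.

```r
library(dplyr)

# Hypothetical two-country data (illustrative numbers only)
df_demo <- data.frame(
  country = c("france", "france", "germany", "germany"),
  poverty_rate = c(12, 14, 10, 14)
)

collapsed_demo <- df_demo %>%
  group_by(country) %>%                                  # one group per country
  summarize(mean_poverty_rate = mean(poverty_rate)) %>%  # one row per group
  ungroup()

print(collapsed_demo)   # france: 13, germany: 12
```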

ifelse()

It takes a condition (e.g. x == 1) and tells the computer to return one value where the condition is true and a different value where the condition is false.

ifelse() example

df$decade2020s = ifelse(df$year >= 2020, 1, 0)
print(df)
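
The same condition can be tried on a plain vector to see what ifelse() returns for each element. A minimal sketch using the year values from the earlier card:

```r
year <- c(2000, 2005, 2010, 2015, 2020, 2025)

# 1 if the year is 2020 or later, 0 otherwise
decade2020s <- ifelse(year >= 2020, 1, 0)
print(decade2020s)   # 0 0 0 0 1 1
```

ifelse() works element by element, so the result has the same length as year.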

Class 2 Notes Flashcards

(36 cards)