Trivia Flashcards
(31 cards)
What is a field experiment?
Field experiments randomly assign a treatment to a sample in the natural environment of the population of interest, often with real-world treatments.
In the context of an experiment, what is spillover?
Interference (also known as spillover) occurs when the treatment assignment of one person affects the outcome of at least one other person (think: a vaccine against a communicable disease).
What is the fundamental problem of causal inference, and why does randomization help to overcome it?
Fundamental problem of causal inference: You never observe the same object/person in both the treatment (e.g. those who get the pill) and control (e.g. those who get the placebo) states
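Why randomization: Random assignment makes the treatment and control groups comparable on average, in both observed and unobserved characteristics. With a large enough number of observations, chance differences between the groups shrink, so the difference in average outcomes between the two groups is an unbiased estimate of the average treatment effect. Randomization does not let you observe both states for any one person, and it does not guarantee that a particular estimate is exactly correct.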
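A small simulation can make the randomization answer concrete. This is only a sketch: the sample size, the true effect of 2, and the names `y0`, `y1`, and `treat` are assumptions invented for the example. Because the data are simulated, both potential outcomes are visible here, which is never the case with real data.

```r
set.seed(123)                       # make the simulation reproducible
n  <- 10000                         # hypothetical sample size
y0 <- rnorm(n, mean = 50, sd = 10)  # potential outcome under control
y1 <- y0 + 2                        # potential outcome under treatment (assumed true effect = 2)

# Randomly assign exactly half of the units to treatment
treat <- sample(rep(c(0, 1), each = n / 2))

# In real data we observe only one potential outcome per unit
y_obs <- ifelse(treat == 1, y1, y0)

# Difference in means between the randomized groups
ate_hat <- mean(y_obs[treat == 1]) - mean(y_obs[treat == 0])
print(ate_hat)
```

The estimate should land near the assumed effect of 2, and it tends to get closer as `n` grows.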
You are working in R. You want to create a new variable called var2
in the dataset data. Your var2 would take the value of 1 if another
variable, let’s call it var1, is greater than 10; otherwise, you would
assign a value of zero to var2. What would be your R code?
data$var2 = ifelse(data$var1 > 10, 1, 0)
You are working in R with a data frame called df. In this data
frame, you have a variable called gdp_per_capita, and it has some
missing values. How would you get the mean of this variable?
mean(df$gdp_per_capita, na.rm = TRUE)
You are working in R. Your data frame is called df, which has data
for France over the years 2000-2010. In this data frame, you have variables called year and poverty_rate. How would you create a labelled scatter plot of these two variables using the ggplot2
package?
library(ggplot2)
# Make a scatterplot of poverty rate by year
scatter = ggplot(df, aes(x = year, y = poverty_rate)) + geom_point() +
  labs(x = "Year",
       y = "Poverty Rate",
       title = "The Poverty Rate in France",
       caption = "Source: World Bank")
print(scatter)
You are working in R. How would you import a .csv file called “this class rocks.csv” and assign it to an object called “data”?
library(rio)
data = import("this class rocks.csv")
What is a null hypothesis?
A null hypothesis is a
theory-based statement about what we would observe if there were no
relationship between the dependent and independent variables.
You are working in R. Your data frame is called “data”. After you
examine the individual countries (variable is called “country”) in your dataset, you notice that France is spelled “Frannnce”. How would you correct the spelling?
data$country[data$country == "Frannnce"] <- "France"
Let’s say that you have estimated a logit model. How do you interpret the default output of the coefficients in R?
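The default output of a logit model is in log odds: each coefficient is the change in the log odds of the outcome for a one-unit increase in that variable.
The magnitudes are hard to interpret directly, but the sign tells you the direction of the relationship. Exponentiating a coefficient gives an odds ratio, and predicted probabilities are usually the easiest quantity to interpret.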
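To see the log-odds scale in practice, here is a minimal sketch on simulated data. The intercept of -0.5, the slope of 0.8, and the variable names `x` and `y` are assumptions made up for the illustration.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)

# Simulate a binary outcome whose true log-odds slope is 0.8 (assumed value)
p <- plogis(-0.5 + 0.8 * x)
y <- rbinom(n, size = 1, prob = p)

fit <- glm(y ~ x, family = binomial(link = "logit"))

log_odds   <- coef(fit)      # what the default output reports
odds_ratio <- exp(log_odds)  # exponentiate to get odds ratios

# Predicted probabilities at x = 0 and x = 1
pred <- predict(fit, newdata = data.frame(x = c(0, 1)), type = "response")

print(log_odds)
print(odds_ratio)
print(pred)
```

A positive coefficient on `x` corresponds to an odds ratio above 1 and to a higher predicted probability at `x = 1` than at `x = 0`.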
What does it mean for a study to be replicable?
A replicable study is one with public procedures that analysts can replicate after receiving access to an author’s data and statistical software code.
You are working in R. You have a dataset called “data”. The dataset has countries (variable is called “country”) in various years.
What code would you use if you wanted to create a new dataset
with just the country of France?
library(tidyverse)
newdata = data %>% filter(country == "France")
Lionel Messi scored 100 goals in 2010 and 90 goals in 2011.
Cristiano Ronaldo scored 70 goals in 2010 and 80 goals in 2011.
Write tables for this dataset in both the long and wide formats.
(Bonus: What can we infer from these goal numbers?)
LONG
Player   Year  Goals
Messi    2010  100
Messi    2011  90
Ronaldo  2010  70
Ronaldo  2011  80
WIDE
Player   2010  2011
Messi    100   90
Ronaldo  70    80
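The same conversion can be done in R. The sketch below uses base R's reshape(), which is one option among several; the object names `long` and `wide` are chosen for the example. Base R names the wide columns Goals.2010 and Goals.2011 rather than 2010 and 2011.

```r
# The long table from the card, entered by hand
long <- data.frame(
  Player = c("Messi", "Messi", "Ronaldo", "Ronaldo"),
  Year   = c(2010, 2011, 2010, 2011),
  Goals  = c(100, 90, 70, 80)
)

# Long -> wide: one row per player, one Goals column per year
wide <- reshape(long, idvar = "Player", timevar = "Year", direction = "wide")
print(wide)
```

In the tidyverse, tidyr's pivot_wider() and pivot_longer() do the same job.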
You are working in R with a data frame called “df”. How would you
rename a variable called “weird_name” to “better_name”?
library(tidyverse)
df = df %>% rename(better_name = weird_name)
You are working in R with a data frame called “data”. One of the
variables in this dataset, call it “country”, is a string variable
indicating a country. How would you check which countries are in
this data frame?
table(data$country)
In the context of hypothesis testing, what is a Type I error?
A Type I error occurs when you reject a null hypothesis that is actually true. The
other name for a Type I error is a false positive.
You are working in R. What code would you use to check whether a
numeric variable, year, in your data frame, df, is actually numeric?
class(df$year)
Provide one variable with a nominal scale and one variable with an
ordinal scale.
Nominal: colors (red, green, blue)
Ordinal: rating (bad, OK, excellent)
Name a statistic that allows you to test if the difference in means
between two groups is statistically significant.
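The t-statistic (t-ratio) from a two-sample t-test, or the z-score in large samples. The p-value that goes with the statistic tells you whether the difference is statistically significant.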
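A minimal sketch of such a test in R, using simulated data. The group names, the sample sizes, and the assumed difference in means of 2 are all hypothetical.

```r
set.seed(1)
group_a <- rnorm(100, mean = 10, sd = 2)  # hypothetical group A
group_b <- rnorm(100, mean = 12, sd = 2)  # hypothetical group B (assumed mean is 2 higher)

# t.test() runs a Welch two-sample t-test by default
result <- t.test(group_a, group_b)

print(result$statistic)  # the t-statistic
print(result$p.value)    # the p-value for the difference in means
```

If the p-value is below the chosen threshold (usually 0.05), you reject the null hypothesis that the two group means are equal.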
What are statistical significance and substantive significance?
Statistical significance refers to when you reject the null hypothesis,
usually when the p-value is less than 0.05.
Substantive significance refers to the size of the coefficients when you
estimate your regression model. If the coefficient(s) are not large
relative to the distributions of the variables (think: summary stats), then we would say that the coefficient(s) are not substantively significant.
What is internal validity?
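It refers to the extent to which a causal claim is credible within the study itself, i.e., whether the estimated effect is really due to the treatment rather than to confounding or other bias.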
What is external validity?
It refers to the extent to which the findings can be generalized to other
subjects, places, contexts, times, etc. In other words, it refers to
the extent to which sample inferences travel to the broader population or another target population.
What is complete random sampling?
Random sampling (think: a coin flip of a fair coin) in which the sizes of the
groups are equal or very close to equal (if there is an odd number of trials).
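The difference between independent coin flips and a draw with fixed group sizes can be sketched in R. This assumes the card is about splitting units into two groups. The number of units (11) and the object names `simple` and `complete` are made up for the example.

```r
set.seed(2024)
n <- 11  # hypothetical odd number of units

# Independent fair coin flips: group sizes can come out unequal
simple <- rbinom(n, size = 1, prob = 0.5)

# Fixed group sizes: build a balanced vector of 0s and 1s, then shuffle it
complete <- sample(rep(c(0, 1), length.out = n))

print(table(simple))
print(table(complete))
```

With 11 units, the shuffled version always gives groups of 6 and 5, while the coin flips can produce any split.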
What is a convenience sample?
A non-probability sample that is not random and is selected on the basis
of logistics, i.e., often through “language facilities, connections, and
previous acquaintance with a region, time-period, or topic.”