Trivia Flashcards

(31 cards)

1
Q

What is a field experiment?

A

Field experiments randomly assign a treatment to a sample in the natural environment of the population of interest, often with real-world treatments.

2
Q

In the context of an experiment, what is spillover?

A

Interference (also known as spillover) occurs when the treatment assignment of one person affects the outcome of at least one other person (think: a vaccine against a communicable disease, which also protects the unvaccinated people around the recipient).

3
Q

What is the fundamental problem of causal inference, and why does randomization help to overcome it?

A

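Fundamental problem of causal inference: You never observe the same object/person in both the treatment (e.g. getting the pill) and control (e.g. getting the placebo) states, so an individual causal effect can never be measured directly.

Why randomization: Random assignment makes the treatment and control groups comparable on average, on both observed and unobserved characteristics. The control group can then stand in for what would have happened to the treated group without treatment, and the difference in group means is an unbiased estimate of the average treatment effect. Larger samples shrink chance imbalances between the groups, so the estimate becomes more precise, although it is never guaranteed to be exactly correct.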
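
The logic of randomization can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the potential outcomes y0 and y1, the sample size, and the true effect of 5 are all hypothetical and exist only because the data are simulated (in real data you would never see both).

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

n <- 10000
# Hypothetical potential outcomes: each unit has both, but only one is ever observed
y0 <- rnorm(n, mean = 50, sd = 10)  # outcome if untreated
y1 <- y0 + 5                        # outcome if treated (assumed true effect = 5)

# Randomly assign exactly half of the units to treatment
treat <- sample(rep(c(0, 1), each = n / 2))

# Observed outcome: y1 for treated units, y0 for control units
y_obs <- ifelse(treat == 1, y1, y0)

# The difference in group means estimates the average treatment effect
ate_hat <- mean(y_obs[treat == 1]) - mean(y_obs[treat == 0])
print(ate_hat)
```

With a large n the estimate should land close to the assumed effect of 5, and rerunning with a small n shows how much noisier it gets.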

4
Q

You are working in R. You want to create a new variable called var2
in the dataset data. Your var2 would take the value of 1 if another
variable, let’s call it var1, is greater than 10; otherwise, you would
assign a value of zero to var2. What would be your R code?

A

data$var2 = ifelse(data$var1 > 10, 1, 0)

5
Q

You are working in R with a data frame called df. In this data
frame, you have a variable called gdp_per_capita, and it has some
missing values. How would you get the mean of this variable?

A

mean(df$gdp_per_capita, na.rm = TRUE)

6
Q

You are working in R. Your data frame is called df, which has data
for France over the years 2000-2010. In this data frame, you have variables called year and poverty_rate. How would you create a labelled scatter plot of these two variables using the ggplot2
package?

A

library(ggplot2)

# make scatterplot of poverty rate by year
scatter = ggplot(df, aes(x = year, y = poverty_rate)) +
  geom_point() +
  labs(x = "Year",
       y = "Poverty Rate",
       title = "The Poverty Rate in France",
       caption = "Source: World Bank")

# display the plot
print(scatter)

7
Q

You are working in R. How would you import a .csv file called “this class rocks.csv” and assign it to an object called “data”?

A

library(rio)
data = import("this class rocks.csv")

8
Q

What is a null hypothesis?

A

A null hypothesis is a theory-based statement about what we would observe if there were no
relationship between the dependent and independent variables.

9
Q

You are working in R. Your data frame is called “data”. After you
examine the individual countries (variable is called “country”) in your dataset, you notice that France is spelled “Frannnce”. How would you correct the spelling?

A

data$country[data$country == "Frannnce"] <- "France"

10
Q

Let’s say that you have estimated a logit model. How do you interpret the default output of the coefficients in R?

A

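The default output of a logit model reports coefficients in log odds: each coefficient is the change in the log odds of the outcome for a one-unit increase in the predictor. The sign (and the statistical significance) can be read directly, but the magnitude is hard to interpret, so coefficients are usually converted to odds ratios with exp() or to predicted probabilities.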
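
As an illustration of moving from log odds to something more readable, here is a minimal sketch on simulated data. The variables x and y and the true coefficients (-0.5 and 1) are assumptions chosen for the example, not taken from any real dataset.

```r
set.seed(1)  # arbitrary seed for reproducibility

n <- 2000
x <- rnorm(n)
# Simulated binary outcome whose true log odds are -0.5 + 1 * x (assumed values)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x))

fit <- glm(y ~ x, family = binomial(link = "logit"))

# Default output: coefficients on the log-odds scale
print(coef(fit))

# Exponentiate to get odds ratios
print(exp(coef(fit)))

# Predicted probabilities at x = 0 and x = 1
p <- predict(fit, newdata = data.frame(x = c(0, 1)), type = "response")
print(p)
```

The positive sign on x says the outcome becomes more likely as x rises; the predicted probabilities show by how much at specific values of x.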

11
Q

What does it mean for a study to be replicable?

A

A replicable study is one with public procedures that analysts can replicate after receiving access to an author’s data and statistical software code.

12
Q

You are working in R. You have a dataset called “data”. The dataset has countries (variable is called “country”) in various years.
What code would you use if you wanted to create a new dataset
with just the country of France?

A

library(dplyr)
newdata = data %>% filter(country == "France")

13
Q

Lionel Messi scored 100 goals in 2010 and 90 goals in 2011.
Cristiano Ronaldo scored 70 goals in 2010 and 80 goals in 2011.
Write tables for this dataset in both the long and wide formats.
(Bonus: What can we infer from these goal numbers?)

A

LONG
Player    Year   Goals
Messi     2010   100
Messi     2011   90
Ronaldo   2010   70
Ronaldo   2011   80

WIDE
Player    2010   2011
Messi     100    90
Ronaldo   70     80
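
The same conversion can be sketched in R. This example uses base R's reshape(), which is one of several options; the data frame names long and wide are chosen only for illustration.

```r
# Long format: one row per player-year
long <- data.frame(
  Player = c("Messi", "Messi", "Ronaldo", "Ronaldo"),
  Year   = c(2010, 2011, 2010, 2011),
  Goals  = c(100, 90, 70, 80)
)

# Wide format: one row per player, one Goals column per year
wide <- reshape(long, idvar = "Player", timevar = "Year", direction = "wide")
print(wide)
```

reshape() names the new columns by joining the value column and the year (Goals.2010, Goals.2011), so they may need renaming to match the table above.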

14
Q

You are working in R with a data frame called “df”. How would you
rename a variable called “weird_name” to “better_name”?

A

library(tidyverse)
df = df %>% rename(better_name = weird_name)

15
Q

You are working in R with a data frame called “data”. One of the
variables in this dataset, call it “country”, is a string variable
indicating a country. How would you check which countries are in
this data frame?

A

table(data$country)

16
Q

In the context of hypothesis testing, what is a Type I error?

A

A Type I error is when you reject the null hypothesis when it is true. The
other name for a Type I error is a false positive.
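
A quick simulation can make the idea of a false positive concrete. This is a sketch under assumed settings (2000 simulated studies, 30 observations per group, a 0.05 threshold): both groups are drawn from the same distribution, so the null hypothesis is true by construction and every rejection is a Type I error.

```r
set.seed(123)  # arbitrary seed for reproducibility

n_sims <- 2000

# Each simulated study compares two groups drawn from the SAME distribution,
# so any "significant" difference is a false positive
p_values <- replicate(n_sims, t.test(rnorm(30), rnorm(30))$p.value)

# Share of studies that wrongly reject the null at the 0.05 level
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```

The share of rejections should sit near the chosen significance level, which is what the 0.05 threshold means in practice.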

17
Q

You are working in R. What code would you use to check whether a
numeric variable, year, in your data frame, df, is actually numeric?

A

class(df$year)

18
Q

Provide one variable with a nominal scale and one variable with an
ordinal scale.

A

Nominal: colors (red, green, blue)
Ordinal: rating (bad, OK, excellent)

19
Q

Name a statistic that allows you to test if the difference in means
between two groups is statistically significant.

A

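The t-statistic (t-ratio) from a two-sample t-test, or a z-score with large samples. Either one is compared to a critical value or converted to a p-value.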
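
To show where the t-statistic comes from, the sketch below computes it by hand and compares it with R's t.test(), which uses the Welch (unequal variances) version by default. The two groups a and b are simulated with assumed means of 10 and 11.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Two simulated groups (assumed means and standard deviations)
a <- rnorm(50, mean = 10, sd = 2)
b <- rnorm(50, mean = 11, sd = 2)

# Welch t-statistic by hand: difference in means over its standard error
se <- sqrt(var(a) / length(a) + var(b) / length(b))
t_manual <- (mean(a) - mean(b)) / se

# Built-in test; reports the same statistic plus a p-value
res <- t.test(a, b)
print(t_manual)
print(res$statistic)
print(res$p.value)
```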

20
Q

What are statistical significance and substantive significance?

A

Statistical significance refers to when you reject the null hypothesis,
usually when the p-value is less than 0.05.

Substantive significance refers to the size of the coefficients when you
estimate your regression model. If the coefficient(s) are not large relative to the distributions of the variables (think: summary stats), then we would say that the coefficient(s) are not substantively significant.

21
Q

What is internal validity?

A

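It refers to the extent to which a study's causal conclusion is warranted, i.e., the extent to which the estimated effect reflects the treatment rather than confounding or other bias within the study.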

22
Q

What is external validity?

A

It refers to the extent to which the findings can be generalized to other
subjects, places, contexts, times, etc. In other words, it refers to
the extent to which sample inferences travel to the broader population or another target population.

23
Q

What is complete random sampling?

A

Random sampling (think: a flip of a fair coin) in which the sizes of the
groups are equal or very close to it (if there is an odd number of trials).
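
One way to see what "equal or very close" means is to compare independent coin flips with a draw that fixes the group sizes in advance. This sketch reads the card as splitting n units into two groups; the value n = 11 and the object names are assumptions for illustration.

```r
set.seed(99)  # arbitrary seed for reproducibility

n <- 11  # odd on purpose, so the groups cannot be exactly equal

# Independent fair coin flips: group sizes are left to chance
coin <- rbinom(n, size = 1, prob = 0.5)

# Complete randomization: build a balanced vector of labels, then shuffle it
complete <- sample(rep(c(0, 1), length.out = n))

print(table(coin))
print(table(complete))
```

The shuffled version always yields groups of 6 and 5 here, while the coin-flip version can produce any split.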

24
Q

What is a convenience sample?

A

A non-probability sample that is not random and is selected on the basis
of logistics, i.e., often through “language facilities, connections, and
previous acquaintance with a region, time-period, or topic.”

25
Q

What is a regression discontinuity design?

A

It is a design in which there is an as-if random threshold/cutoff that separates observations into treatment and control based on a continuous variable. Think: a scholarship program in which there is an SAT score threshold separating those who can receive the scholarship from those who cannot.

26
Q

In a study of a worker training program that tries to discern the impact of the program on earnings, what is the treatment?

A

The treatment is participation in the worker training program.

27
Q

In the context of a study examining the impact of oil revenues on democratization, a case study examining the impact of oil revenues on highly democratic Norway would be an example of what type of case?

A

It would be a deviant case, because natural resource revenues normally do not contribute to democratization, yet Norway is highly democratic.

28
Q

Draw a causal diagram of an instrumental variables model in which the exclusion restriction is violated, and explain why the exclusion restriction is violated.

A

     Control Variables Z
          ↓      ↘
Q   →     X   →   Y
└─────────────────↑   (Q also reaches Y directly, bypassing X)

Rationale:
1) Any direct relationship between Q and Y poses this problem. When the exclusion restriction is intact, Q is related to Y only through X.
2) In this case, hunger (Q) can directly affect someone’s happiness (Y), so the exclusion restriction is violated.

29
Q

Name one reason why you may want to use logistic regression instead of linear regression if your dependent variable is binary.

A

Linear regression can produce predicted probabilities that are less than zero or greater than 1, whereas logistic regression keeps predicted probabilities between 0 and 1.

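
The difference is easy to see on simulated data. The sketch below fits both models to the same binary outcome and compares the range of fitted values; the data-generating values (a slope of 2 and a predictor with standard deviation 2) are assumptions picked to make the contrast visible.

```r
set.seed(2)  # arbitrary seed for reproducibility

n <- 1000
x <- rnorm(n, sd = 2)
# Simulated binary outcome with assumed true log odds of 2 * x
y <- rbinom(n, size = 1, prob = plogis(2 * x))

lpm   <- lm(y ~ x)                      # linear regression on a 0/1 outcome
logit <- glm(y ~ x, family = binomial)  # logistic regression

# Linear fitted values can fall below 0 or above 1
print(range(fitted(lpm)))

# Logistic fitted values stay inside the unit interval
print(range(fitted(logit)))
```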
30
Q

Provide an example of a time-series dataset.

A

Table: Time-Series Data
country   year   population
USA       2019   328.3 mil.
USA       2020   329.5 mil.

31
Q

A difference-in-differences design necessarily involves what type of data?

A

Panel data, a.k.a. time-series cross-sectional data.
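
To show why the design needs observations on more than one unit at more than one point in time, here is a minimal sketch with a toy panel. All numbers are made up for illustration, and the column names unit, treated, post, and y are assumptions.

```r
# Toy panel: two units, each observed before (post = 0) and after (post = 1)
panel <- data.frame(
  unit    = rep(c("A", "B"), each = 2),
  treated = rep(c(1, 0), each = 2),   # unit A is treated, unit B is the control
  post    = rep(c(0, 1), times = 2),
  y       = c(10, 18, 8, 11)          # made-up outcome values
)

# Change over time within each group
change_treated <- with(panel, y[treated == 1 & post == 1] - y[treated == 1 & post == 0])
change_control <- with(panel, y[treated == 0 & post == 1] - y[treated == 0 & post == 0])

# Difference-in-differences: (18 - 10) - (11 - 8)
did <- change_treated - change_control
print(did)

# The same quantity is the interaction coefficient in a regression
fit <- lm(y ~ treated * post, data = panel)
print(coef(fit)["treated:post"])
```

Both calculations give the same number because the interaction term measures how much more the treated unit changed than the control unit.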