Exam 1 Review Flashcards

(10 cards)

1
Q
  1. What distinguishes social science from casual conversation? Provide at
    least 3 reasons/distinguishing features.
A

G&C: Applies reason and evidence to problems, with attention to method and possible error; provides uncertainty estimates; states scope conditions; is disinterested and objective with regard to the truth.

K, K, & V: The goal is inference; the procedures are public; the conclusions are uncertain; the content is the method.

2
Q
  1. What is the difference between an observation, variable, case, sample,
    and population? (Hint: it may help to draw the diagram that we discussed
    in class, which also appeared in the slides and lecture notes)
A
  • Observation: a unique unit of analysis; the smallest unit relevant for a quantitative study
  • Variable: a dimension that houses like observations
  • Case: a unique unit of analysis; the term is used mostly in qualitative research, and in quantitative research it refers to a group of observations
  • Sample: observations in a study (sample size: 𝑁)
  • Population: the universe of phenomena that the hypothesis seeks to describe, which is mostly theoretical and unobserved
  • Unit of analysis: country, country-year, municipality, etc.
3
Q
  1. Does the dataset below contain cross-sectional, time-series, or panel data? If it is a panel dataset, is it in long or wide form? How do you know?
A

Country   Year  Civil War
Colombia  2011  1
Colombia  2012  1
USA       2011  0
USA       2012  0

It contains panel data because it tracks multiple units (countries) at multiple time points (years). The data are in long format because each row is one country-year and the year is stored in its own column, rather than each year getting its own column.
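
The reasoning above can be checked mechanically. The following is a minimal sketch in base R, assuming the table is typed in by hand as a data frame; the object and column names (panel, country, year, civil_war) are illustrative and not from the course materials.

```r
# Toy copy of the table on this card (names are illustrative assumptions)
panel <- data.frame(
  country   = c("Colombia", "Colombia", "USA", "USA"),
  year      = c(2011, 2012, 2011, 2012),
  civil_war = c(1, 1, 0, 0)
)

n_units   <- length(unique(panel$country))  # number of distinct countries
n_periods <- length(unique(panel$year))     # number of distinct years

# Panel data: more than one unit AND more than one time point
is_panel <- n_units > 1 && n_periods > 1

# Long form (balanced panel): one row per country-year combination
is_long <- nrow(panel) == n_units * n_periods
```

Cross-sectional data would have only one time point, and time-series data would have only one unit, so is_panel would be FALSE in both cases.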

4
Q
  1. Take the dataset in the previous question and re-write it in its opposite form. In other words, if it is in wide form, transform it long. Alternatively, if the dataset is in long form, transform it wide. Also, please specify whether this new dataset that you have specified is in wide or long form.
A

The original data are in long form, so the converted dataset below is in wide form:

Country   Civil War 2011  Civil War 2012
Colombia  1               1
USA       0               0
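
The same conversion can be done in R. This is a minimal sketch using base R's reshape(), assuming the long table from the previous card is stored in a data frame; the names long, wide, country, year, and civil_war are illustrative. In a tidyverse workflow, tidyr's pivot_wider() is a common alternative.

```r
# Long-form data: one row per country-year (names are illustrative assumptions)
long <- data.frame(
  country   = c("Colombia", "Colombia", "USA", "USA"),
  year      = c(2011, 2012, 2011, 2012),
  civil_war = c(1, 1, 0, 0)
)

# Wide form: one row per country, one civil_war column per year
wide <- reshape(long,
                idvar     = "country",  # identifies each unit
                timevar   = "year",     # values become column suffixes
                direction = "wide")

names(wide)  # "country" "civil_war.2011" "civil_war.2012"
```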

5
Q
  1. What is the main difference between an Excel file and a CSV file? (Hint: think about the difference that could get you into trouble with your future employer if you didn’t know the difference between the two file types.)
A

An Excel file, usually with an .xls or .xlsx suffix, is a workbook that can hold multiple sheets (tabs). A CSV file, however, is plain text holding a single table, so it can have only one.
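
To see what "one tab" means in practice, here is a minimal base R sketch that writes a toy data frame to a temporary CSV and reads it back. The data and object names are illustrative assumptions. Reading a specific sheet from an Excel workbook would need an add-on package (readxl is one commonly used option), which is not shown here.

```r
# Toy data frame (illustrative, mirrors the civil war table)
df <- data.frame(
  country   = c("Colombia", "Colombia", "USA", "USA"),
  year      = c(2011, 2012, 2011, 2012),
  civil_war = c(1, 1, 0, 0)
)

path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)  # one table goes into one file

raw_lines <- readLines(path)  # plain text: a header line plus one line per row
df_back   <- read.csv(path)   # no sheet to choose, because there is only one table
```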

6
Q
  1. In R, how would you obtain the mean of a variable called population in a data frame called df? Assume that the variable has missing values.
A

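You have two options: 1) run summary(df), which reports the mean of every numeric variable and ignores missing values when computing it; or 2) run mean(df$population, na.rm = TRUE). Without na.rm = TRUE, mean() returns NA when the variable has missing values.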
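
A short sketch of both options on made-up data; the values in df are an assumption for illustration only.

```r
# Toy data frame with one missing value (values are made up)
df <- data.frame(population = c(100, 250, NA, 400))

naive_mean <- mean(df$population)                # NA, because of the missing value
clean_mean <- mean(df$population, na.rm = TRUE)  # drops the NA first: (100 + 250 + 400) / 3

summary(df)  # also shows the mean, plus a count of NA's
```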

7
Q
  1. What is variance, and what is standard deviation? How do they relate to each other? Provide the verbal explanations and formulas.
A

Variance measures the spread of the data: it is the mean squared distance of the observations from the mean. Standard deviation is the square root of the variance. Thus, we cannot understand the standard deviation without the variance. Conceptually, the standard deviation (roughly) captures the average distance of the data from the mean.
pop var: $\sum (x_i - \bar{x})^2 / N$
sample var: $\sum (x_i - \bar{x})^2 / (N - 1)$
pop sd: $\sqrt{\sum (x_i - \bar{x})^2 / N}$
sample sd: $\sqrt{\sum (x_i - \bar{x})^2 / (N - 1)}$
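
The four quantities can be computed by hand in R and compared with the built-in functions. This is a minimal sketch on a made-up vector x; note that R's var() and sd() use the sample (N - 1) versions.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # made-up data; mean is 5
N <- length(x)

ss <- sum((x - mean(x))^2)      # sum of squared distances from the mean

pop_var  <- ss / N              # population variance
samp_var <- ss / (N - 1)        # sample variance
pop_sd   <- sqrt(pop_var)       # population standard deviation
samp_sd  <- sqrt(samp_var)      # sample standard deviation

# Built-ins match the sample versions
all.equal(samp_var, var(x))
all.equal(samp_sd, sd(x))
```

With this vector the sum of squares is 32, so the population variance is 4 and the population standard deviation is 2.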

8
Q
  1. What are the basic elements of any ggplot in R? Provide the code and explain in words.
A

1) load the ggplot2 or tidyverse library
2) tell R the data frame that you would like to use (e.g., one called df)
3) specify what goes on the x-axis and what goes on the y-axis
4) choose which geom_*() object you are going to use, which tells R what type of graph you would like to make

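library(tidyverse)  # loads ggplot2; library(ggplot2) alone also works

# data, x, and y are placeholders for your data frame and column names
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  labs(x = "X axis name",
       y = "Y axis name",
       title = "title here")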
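
For a version that runs as-is, here is a minimal sketch using R's built-in mtcars dataset. The choice of dataset, the columns wt and mpg, and the label text are assumptions for illustration, not part of the course materials.

```r
library(ggplot2)

# 1) data frame: mtcars; 2) aesthetics: wt on x, mpg on y; 3) geom: points
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       title = "Fuel efficiency by weight")

p  # printing the object draws the plot
```

Swapping geom_point() for another geom, such as geom_line(), changes the type of graph without touching the data or axis mappings.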

9
Q
  1. What is a statistical estimate? What does it comprise? Make sure to
    provide the relevant formula and explain all of the terms.
A

$\text{estimate} = \text{estimand} + \text{bias} + \text{noise}$

  • estimate: the output from our estimation
  • estimand: true quantity of interest
  • bias: systematic error
  • noise: random error
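
A small simulation can make the decomposition concrete. This is a sketch under stated assumptions: the true mean (10), the size of the systematic error (2), and the noise distribution are all made up for illustration.

```r
set.seed(42)

estimand <- 10  # true quantity of interest (assumed)
bias     <- 2   # systematic error, e.g., an instrument that always reads 2 too high
n        <- 50  # sample size

# One study: every measurement carries the bias plus its own random error
noise_draws  <- rnorm(n, mean = 0, sd = 3)
measurements <- estimand + bias + noise_draws
estimate     <- mean(measurements)
noise        <- estimate - estimand - bias  # what is left is the random error

# Many repeated studies: noise averages out, bias does not
estimates <- replicate(2000, mean(estimand + bias + rnorm(n, mean = 0, sd = 3)))
mean(estimates)  # close to estimand + bias, not to estimand
```

Collecting more data shrinks the noise term but leaves the bias untouched, which is why the two kinds of error are kept separate in the formula.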
10
Q
  1. What is external validity? Explain your answer and provide an example of a test without external validity.
A

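External validity captures the extent to which inferences from a sample generalize to a larger population or transport to another target population. We can usually assess external validity along its different dimensions, comparing samples and populations on each: Mechanisms, Settings, Treatments, Outcomes, Units, and Time (M-STOUT). For example, a lab experiment run only on undergraduates at one university may lack external validity if the goal is to draw conclusions about all adults in a country.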
