Exam 1 Review Flashcards
(10 cards)
- What distinguishes social science from casual conversation? Provide at
least 3 reasons/distinguishing features.
G&C: Application of reason and evidence to problems, with attention
to method and possible error; uncertainty estimates; scope conditions; disinterested and
objective with regard to the truth.
K, K, & V: the goal is inference; procedures are public; conclusions
are uncertain; the content is the method.
- What is the difference between an observation, variable, case, sample,
and population? (Hint: it may help to draw the diagram that we discussed
in class, which also appeared in the slides and lecture notes)
- Observation: a unique unit of analysis, the smallest unit relevant for a quantitative study
- Variable: a dimension that houses like observations
- Case: a unique unit of analysis that refers mostly to qualitative research and groups of observations in quantitative research
- Sample: observations in a study (sample size: $n$)
- Population: the universe of phenomena that the hypothesis seeks to describe, which is mostly theoretical and unobserved
- Unit of analysis: country, country-year, municipality, etc.
- Does the dataset below contain cross-sectional, time-series, or panel data? If it is a panel dataset, is it in long or wide form? How do you know?
Country   Year  Civil War
Colombia  2011  1
Colombia  2012  1
USA       2011  0
USA       2012  0
It contains panel data because it observes multiple units (countries) at multiple time points (years). The data are in long form because each row is one country-year, with the year stored in its own column.
- Take the dataset in the previous question and re-write it in its opposite form. In other words, if it is in wide form, transform it long. Alternatively, if the dataset is in long form, transform it wide. Also, please specify whether this new dataset that you have specified is in wide or long form.
To convert the data to wide form:
Country   Civil War 2011  Civil War 2012
Colombia  1               1
USA       0               0
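The long-to-wide conversion above can be sketched in R. This is a minimal example using base R's reshape(); the data frame name panel_long and the column names country, year, and civil_war are assumptions chosen for illustration, not names from the course materials.

```r
# Rebuild the long-form panel from the flashcard (names are illustrative)
panel_long <- data.frame(
  country   = c("Colombia", "Colombia", "USA", "USA"),
  year      = c(2011, 2012, 2011, 2012),
  civil_war = c(1, 1, 0, 0)
)

# Long -> wide: one row per country, one civil_war column per year
panel_wide <- reshape(
  panel_long,
  idvar     = "country",  # the unit that identifies each row in wide form
  timevar   = "year",     # the variable whose values become column suffixes
  direction = "wide"
)

print(panel_wide)
```

With these inputs, reshape() names the new columns civil_war.2011 and civil_war.2012. In the tidyverse, tidyr::pivot_wider() is a common alternative for the same task.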
- What is the main difference between an Excel file and a CSV file? (Hint: think about the difference that could get you into trouble with your future employer if you didn't know the difference between the two file types.)
An Excel file, usually with an .xls or .xlsx suffix, can have multiple tabs (worksheets). A CSV file, by contrast, is plain text and can hold only one table, so it has no tabs.
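To see why a CSV file can hold only one table, it may help to look at what one contains. The sketch below writes a small data frame to a temporary CSV file and reads the raw text back; the object names (df, csv_path, csv_lines, df_back) are assumptions made for illustration.

```r
# A small illustrative data frame (names are assumptions)
df <- data.frame(
  country   = c("Colombia", "Colombia", "USA", "USA"),
  year      = c(2011, 2012, 2011, 2012),
  civil_war = c(1, 1, 0, 0)
)

# Write the data frame to a temporary CSV file, without row names
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)

# A CSV is just lines of text: one header line plus one line per row
csv_lines <- readLines(csv_path)
print(csv_lines)

# Reading it back yields a single table; there is no tab to choose
df_back <- read.csv(csv_path)
```

Reading an Excel file, by contrast, usually means naming the tab you want, for example with the sheet argument of readxl::read_excel().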
- In R, how would you obtain the mean of a variable called population in a data frame called df? Assume that the variable has missing values.
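You have two options: 1) you could just run summary(df), which reports the mean of every numeric variable; or 2) you could run mean(df$population, na.rm = TRUE), where na.rm = TRUE drops the missing values before averaging.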
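A short sketch of why na.rm = TRUE matters. The values in df below are made up for illustration only.

```r
# Hypothetical data frame with one missing value
df <- data.frame(population = c(10, 20, NA, 30))

# Without na.rm, a single NA makes the whole mean NA
mean_with_na <- mean(df$population)

# With na.rm = TRUE, the NA is dropped and the mean uses the other 3 values
mean_without_na <- mean(df$population, na.rm = TRUE)

print(mean_with_na)     # NA
print(mean_without_na)  # 20
```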
- What is variance, and what is standard deviation? How do they relate to each other? Provide the verbal explanations and formulas.
Variance measures the spread of the data: it is the mean squared distance of the observations from the mean. Standard deviation is the square root of the variance. Thus, we cannot compute the standard deviation without the variance. Conceptually, the standard deviation (roughly) captures the average distance of the data from the mean.
pop var: $\sum (x_i - \bar{x})^2 / n$
sample var: $\sum (x_i - \bar{x})^2 / (n - 1)$
pop sd: $\sqrt{\sum (x_i - \bar{x})^2 / n}$
sample sd: $\sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
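The four formulas can be checked by hand in R. This is a minimal sketch with a made-up vector x; note that R's built-in var() and sd() use the sample versions (dividing by n - 1).

```r
# Made-up data for illustration; the mean is 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared distances from the mean
sum_sq <- sum((x - mean(x))^2)

pop_var    <- sum_sq / n        # population variance
sample_var <- sum_sq / (n - 1)  # sample variance
pop_sd     <- sqrt(pop_var)     # population standard deviation
sample_sd  <- sqrt(sample_var)  # sample standard deviation

print(c(pop_var = pop_var, pop_sd = pop_sd))  # 4 and 2

# The built-in functions match the sample formulas
print(all.equal(sample_var, var(x)))
print(all.equal(sample_sd, sd(x)))
```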
- What are the basic elements of any ggplot in R? Provide the code and explain in words.
1) load the ggplot2 or tidyverse library,
2) tell R the data frame that you would like to use (e.g., one called df),
3) say what goes on the x-axis and what goes on the y-axis,
4) choose which geom_() object you are going to use, which tells R what type of graph you would like to make
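# Loading tidyverse also attaches ggplot2, so one library() call is enough
library(tidyverse)
# Replace df, x, and y with your own data frame and variable names
ggplot(df, aes(x = x, y = y)) +
  geom_point() +  # the geom sets the graph type (here, a scatterplot)
  labs(x = "X axis name",
       y = "Y axis name",
       title = "title here")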
- What is a statistical estimate? What does it comprise? Make sure to
provide the relevant formula and explain all of the terms.
estimate = estimand + bias + noise
- estimate: the output from our estimation
- estimand: true quantity of interest
- bias: systematic error
- noise: random error
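The decomposition can be illustrated with a small simulation. This is only a sketch: the estimand (50), the bias (2), and the noise distribution (normal with standard deviation 5) are made-up values, since in real research none of these quantities is observed directly.

```r
set.seed(123)  # make the random draws reproducible

estimand <- 50                            # true quantity of interest (assumed)
bias     <- 2                             # systematic error, same in every draw
noise    <- rnorm(1000, mean = 0, sd = 5) # random error, differs in every draw

# Each simulated estimate is the sum of the three components
estimate <- estimand + bias + noise

# Noise averages out across many draws; bias does not
avg_error <- mean(estimate - estimand)
print(avg_error)
```

Across many draws the average error settles near the bias rather than near zero, which is why collecting more data reduces noise but does not remove bias.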
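- What is external validity? Explain your answer and provide an example of a test without external validity.
External validity captures the extent to which inferences from a sample generalize to a larger population or transport to another target population. We can usually assess
external validity along its different dimensions, each considered in terms of samples and populations:
Mechanisms, Settings, Treatments, Outcomes, Units, and Time (M-STOUT).
For example, a survey experiment run only on undergraduates at one university may lack external validity if its results are used to describe all adults in a country, because the units in the sample differ systematically from the units in the target population.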