Introduction. Visualizing Data Flashcards by Давид Авагян

What is spurious correlation?

A spurious correlation occurs when two variables are correlated but don’t have a causal relationship

Omitted variable bias

It occurs when the model omits an independent variable that has a causal effect on the dependent variable and is correlated with one or more of the included independent variables
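A minimal simulation sketch of the bias, assuming a linear model with one included variable x and one omitted variable z. The coefficient values, sample size, and variable names are made up for illustration. Regressing y on x alone yields a slope that absorbs part of the effect of z.

```python
import random

random.seed(0)
n = 20_000
beta_x, beta_z = 1.0, 2.0  # true effects, chosen only for this illustration

z = [random.gauss(0, 1) for _ in range(n)]   # the variable we will omit
x = [zi + random.gauss(0, 1) for zi in z]    # x is correlated with z
y = [beta_x * xi + beta_z * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equally long lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


# Bivariate OLS slope of y on x, leaving z out of the model.
slope_without_z = cov(x, y) / cov(x, x)

# Textbook bias term: effect of z times the slope from regressing z on x.
predicted_bias = beta_z * cov(x, z) / cov(x, x)

print(f"true effect of x:        {beta_x}")
print(f"estimated slope (no z):  {slope_without_z:.3f}")
print(f"true effect + bias term: {beta_x + predicted_bias:.3f}")
```

In this setup the estimated slope lands near 2 rather than the true value of 1, because x and z move together and z also drives y.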

Simpson’s Paradox

It is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined
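A small arithmetic sketch of the reversal. The counts follow the pattern of the widely cited kidney-stone treatment example and should be read as illustrative. The group and treatment names are placeholders.

```python
from fractions import Fraction

# (successes, trials) for treatments A and B within each subgroup.
# Illustrative counts, not data from this deck.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    return Fraction(successes, trials)


def pooled(arm):
    """Success rate of one treatment after combining all subgroups."""
    successes = sum(arms[arm][0] for arms in groups.values())
    trials = sum(arms[arm][1] for arms in groups.values())
    return rate(successes, trials)


# Within every subgroup, A has the higher success rate ...
a_wins_every_group = all(
    rate(*arms["A"]) > rate(*arms["B"]) for arms in groups.values()
)
# ... yet after combining the subgroups, B looks better.
a_wins_pooled = pooled("A") > pooled("B")

for name, arms in groups.items():
    print(name, float(rate(*arms["A"])), float(rate(*arms["B"])))
print("pooled", float(pooled("A")), float(pooled("B")))
```

The reversal arises because the two treatments are spread unevenly across the subgroups: A is used mostly on the harder "large" cases, which drags its combined rate down.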

Unit of analysis

The observation described by a set of data. For example, voters, parties, bills, elections, voting decisions, legislative output. Very often our data have multiple levels of analysis (e.g., individuals, regions, countries), calling for different statistical techniques

Variables

Any characteristic related to the unit of analysis. A variable can take on different values for different observations

Types of variables

Nominal (e.g., political party), ordinal (e.g., school grades), interval (e.g., temperature in degrees Celsius), ratio (e.g., duration, GDP)

Data set

Set of variables for a given set of observations. Should come with a codebook

Hypothesis

Statement about the nature of the social and political world, often expressed as statements about relationships between variables (e.g., “The lower X, the higher Y”)

Cross-section data

Sample of voters, governments, countries, or other units, taken at a given point in time. Observations are typically assumed to be independent

Time series data

Observations on units over time, e.g., number of conflicts in country X. Because past events can influence future events and lags in behavior are prevalent in social sciences, time is an important dimension in such a data set. Observations are not independent across time (serial correlation)
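Serial correlation can be made concrete with a lag-1 autocorrelation. The sketch below assumes a simulated first-order autoregressive series, where each value depends on the previous one, and compares it with independent noise. The persistence parameter 0.8 and the series length are arbitrary choices for illustration.

```python
import random

random.seed(1)
n = 5_000
phi = 0.8  # assumed persistence: share of last period's value carried forward

noise = [random.gauss(0, 1) for _ in range(n)]  # independent observations

series = [0.0]
for _ in range(n - 1):
    # Each observation depends on the previous one plus a new shock.
    series.append(phi * series[-1] + random.gauss(0, 1))


def lag1_autocorrelation(values):
    """Correlation between the series and itself shifted by one period."""
    avg = sum(values) / len(values)
    deviations = [v - avg for v in values]
    numerator = sum(a * b for a, b in zip(deviations[:-1], deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


r_series = lag1_autocorrelation(series)
r_noise = lag1_autocorrelation(noise)
print(f"lag-1 autocorrelation, dependent series:  {r_series:.3f}")
print(f"lag-1 autocorrelation, independent noise: {r_noise:.3f}")
```

The dependent series shows a lag-1 autocorrelation close to the assumed 0.8, while the independent noise stays near zero.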

Pooled time series cross-section data

Data consist of comparable time series data observed on a variety of units. For instance, units are countries, and for each country we observe annual data on a variety of political and economic variables. Typically, we have few units, but long time series. Pooling the data increases the number of observations and makes it possible to control for exogenous shocks. Observations are usually not independent.

Panel data

A large number of the same cross-sectional units, e.g., survey respondents, are observed repeatedly over a number of “waves” (interviews). With panel data, the time series is usually very short. Common in studies of political behavior. For example, the German Socio-Economic Panel (SOEP) or the German Internet Panel (GIP) in Mannheim

A histogram

It shows the distribution of the measurements of a variable. It is a bar graph in which the height of each bar shows how many observations fall in a particular subinterval (bin), with the bins plotted along the horizontal axis
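The binning step behind a histogram can be sketched in a few lines. The ages and bin edges below are hypothetical, and the helper `histogram_counts` is written for this example rather than taken from a plotting library.

```python
def histogram_counts(values, bin_edges):
    """Count observations per bin [left, right); the last bin also includes its right edge."""
    counts = [0] * (len(bin_edges) - 1)
    for value in values:
        for i in range(len(counts)):
            left, right = bin_edges[i], bin_edges[i + 1]
            is_last_bin = i == len(counts) - 1
            if left <= value < right or (is_last_bin and value == right):
                counts[i] += 1
                break
    return counts


ages = [18, 22, 25, 31, 34, 35, 41, 47, 52, 60]  # hypothetical observations
edges = [10, 20, 30, 40, 50, 60]                 # bins along the horizontal axis

counts = histogram_counts(ages, edges)

# Text rendering: one row per bin, bar length = number of observations.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:>2}-{right:<2} {'#' * count}")
```

Changing the bin edges changes the picture, which is one of the deficiencies of histograms that density plots try to address.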

Density plot

Addresses the deficiencies of histograms by averaging and smoothing. It estimates the probability density function of the random variable X
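One common way to do the averaging and smoothing is a Gaussian kernel density estimate. The card does not name a specific method, so this choice is an assumption. The sample values and the bandwidth of 0.5 are also made up for illustration.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a density function that averages one Gaussian bump per observation."""
    n = len(sample)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


sample = [1.0, 1.5, 2.0, 2.2, 3.1, 4.0, 4.1, 4.3]  # hypothetical observations
estimate = gaussian_kde(sample, bandwidth=0.5)

# Evaluate on a fine grid; a proper density integrates to about 1.
step = 0.01
grid = [i * step for i in range(-500, 1001)]
area = sum(estimate(x) for x in grid) * step

print(f"density at 2.0: {estimate(2.0):.3f}")
print(f"density at 8.0: {estimate(8.0):.6f}")
print(f"area under the curve: {area:.4f}")
```

A larger bandwidth gives a smoother curve and a smaller one follows the individual observations more closely, so the bandwidth plays a role similar to the bin width in a histogram.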

Measures of Central Tendency

Mode, Median, Mean
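A short sketch of the three measures using Python's standard `statistics` module. The values are hypothetical and include one large outlier to show how the measures can differ.

```python
import statistics

values = [1, 2, 2, 3, 4, 5, 18]  # hypothetical observations with one outlier

mode = statistics.mode(values)      # most frequent value
median = statistics.median(values)  # middle value of the sorted data
mean = statistics.mean(values)      # arithmetic average

print(f"mode={mode}, median={median}, mean={mean}")
```

The single outlier pulls the mean above the median, while the mode and median are unaffected by how extreme that one value is.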
