Exam Flashcards
(93 cards)
Components of the model and what do they do?
The prediction model combines two parts:
1. structural fundamental forecast of national popular vote (NPV) (prior)
2. state-level polling forecast (a model describing the distribution of the data: mathematical formulas that characterize the trends and spreads in the data, with parameter values acting as control knobs)
› model of differences between states (NPV –> SPV for each state) –> prior differences
› When we go from the prior (NPV) to the states' vote shares, we use state relative positions (based on previous elections) to capture the difference between states, so that not all states have the TFC forecast as their baseline
› national and state-level polls
› models for sampling and non-sampling error in polls
› model for state and national opinion changes during campaign
–> This is our correlation matrix. Here we use sociodemographics (how similar the states are) to share information from state-level polls
Three parameters that the model explores (‘samples’) using MCMC method
› fundamentals (its standard deviation – to reflect uncertainty in the national popular vote (NPV) forecast)
› potential temporal drift of the polls (different values for the rate and direction of temporal changes in public opinion during the campaign period)
› different types of polling bias (polling errors, such as: sampling error – variability due to the limited sample size of polls – and non-sampling error – systematic biases introduced by polling methodologies, e.g. house effects)
› outcome -> exploration of the posterior distribution
› After many iterations, the model converges on a posterior distribution that reflects the most likely parameter values (even very unlikely ones are included, just less often than likely scenarios) and provides probabilities for outcomes like state vote shares and Electoral College results.
› After many iterations => converging towards the ‘actual’ distribution
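The MCMC exploration described above can be sketched as a toy Metropolis sampler for a single parameter. The fundamentals prior, the "polls", and the jump size below are invented numbers for illustration, not the model's real inputs:

```python
import math
import random

random.seed(0)

# Toy Metropolis sampler for one parameter: the national vote share mu.
prior_mean, prior_sd = 0.50, 0.03      # fundamentals forecast acts as prior
polls = [0.52, 0.51, 0.53, 0.50]       # hypothetical poll results
poll_sd = 0.02                         # assumed sampling error per poll

def log_post(mu):
    lp = -0.5 * ((mu - prior_mean) / prior_sd) ** 2             # prior
    lp += sum(-0.5 * ((y - mu) / poll_sd) ** 2 for y in polls)  # likelihood
    return lp

draws, mu = [], prior_mean
for _ in range(20000):
    prop = mu + random.gauss(0, 0.01)  # random jump to a nearby value
    # accept in proportion to how probable the proposal is (prior + data)
    if random.random() < math.exp(min(0.0, log_post(prop) - log_post(mu))):
        mu = prop
    draws.append(mu)

# discard burn-in; what remains approximates the posterior distribution
post_mean = sum(draws[5000:]) / len(draws[5000:])
```

Unlikely values of mu are still visited, just less often than likely ones, which is exactly the "converging towards the actual distribution" idea on the card.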
The theoretical assumptions behind our Bayesian model (structural + polls)
The need for structural fundamentals: Economic voting
- polls can be problematic many months in advance - they only capture the moment
- Swing voters converge towards 'true' opinions closer to the election (economic voting of swing voters) based upon the state of the country. Structural fundamentals use the state of the country as a predictor
- They deliver information that isn't in the polls (Graefe 2018)
To include polling data and make it dynamic: The need for the Michigan Model
- It would be wasteful not to use the available information we have
- the fundamentals forecast is national-level only, and the scarcity of data makes it unreliable - Lauderdale & Linzer 2015. Graefe 2018 shows that it was the least reliable component from 2004 to 2016.
- We build a state covariance so we can share information from polls with similar states. Here we utilize the insights from the Michigan model and assume that similar states (based on similar sociodemographics) vote similarly and move in similar ways
How does the Bayesian approach handle uncertainty regarding polling?
We include models (that describe the data) for sampling and non-sampling error in polls. During the simulation,
the sampler randomly jumps to different parameter values, which are accepted according to how probable they are given the prior and the data –> i.e., how well they fit the data with the prior in mind.
By running thousands of simulations, the model generates a posterior distribution that accounts for the influence of polling errors on the forecast.
The probability of a candidate’s victory is calculated as the fraction of simulations they win, reflecting both likely and improbable scenarios (e.g., winning the Electoral College but losing the popular vote). This approach ensures comprehensive modeling of election uncertainties.
We also give the prior a high weight in the beginning, when the polls are noisier; voters' intentions become clearer towards the end.
Newer polls are weighted more than early polls
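The "fraction of simulations they win" calculation can be sketched in a few lines; the posterior is summarized here as a normal distribution with invented mean and spread:

```python
import random

random.seed(1)

# Win probability as the fraction of simulations won: draw 10,000
# hypothetical two-party vote shares from a posterior summarized as a
# normal distribution (mean and sd are made up for illustration).
sims = [random.gauss(0.52, 0.03) for _ in range(10000)]
p_win = sum(s > 0.5 for s in sims) / len(sims)
# improbable scenarios are counted too, just in fewer simulations
```

A 52% expected vote share thus maps to a win probability well above 52%, which is why forecasts report the two numbers separately.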
What kind of election? (AP votecast and Exit polls)
We saw an almost nationwide shift towards Trump –> could potentially be a realignment (though not uniform) - otherwise a balancing election
- especially looking at education
He improved in nearly all groupings - for example, he lost by less among urban citizens than in 2020
Significant inroads among Latino voters - improved among younger voters too
However, the education shift may already have happened in 2016 - Trump as an outlier?
What is wrong with the fundamentals model? Lauderdale & Linzer 2015
Three sources of uncertainty
1) Fundamentals models often fail to account for the full uncertainty in their predictions, such as the variability in coefficient estimates (the uncertainty of the point prediction). This leads to overly narrow confidence intervals - they should actually be around 7-10 points –> not very much information in close elections
2) There is no consensus on which variables (e.g., GDP growth, unemployment) are most predictive –> Different specifications can yield significantly different predictions, making results less reliable
- This is because of the limited evidence from past elections
3) Electoral College Dynamics: Fundamentals models typically focus on the national popular vote, overlooking the complexities of the Electoral College system, which aggregates state-level outcomes. This can result in incorrect predictions, particularly in close elections. –> for example you can win the popular vote but lose the electoral college
Data limitations: The models rely on a limited number of past elections (often fewer than 20), making it difficult to draw strong conclusions about patterns and relationships. –> limited strength of the evidence.
There is a significant change in political context from 1948 to 2024.
How much can data and trends going back to 1948 be used now, given the different context? This is an argument for, e.g., taking the term variable out.
Why is simulation used in Bayesian analysis?
To handle cases where analytical solutions are not possible, like when multiple parameters are involved.
What are the main points from The Bitter End?
Calcification –> is due to polarization
o Calcification means less willingness to defect from their party, such as by breaking with their party’s president or even voting for the opposite party.
o There is thus less chance for new and even dramatic events to change people’s choices at the ballot box.
o –> This means smaller fluctuations from year to year in election outcomes.
o However, the winner is not always the same party, because the two sides are so close in sheer numbers
o 2016-2020 calcified very small shifts
Not polarization as such –> polarization is a larger gap between the two parties in different areas. Affective polarization.
Three elements of the Trump presidency, the 2020 election and its aftermath whose upshot was more calcified politics
o (1) Long-term tectonic shifts have pushed the parties apart while making the views within each party more uniform –> gradually increasing partisan polarization.
o (2) Shorter-term shocks, catalyzed especially by Trump, have sped up polarization on identity issues
o Race, gender, ethnicity etc.
o (3) It is precisely these identity issues that voters in both parties care more about— exacerbating divisions even further and giving politicians every incentive to continue to play to them.
o They voted based on these divided issues
Party polarization explains changes in the candidates' coalitions. One consequence should be a stronger association between people's own ideological predispositions and their voting behavior - less so their sociodemographics?
o This increased polarization, as Trump’s performance among conservatives and Biden’s performance among liberals made Americans’ ideological identification a stronger predictor of how they voted in 2020
o The ideological polarization in voting behavior was more likely to come from a third source: voters changed their issue positions in ways that aligned with their partisanship.
Partisan perceptions of the economy and its impact on structural fundamentals? (Brady et al. 2022)
Partisans may exhibit motivated reasoning, attributing a good economy to their preferred party and a bad economy to the opposition.
Consensus: Incumbent Partisans are comparatively more optimistic about the economy when their party controls the executive branch than they are otherwise
The gap in economic perceptions approximately doubled between 1999 and 2020, and partisan economic perceptions no longer seem to converge during economic crises.
Attribution for good/bad economy influenced by partisanship –> Perceptual bias
- also who they give the credit to
An argument for the TFC model, in which the objective variables play less of a role
Problems with survey-based opinion (Linzer 2013)
Not every state is polled on every day, leading to gaps in the time series. Data are especially sparse in less-competitive states and early in the campaign.
Second, measured preferences fluctuate greatly from poll to poll, due to sampling variability and other sources of error.
We can mitigate these problems by pooling the polls (Jackman 2005)
Maybe Erickson & Wlezien 2008/2014
Synopsis questions: Why don’t we just use polls or the TFC model?
Not the TFC only –> it assumes states move in the same way from election to election… –> what about sociodemographic changes in a state and state-specific issues?
Polls –> Erickson & Wlezien
TFC –> uncertain data; it is static on its own and becomes more accurate with polls (Linzer 2013)
What are the five types of elections? (The american voter revisited)
- Maintaining: 1964 (LBJ)
- A maintaining election is one in which stable partisan attachments continue to be a major determinant of election results.
- Deviating: 1952 (Eisenhower)
- A deviating election would then be one in which short-term partisan attitudes lead to the election of the presidential nominee of the minority party, without a fundamental shift in the party identification balance in the nation.
- Re-instating: 1960 (JFK), 1976 (Carter)
- Re-alignment: 1930s (FDR), 1980s (Reagan - wwc, gender, white evangelicals)
- The realigning election is one in which the partisanship of people changes. Systematic change occurs when issues motivate social groups to move to one party and when change is reinforced through one's social groups.
- Definition: a significant shift in sociodemographic groups –> can lead to balancing or dominance
- Balancing: 2016, 2020
- One in which neither party has a majority in party identification.
- The partisan balance is so close that either party could win.
- This is not an election in which short-term forces work against the majority party, but one in which it is not possible to speak of a majority party.
The real story of the last 50-plus years of American presidential elections is a weakening of the Democratic lead in party identification to the extent that elections are very close and can be swung by any number of short-term matters.
The Linzer State-level model? (Linzer 2013)
Combines fundamentals and state-level polls.
› Uses a sequence of state-level preelection polls to estimate both current voter preferences and forecasts of the election outcome, for every state on every day of the campaign, regardless of whether a survey was conducted on that day –> shares information between states (simple model)
› Forecasts from the model gradually transition from being based upon historical factors early in the campaign to survey data closer to the election. It gives more weight to the data at the end, as it contains more information and seems more credible.
› In states where polling is infrequent, the model borrows strength hierarchically across both states and time, to estimate smoothed within-state trends in opinion between consecutive surveys.
› This is possible because the temporal patterns in state-level opinion are often similar across states, so you can look at the trend in a correlated state –> similar states vote similarly –> the sociodemographics argument
› Very simple –> we use a correlation matrix
› The model also filters away day-to-day variation in the polls due to sampling error and national campaign effects, which enables daily tracking of voter preferences toward the presidential candidates at the state and national levels.
SO: we get a proportion of voters in state i on day j who say they will vote D, based upon the fundamentals model and the polls. The polls give the proportion of voters in state i on day j who say they will vote D. When there are no polls, we use a state-level effect – the long-run dynamics of voter preferences in state i (here the prior is included) – and national effects – trends borrowed from other states (campaign effects). To anchor the vote share on election day we incorporate the fundamentals model (a normal prior distribution) and set a weight for the prior that determines how sensitive the forecast is to polling data. We use a Bayesian reverse random walk from election day back to the last day with polls, which becomes the prior distribution for that day.
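The reverse-random-walk idea can be sketched as follows: anchor at an election-day fundamentals prior, then step backwards one day at a time adding drift noise, so the implied prior for an earlier day is more uncertain. All numbers here are illustrative, not Linzer's actual values:

```python
import random

random.seed(2)

# Reverse random walk from the election-day prior back to the last polled day.
h = 0.50            # fundamentals forecast of vote share on election day
anchor_sd = 0.02    # uncertainty of the election-day prior
daily_sd = 0.003    # assumed day-to-day opinion drift
days_out = 60       # last polled day, counted back from election day

paths = []
for _ in range(5000):
    mu = random.gauss(h, anchor_sd)      # draw from the election-day prior
    for _ in range(days_out):
        mu += random.gauss(0, daily_sd)  # one reverse step per day
    paths.append(mu)

mean_60 = sum(paths) / len(paths)
var_60 = sum((p - mean_60) ** 2 for p in paths) / len(paths)
# variance grows from anchor_sd**2 to roughly anchor_sd**2 + days_out * daily_sd**2
```

The further back from election day, the wider the implied prior, which is why polls dominate late in the campaign while the fundamentals dominate early.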
What factors influence the Perry-Gallup likely voter index?
Seven questions covering the voter's past voting behavior, level of interest in the election, and self-reported likelihood of voting - points are then assigned based on the answers
Deterministic likely voter model
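A deterministic screen in the Perry-Gallup spirit can be sketched like this; the questions, point values, and cutoff below are invented for illustration, not Gallup's actual items:

```python
# Toy deterministic likely-voter screen: score the screening answers,
# then keep only respondents at or above a cutoff.
def is_likely_voter(answers, cutoff=5):
    score = 0
    score += 2 if answers.get("voted_last_election") else 0
    score += 2 if answers.get("very_interested") else 0
    score += 2 if answers.get("certain_to_vote") else 0
    score += 1 if answers.get("knows_polling_place") else 0
    return score >= cutoff   # hard cutoff: in or out, no weighting

habitual = {"voted_last_election": True, "very_interested": True,
            "certain_to_vote": True, "knows_polling_place": False}
doubtful = {"voted_last_election": False, "very_interested": True,
            "certain_to_vote": False, "knows_polling_place": True}
```

The hard cutoff is what makes the model deterministic: respondents below it are discarded entirely, unlike in the probabilistic models discussed later.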
Why do we pool the Polls? (Jackman 2005)
To create a more precise estimate of vote intentions. Individual polls are subject to a lot of noise, especially from house effects and bias. The chief benefit of pooling poll results (after correcting for house effects) is that we are much better positioned to ascertain movements in levels of voter support in response to campaign events.
Pooling also lowers the sampling error, and therefore the uncertainty, around the point estimate.
–> individual sample sizes are simply too small to reliably detect small fluctuations in support for the parties over the course of an election campaign.
Combine the information in the published polls, leveraging them against one another, so as to obtain a clearer picture of what might be going on in the electorate over the campaign
Individual polls are snapshots in time (rolling average – each poll not independent)
Dependent on the poll from the day before
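The precision gain from pooling can be shown with a precision-weighted average: each poll is weighted by the inverse of its sampling variance p(1-p)/n, so larger polls count more and the pooled estimate has a smaller standard error than any single poll. The poll numbers are hypothetical:

```python
import math

# Precision-weighted pooling of three hypothetical polls.
polls = [(0.52, 800), (0.50, 1200), (0.53, 600)]   # (share, sample size)

weights = [n / (p * (1 - p)) for p, n in polls]    # 1 / sampling variance
pooled = sum(w * p for w, (p, _) in zip(weights, polls)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

single_se = math.sqrt(0.52 * 0.48 / 800)           # SE of one poll alone
```

This is the static "simple averaging" benefit; Bayesian pooling adds the time dimension on top of it.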
Different ways of pooling:
* simple averages
§ Benefit of sample size
§ The challenge: the average will be very static –> one new poll doesn't change it much
§ Because every poll carries the same weight
* Bayesian pooling -> prior + new information (new polls)
§ Much more fast-moving
§ Gives weight to the polls to capture a better estimate
§ Time means a lot in the Bayesian model
§ Detects movement in voter support due to campaign events
Jackman USES Bayesian pooling -> prior + new information (new polls)
- Gives weight to the polls to capture a better estimate
- Detects movement in voter support due to campaign events
o Prediction:
- A recursive process
- Only the estimated state from the previous time step and the current measurement are needed to compute the estimate for the current state
- Continuously tries to predict the next point in the time series:
o A prediction is made
o Once the result is measured (i.e., new polling data comes in, or a new missile location is reported), we calculate how far off the prediction was
o Update the estimate of the new latent state, make a new prediction for the next observation
- Updates itself based on the data
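The predict-measure-update recursion above can be sketched as a minimal scalar Kalman-style filter; the variances and polls below are invented numbers:

```python
# Minimal scalar Kalman-style recursion for a daily polling average.
# Only the previous state and the newest poll are needed at each step.
def kalman_step(state, state_var, poll, poll_var, drift_var):
    # predict: opinion may have drifted since the last time step
    pred, pred_var = state, state_var + drift_var
    # update: blend prediction and the new poll by their precisions
    gain = pred_var / (pred_var + poll_var)
    new_state = pred + gain * (poll - pred)   # correction for the miss
    new_var = (1 - gain) * pred_var
    return new_state, new_var

state, var = 0.50, 0.02 ** 2                  # initial belief about support
for poll in [0.52, 0.51, 0.53]:               # hypothetical daily polls
    state, var = kalman_step(state, var, poll, 0.02 ** 2, 0.003 ** 2)
```

Each step needs only yesterday's estimate and today's poll, which is exactly the recursive property the card describes.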
We can't be sure of the house effects - we can account for them based on last time, but the firms work to correct them themselves –> Linzer does this. Otherwise we can include them in our Bayesian model with a prior.
However lack polling data in some states
Bad things with only using Fundamentals? (Linzer 2013)
They are subject to a large amount of uncertainty, as most historical forecasts are based on data from just 10 to 15 past elections, and many only generate national-level estimates of candidates' vote shares.
Moreover, in the event that an early forecast is in error, structural models contain no mechanism for updating predictions once new information becomes available closer to Election Day.
Preelection polls provide contextual information that can be used to correct potential errors in historical forecasts, increasing both their accuracy and their precision.
Polls conducted just before an election generate estimates that are very close to the eventual result, on average
What is the Time for Change Dynamic?
It is based on the hypothesis that voters attach a positive value to periodic alternation in power by the two major parties, and that regardless of the state of the economy and the popularity of the current president, when a party has held the White House for two or more terms, voters will be more likely to feel that it is time to give the opposing party an opportunity to govern than when a party has held the White House for only one term.
What are the two main types of polling, and how do they differ in methodology? (Bailey 2024)
- Probability-based Polling (Random Sampling):
Uses a sampling frame to identify a pool of respondents representative of the population –> for example landlines or address-based sampling.
Relies on interview modes like live calling or interactive voice response (IVR) to collect data –> can still be biased due to non-response bias.
- Non-probability-based Polling:
Does not rely on random sampling. Instead, respondents are recruited through convenience sampling, pre-existing panels, or quota (matching) sampling methods.
Often used in internet polling, which may introduce bias due to the lack of randomness in the sample selection.
Nonprobability samples face "one fundamental problem: There is no comprehensive sampling frame for the internet, no way to draw a national sample for which virtually everyone has a chance of being selected". In the random-sampling paradigm, the contact list is a random sample which will include people willing and unwilling to respond. For internet panels, the people contacted have already said they will respond to polls.
PROBABILITY IS THE BEST
But internet polling isn't doing that badly right now (Silver 2021)
Why Forecast? - benefits of forecasting (Gelman et al. 2020)
- First, the popularity of forecasts reflects revealed demand for such information.
- Second, by collecting and organizing relevant information, a forecast can help people and organizations make better decisions about their political and economic resources.
- If people are going to follow election night results online — and they do, by the millions — they ought to have the context to understand them (Cohn & Katz 2018)
- Third, the process of building — and evaluating — forecasts can allow scholars and political observers to better understand voters and their electoral preferences, which can help us understand and interpret the results of elections.
Derek slides:
1. There is demand
2. Can help to organize political and economic resources
3. Can help advance understanding of voters and their electoral preferences
What do Erickson & Wlezien (2008 and 2014) show about economic performance as a predictor?
Why do we use structural fundamentals?
They show the economic voting of swing-voters –> The economic perceptions are important for the vote
Voters converge towards ‘true’ opinions closer to election (economic voting of swing voters)
- here they use economic indicators to guide their vote, when they couldn't have cared less in April. This is because the campaign makes the economy salient.
In April (200 days before the election), perceived business conditions only moderately predict polling results.
By November (Election Day), there is a much stronger correlation between perceived economic conditions and the vote share for the incumbent party, showing how economic perceptions shape the final vote.
What is a probabilistic likely voter model vs. a deterministic one? And what is Rentch et al. 2019's insight?
A probabilistic model predicts each respondent’s probability of voting rather than assigning a strict cutoff. This approach is more flexible and accounts for uncertainty, often using logistic regression.
Deterministic has a cutoff
Probabilistic models offer clear benefits: each respondent is assigned an estimated probability that they will vote. This probability is then used as a weight - responses from those who are more likely to vote are weighted more heavily than responses from those who are unlikely to vote, but all are included in the election prediction.
Probabilistic approaches serve as a compromise between registered-voter and likely-voter methods; the preferences of all respondents are utilized only to an extent proportional to their assessed probability of actually voting.
Rentch et al. 2019 –> Propose a probabilistic likely-voter model that not only uses the typical items (such as those from the Perry-Gallup index), but also adds demographic information about respondents such as age, education, race, income, gender, and strength of partisanship. These demographics are correlated with turnout
Why use a sociodemographic probabilistic voter model?
- a likely-voter model that is probabilistic uses information from all respondents in the sample rather than discarding those that fail to meet a particular threshold.
- And a likely-voter model that makes use of demographic information for its predictions takes advantage of data that most pollsters collect anyway and which happen to be good predictors of turnout and overreporting.
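The probabilistic approach can be sketched as a logistic-regression-style score that turns screening items plus demographics into a turnout probability, which is then used as a weight instead of a cutoff. The coefficients and respondents below are invented:

```python
import math

# Sketch of a probabilistic likely-voter model with made-up coefficients.
def turnout_prob(voted_before, high_interest, age):
    z = -2.0 + 1.5 * voted_before + 1.0 * high_interest + 0.03 * age
    return 1 / (1 + math.exp(-z))   # logistic transform to a probability

respondents = [
    # (voted_before, high_interest, age, prefers_candidate_A)
    (1, 1, 65, 1),
    (0, 0, 22, 0),
    (1, 0, 40, 1),
]

num = sum(turnout_prob(v, i, a) * pref for v, i, a, pref in respondents)
den = sum(turnout_prob(v, i, a) for v, i, a, _ in respondents)
weighted_share = num / den   # everyone counts, in proportion to p(vote)
unweighted_share = sum(pref for *_, pref in respondents) / len(respondents)
```

No respondent is discarded; the low-propensity respondent simply counts for less, which is the compromise between registered-voter and likely-voter methods described above.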
Key steps/core idea of Bayesian analysis
Key steps:
Identify data – For us this is polling
Define a descriptive model for the data - e.g., models for polling bias at the state and national level.
- This includes choosing the appropriate distribution (e.g., normal distribution) and defining its parameters (e.g., mean and standard deviation).
Assign a prior probability distribution to the parameters of your model. This represents your initial beliefs or expectations before seeing the data.
–> for example Our Fundamentals model
Use Bayesian inference to re-allocate credibility across parameter values.
o Combine your prior beliefs with the observed data to calculate the posterior distribution (updated belief)
o It is essentially a compromise
When new data arrives (like a poll), you evaluate how consistent the data is with your prior belief
Depending on how strong your prior is, the posterior will be a compromise between the prior and the data
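The prior-data compromise can be made concrete with a conjugate normal-normal update, where the posterior mean is a precision-weighted average of the prior (e.g. fundamentals) and the data (e.g. a poll). All inputs here are hypothetical:

```python
# Conjugate normal-normal update: posterior mean as a precision-weighted
# compromise between prior belief and new data.
def update(prior_mean, prior_sd, data_mean, data_sd):
    w_prior = 1 / prior_sd ** 2
    w_data = 1 / data_sd ** 2
    post_mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    post_sd = (w_prior + w_data) ** -0.5
    return post_mean, post_sd

# strong prior: the posterior stays close to the prior
strong, _ = update(0.50, 0.01, 0.54, 0.03)
# weak prior: the posterior moves towards the poll
weak, _ = update(0.50, 0.05, 0.54, 0.03)
```

The tighter the prior, the less a single poll moves the posterior, which is the "compromise" the card describes.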
Downside to forecasting? (Victor 2021)
- Partisan Polarization Perverts the Fundamentals –> Partisan polarization introduces systematic bias into models since partisans view fundamentals (like economic status or presidential approval) differently depending on their political affiliation. This can reduce the reliability of forecasts.
- Forecast may affect turnout - greater emphasis on forecasts in 2016 cycle increased voters’ certainty about the election outcome and therefore depressed turnout
- false sense of certainty –> forecasts overstate certainty or fail to report it
- they misguide people
Make better visualisations –> for example the needle (the NY Times article)
What is probability-based polling vs. Non-probability based polling? (Bailey 2024)
Probability-based sampling – previously the gold standard –> randomly drawn
- Sampling frame –> how to identify the pool of respondents that is representative. Before: random digit dialing. Some probability-based pollsters now use address-based sampling (ABS) or registration-based sampling (RBS). Can still be biased if, e.g., only old people have landlines.
- Interview mode: live-calling respondents, interactive voice response, web-based address sampling and face-to-face (e.g., the ANES)
Both non contact and nonresponse
Non-probability-based sampling – opt-in –> unrepresentative, NON-random
- Most non-probabilistic polls are conducted via the internet or text messages –> based on opt-in behaviour
- Convenience sampling, internet panels and quota sampling (matching) –> risk of professional respondents
- No response rate - unrepresentative
Regarding representation and nonresponse –> they use weights. Both have problems with nonignorable nonresponse.
- weighting does not solve – and could potentially exacerbate – nonignorable nonresponse bias.
What is important about the 538 model?
Correlation matrix, fundamentals and what do they sample using the MCMC
Polling data:
- Allows movement to be correlated between states in addition to between a state and the nation as a whole. Lets polls in one state influence similar states – the trend and movement in one state's polls is shared with other states, where the influence is based on similarity.
Three factors of similarity: the first is how similarly they have voted in presidential elections since 1948 (we only use 1998-2020), then geographic similarity (10 political regions) and demographic similarity between states
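A state-similarity matrix of this kind can be sketched from standardized features (stand-ins for past vote, region, demographics). The feature values below are made up, and similarity is taken here as 1 / (1 + euclidean distance), which is only one of many possible choices:

```python
# Toy state-similarity matrix from invented feature vectors.
states = {
    # name: (past_dem_share, pct_college, pct_urban) -- made-up numbers
    "WI": (0.50, 0.30, 0.67),
    "MI": (0.51, 0.31, 0.74),
    "AL": (0.36, 0.26, 0.59),
}

def similarity(a, b):
    d = sum((x - y) ** 2 for x, y in zip(states[a], states[b])) ** 0.5
    return 1 / (1 + d)   # in (0, 1]; higher means more similar

sim = {(a, b): similarity(a, b) for a in states for b in states}
# information from a WI poll would flow more strongly to MI than to AL
```

In the full model the resulting matrix is what lets a poll movement in one state shift the estimates for its most similar neighbours.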
The fundamentals model:
- Economic and political fundamentals
11 economic variables and primarily the TFC model, where the model decreases the effect of presidential approval and economic growth in more polarized elections. State-level factors are also included, e.g., previous state vote and the candidates' home states.
Three parameters that the model explores (‘samples’) using MCMC method
› fundamentals (its standard deviation – to reflect uncertainty in the national popular vote (NPV) forecast)
› potential temporal drift of the polls (different values for the rate and direction of temporal changes in public opinion during the campaign period, i.e. how much the polling averages change)
› different types of polling bias (polling errors, such as: sampling error – variability due to the limited sample size of polls – and non-sampling error – systematic biases introduced by polling methodologies, e.g. house effects)
› outcome -> exploration of the posterior distribution
› After many iterations, the model converges on a posterior distribution that reflects the most likely parameter values (even very unlikely ones are included, just less often than likely scenarios) and provides probabilities for outcomes like state vote shares and Electoral College results.