Week 7 - Parameter Estimation Flashcards

(64 cards)

1
Q

Name three popular model fitting frameworks

A

Least-squares estimation (LSE)
Maximum-likelihood estimation (MLE)
Bayesian estimation (BE)

2
Q

How does the LSE fit a model?

A

by minimising the (squared) discrepancies between predictions and observations

3
Q

How does the MLE fit a model?

A

finds the parameter values that give the highest likelihood of the observed data

4
Q

How does Bayesian estimation fit a model?

A

by combining prior knowledge with the observed data to derive a range of likely parameter values

5
Q

What is a simple way in which to fit a model via LSE?

A

by fitting a linear regression (straight line fit)

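A straight-line LSE fit can be sketched with the closed-form least-squares solution (the data values here are hypothetical, roughly following y = 3 + 2x):

```python
# Hypothetical observations roughly following y = 3 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares estimates: slope = Sxy / Sxx
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # close to 2 and 3
```
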
6
Q

For LSE, what is the objective of parameter estimation? How is this achieved?

A

find the parameter values that minimise the discrepancy function; this is achieved via optimisation algorithms

7
Q

What does RMSD stand for?
What is it?

A

-Root Mean Square Deviation
-a statistical measure that quantifies the discrepancy between predicted and observed values

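RMSD can be sketched in a few lines (the predicted and observed values here are hypothetical):

```python
import math

def rmsd(predicted, observed):
    # Root Mean Square Deviation: square root of the mean squared discrepancy
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Hypothetical model predictions vs observed data
predicted = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, 1.5, 3.5, 4.5]
print(rmsd(predicted, observed))  # 0.5
```
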
8
Q

If you have a high RMSD what does this tell you about model fit?

A

model fit is poor and the discrepancy between predicted and observed values is large

9
Q

What are some popular optimisation algorithms?

A

Nelder-Mead Simplex
Simulated annealing

10
Q

What is the error surface?
What can you find on the error surface? Why is that good?

A

-the set of all possible RMSD/error values that can be calculated
-the place where the RMSD/error is at a minimum, so you have found your model fit

11
Q

What is the ‘drunken triangle’?
Which optimisation algorithm is this?

A

-the triangle (simplex) moves as if it is tumbling down the error surface to find the (optimal) minimum error
-Nelder-Mead Simplex

12
Q

What are the four strategies for deciding where to move the drunken triangle/simplex?
What are they in order of instruction?

A

reflection, expansion, contraction, shrinking

  1. reflection
  2. reflection success -> expansion
  3. reflection fail -> contraction
  4. contraction fail -> shrinking
13
Q

What does ‘reflection’ involve in parameter estimation using the Nelder-Mead Simplex?

A

remove the point with the largest discrepancy (from the error minimum) and flip it to the opposite side

14
Q

How does expansion move the simplex/drunk triangle? What step must it come after?

A

After successful REFLECTION, extend the flipped point out to take a larger step down (thus closer to error minimum)

15
Q

What strategy must you apply if reflection fails in the drunken triangle/simplex?

A

reflection fails -> contraction: move the worst fitting point more toward the centre (error minimum)

16
Q

What must you do if contraction fails in the drunken triangle/simplex?

A

contraction fail -> shrinking: reduce the triangle/simplex by half in the direction of the error minimum

17
Q

In the Nelder-Mead Simplex Algorithm, what are the starting values calculated from?

A

plausible values are taken from data, from experiments you did, or from simulations, and the discrepancy is calculated for them

18
Q

What are the two steps in the Nelder-Mead Simplex?

A
  1. compute discrepancy for starting values
  2. tumble down the error surface until you reach error minimum
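The two steps above, together with the four movement strategies from card 12, can be sketched as a bare-bones Nelder-Mead. This is an illustrative toy implementation, not a production optimiser, and the error function is hypothetical:

```python
def nelder_mead(f, start, step=0.5, iters=300):
    # Step 1: build a starting simplex and compute discrepancies for it
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    # Step 2: tumble down the error surface
    for _ in range(iters):
        simplex.sort(key=f)  # best point first, worst point last
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        # 1. reflection: flip the worst point to the opposite side
        refl = [centroid[i] + (centroid[i] - worst[i]) for i in range(n)]
        if f(refl) < f(best):
            # 2. reflection success -> expansion: take a larger step down
            exp = [centroid[i] + 2.0 * (centroid[i] - worst[i]) for i in range(n)]
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            # 3. reflection fail -> contraction: move worst point toward centre
            con = [centroid[i] + 0.5 * (worst[i] - centroid[i]) for i in range(n)]
            if f(con) < f(worst):
                simplex[-1] = con
            else:
                # 4. contraction fail -> shrink the simplex toward the best point
                simplex = [[best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                           for p in simplex]
    simplex.sort(key=f)
    return simplex[0]

# Hypothetical error surface with its minimum at (3, -1)
def error(params):
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

best = nelder_mead(error, [0.0, 0.0])
print([round(v, 2) for v in best])  # close to [3, -1]
```
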
19
Q

For LSE, what is the issue with finding the minimum error?
How do you try to fix this?

A

-you can sometimes end up at a local minimum and mistake it for the global minimum
-Bootstrapping: repeat the LSE process many times with different starting values, then look at the variability between the model parameter estimates (how much error there is for each set of starting values)

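The multiple-starting-values idea can be sketched as follows (hypothetical error surface with one local and one global minimum; the crude local search below stands in for a real optimiser such as Nelder-Mead):

```python
def local_minimise(f, x, step=0.5, iters=200):
    # Crude local descent: step left/right while it improves, else shrink step
    for _ in range(iters):
        for cand in (x - step, x + step):
            if f(cand) < f(x):
                x = cand
                break
        else:
            step /= 2.0
    return x

def error(x):
    # Hypothetical error surface: a local minimum near x = +1 and a
    # slightly lower global minimum near x = -1
    return (x * x - 1.0) ** 2 + 0.2 * x

# Restart the fit from several starting values and keep the best result
starts = [-2.0, 0.5, 2.0]
fits = [local_minimise(error, s) for s in starts]
best = min(fits, key=error)
print(round(best, 2))  # near -1 (global minimum); a bad start alone stops near +1
```
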
20
Q

What is bootstrapping?
What is using the variability calculated in bootstrapping like?

A

-provides an indicator of variability around the model parameter estimates by repeatedly sampling from the model or the data.
-like using confidence intervals

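A sketch of a nonparametric bootstrap (the data are hypothetical, and the "parameter estimate" here is just the sample mean for simplicity):

```python
import random

random.seed(0)

# Hypothetical observed data; the parameter estimate is the sample mean
data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4]
estimate = sum(data) / len(data)

# Resample the data with replacement many times and refit each time
boot_estimates = []
for _ in range(2000):
    sample = [random.choice(data) for _ in data]
    boot_estimates.append(sum(sample) / len(sample))

# The spread of the bootstrap estimates acts like a confidence interval
boot_estimates.sort()
ci_low = boot_estimates[int(0.025 * len(boot_estimates))]
ci_high = boot_estimates[int(0.975 * len(boot_estimates))]
print(round(ci_low, 2), round(estimate, 2), round(ci_high, 2))
```
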
21
Q

What are the drawbacks of using LSE to fit model (which even bootstrapping can’t relieve)?

A

-RMSD doesn’t tell you about the goodness of fit -> only whether the discrepancy is small or not
-can’t statistically compare to other models -> can’t tell whether a difference is meaningful or due to chance
-parameter estimates don’t have any inherent statistical properties -> don’t come with confidence intervals unless you do bootstrapping

22
Q

What does it mean when a drawback of LSE includes ‘not being able to statistically compare models’?

A

can’t compare two models which have different fits (different parameters and parameter values) and see which is better -> can’t tell whether difference is meaningful or due to chance

23
Q

What mathematical function does LSE use to calculate the discrepancy?

A

RMSD
root mean SQUARE deviation

24
Q

How is the logic in MLE kind of the opposite of LSE?

A

LSE minimises the discrepancy between predicted and observed data values, whereas MLE finds the parameter values that give the highest likelihood of the observed data

25
Q

Is likelihood the same as probability? Why?

A

no, because likelihood is the possibility of the model given the data (probability is vice versa)

26
Q

What is probability and what is likelihood?

A

likelihood: possibility of the model given the data
probability: possibility of the data given the model

27
Q

What is the definition of probability? In probability, what are samples?

A

-a numerical value between 0 and 1 reflecting our expectation of an event
-how many times you roll a die etc.

28
Q

If events are mutually exclusive (me), what is the probability of either of the me events occurring equal to? P(a or b) = ?

A

equal to the sum of their individual probabilities: P(a or b) = P(a) + P(b)

29
Q

What is joint probability? True or false: joint probabilities can exceed the individual probabilities

A

-the probability that both a and b occur at the same time
-false - their joint probability cannot exceed any of the individual probabilities

30
Q

What is conditional probability in terms of events a and b? What if events a and b are independent?

A

-the probability that a occurs given the occurrence of b: P(a|b)
-if a and b are independent, the probability of a does not depend on the outcome of b, so P(a|b) = P(a)

31
Q

If events a and b are independent/dependent, what is their joint probability? P(a, b) = ?

A

independent: P(a, b) = P(a) x P(b)
dependent: P(a, b) = P(a|b) x P(b)

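The rules in cards 28-31 can be checked with a short sketch (a hypothetical fair-die example, using exact fractions):

```python
from fractions import Fraction

# Hypothetical example: one fair six-sided die
p = {face: Fraction(1, 6) for face in range(1, 7)}

# Mutually exclusive events: P(roll 1 or roll 2) = P(1) + P(2)
p_1_or_2 = p[1] + p[2]

# Independent events (two separate rolls): P(first is 6 and second is 6)
p_both_6 = p[6] * p[6]

# A joint probability cannot exceed either individual probability
assert p_both_6 <= p[6]

print(p_1_or_2)  # 1/3
print(p_both_6)  # 1/36
```
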
32
Q

What are probability FUNCTIONS? What are the two different types of events in a probability function?

A

-measure the probability of all possible events predicted by a model
-discrete or continuous events

33
Q

What is the derivative of the CDF? How do you achieve this?

A

the PDF is the derivative of the CDF, obtained by differentiation; conversely, integrating the PDF (taking the area under the curve) gives back the CDF

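The relationship can be checked numerically for the standard normal distribution (the central-difference step size is an arbitrary small value):

```python
import math

def normal_cdf(x):
    # CDF of the standard normal distribution, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    # PDF of the standard normal distribution
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# Numerical derivative of the CDF at x = 0.5 (central difference)
h = 1e-5
x = 0.5
deriv = (normal_cdf(x + h) - normal_cdf(x - h)) / (2.0 * h)

# The derivative of the CDF matches the PDF
assert abs(deriv - normal_pdf(x)) < 1e-6
```
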
34
What does CDF stand for? What is the usual shape of a CDF graph? What are the the axes of the CDF when trying to show exams scores of a group of students?
-Cumulative distribution function -sigmoidal -x=exam score y=cumulative probability
35
What is the maximum score the y axis of the CDF can be? why?
1 because y axis is the cumulative probability (and all events are mutually exclusive)
36
What are the axes of the PDF graph when showing the exam scores of a group of students? What is the usual shape of a PDF graph? What is the area under the graph equal to?
x=exam score y=probability density - bell shaped - equal to 1
37
What is the difference between probability and likelihood? In a cognitive model of response times would the probability/likelihood look at parameters/data points (eg latency)? What question does probability and likelihood investigate?
likelihood: possibility of model given data probability: possibility of data given model likelihood: data points of latency How likely is this model given this data point? probability: parameters How probable is the data given these parameters in the model?
38
Q

Which function does MLE want to maximise? Why?

A

maximise the likelihood function so that the observations are most likely -> to find the highest peak

39
Q

How are MLE and LSE similar?

A

they both use the Nelder-Mead optimisation algorithm, but MLE reverses the sign (puts a minus in front of the likelihood) so that minimising finds the maximum (LSE finds a minimum)

40
Q

What are the two steps in MLE? What function and algorithm does MLE use?

A

  1. log transform (ln)
  2. reverse the sign
maximises the likelihood function using the Nelder-Mead Simplex

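A minimal sketch of these two steps (hypothetical coin-flip data; a grid search stands in for Nelder-Mead for brevity):

```python
import math

# Hypothetical data: 7 heads out of 10 coin flips
heads, flips = 7, 10

def neg_log_likelihood(p):
    # Step 1: log transform the likelihood; step 2: reverse the sign.
    # Minimising -lnL is the same as maximising the likelihood.
    return -(heads * math.log(p) + (flips - heads) * math.log(1.0 - p))

# Simple grid search over candidate parameter values (an optimiser such
# as the Nelder-Mead Simplex would be used in practice)
candidates = [i / 100.0 for i in range(1, 100)]
p_hat = min(candidates, key=neg_log_likelihood)
print(p_hat)  # 0.7 - the proportion of heads, as expected
```
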
41
Q

What mathematical transformation does MLE use with the Nelder-Mead simplex? Why is this done? Graphically, how does this change shape when adding the transformation?

A

log transformation
-to make the values easier to work with
-from a skinny bell shape to a fatter bell shape

42
Q

Which method is used more often in cognitive psychology: LSE or MLE? Why?

A

MLE, because it is easier to assess model fit and you can compare models

43
Q

Does your model become increasingly better the smaller the discrepancy between observed and predicted? ie is it better if you want to use this model to predict further data points? What is this concept known as?

A

not always: if there is little or no discrepancy, the line follows each data point exactly, so when you try to predict a future data point it will not be a good estimate
-overfitting

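Overfitting can be illustrated with a sketch (hypothetical data; a straight-line fit versus a "connect-the-dots" model that reproduces every training point exactly):

```python
import math
import random

random.seed(1)

# Hypothetical data generated from y = 2x plus noise
xs = list(range(10))
ys = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]
train_x, train_y = xs[:5], ys[:5]
test_x, test_y = xs[5:], ys[5:]

def rmsd(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

# Simple model: least-squares line through the origin, slope = Sxy / Sxx
slope = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)
simple = lambda x: slope * x

# Overfitted model: memorise the training points; predict with the nearest one
def overfit(x):
    nearest = min(train_x, key=lambda tx: abs(tx - x))
    return train_y[train_x.index(nearest)]

# The overfitted model is perfect on the training data...
assert rmsd([overfit(x) for x in train_x], train_y) == 0.0
# ...but it generalises worse than the simple line on new data
assert rmsd([overfit(x) for x in test_x], test_y) > \
       rmsd([simple(x) for x in test_x], test_y)
```
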
44
Q

What is Occam's razor? How does it relate to model fitting?

A

-a light switches on simply because a person flicks the switch, not because loads of little people inside the switch turn it on -> sometimes the simpler answer is better
-a model fit is sometimes better if it is simpler: a straight line rather than a polynomial doing a dot-to-dot of each data point (overfitting is worse)

45
Q

What are the two characteristics of a good model fit?

A

-being flexible: able to fit different patterns of data
-not overfitted

46
Q

What is model flexibility determined by? (3 points) Why is flexibility important?

A

-number of free parameters -> more free parameters means more flexibility
-functional form of the model -> some models can produce a wider variety of patterns based on their parameter settings
-parameter bounds -> putting bounds on some parameters can decrease model flexibility
-a flexible model can achieve a good model fit

47
Q

When fitting a model, what two things must you balance to get the best fit?

A

model simplicity and model flexibility (a bias-variance trade-off)
too simple -> systematic misfit (bias)
(increasing complexity ->) increasing flexibility -> increases variance of fit: too flexible -> overfit!

48
Q

As you increase model flexibility, does the goodness of model fit increase?

A

no, if the model is too flexible it can be overfit

49
Q

What is the problem with highly flexible models?

A

they fit well to data based on the parameters of the model, but they generalise poorly to new data (predict future data points badly)

50
Q

What is the main advantage of using MLE? How is this advantage achieved?

A

multiple model comparisons to find the best and simplest model (based on good flexibility and no overfit)

51
Q

What is a nested model? How do you derive a nested model?

A

-a simpler/special-case version of a complex model
-usually by constraining one or more parameters in the complex model -> simpler model

52
Q

What are non-nested models?

A

models where you can't derive the simpler model from the other, complex model

53
Q

What is the Akaike Information Criterion (AIC)? What is the criterion for the AIC?

A

an (MLE-based) test which compares multiple NON-NESTED models
-it trades off fit against complexity (number of parameters), so models with different numbers of parameters can be compared

54
Q

For MLE, to compare multiple models, which test must you use for nested and non-nested models?

A

nested: likelihood-ratio test (LRT)
non-nested: Akaike Information Criterion (AIC)

55
Q

Why is it relevant to talk about model nesting with the likelihood-ratio test (LRT)?

A

the likelihood-ratio test makes comparisons between multiple models to see which is best; whether the models are nested or non-nested shows if they are eligible for comparison

56
Q

For comparing two of the same model (nested) with different parameters using LRT with deviance, what are the degrees of freedom for the chi-squared test?

A

df = difference in the number of parameters between the two models

57
Q

For comparing two of the same model (nested) with different parameters using LRT with deviance, how do you know which model to choose as best?

A

select the more complex model if the chi-squared value is greater than the critical value from the lookup table = better model

58
Q

What is LRT with deviance? What is it used for?

A

the likelihood-ratio test compares deviances between two nested models with different numbers of parameters, to see whether the more complex or the simpler model is better (complex = more parameters)

59
Q

What is the equation for LRT (with deviance)? What does each variable mean? What statistical features does it employ?

A

chi squared = -2lnL_specific - (-2lnL_general)
where L = maximum likelihood
L_specific = simpler model with fewer parameters (H0)
L_general = complex model with more parameters (H1)

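A sketch of the LRT decision with hypothetical log-likelihood values (3.84 is the standard chi-squared critical value for df = 1, alpha = .05):

```python
# Hypothetical maximum log-likelihoods for two nested models
lnL_general = -50.0   # complex model, e.g. 3 free parameters (H1)
lnL_specific = -53.0  # simpler model, e.g. 2 free parameters (H0)

# Deviance difference: chi squared = -2lnL_specific - (-2lnL_general)
chi_squared = -2.0 * lnL_specific - (-2.0 * lnL_general)
df = 3 - 2  # degrees of freedom = difference in number of parameters

CRITICAL_05_DF1 = 3.84  # chi-squared lookup value for df = 1, alpha = .05
best = "general" if chi_squared > CRITICAL_05_DF1 else "specific"
print(chi_squared, best)  # 6.0 general
```
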
60
Q

Why does the LRT equation have -2lnL?

A

because you have to log transform and reverse the sign to do MLE

61
Q

What is the issue with using LRT (with deviance) for multiple model comparisons to find the best model?

A

you can only compare TWO NESTED models (with different numbers of parameters) at a time

62
Q

What is the equation for AIC? What does each variable mean?

A

AIC = -2lnL + 2K
AIC = likelihood + complexity
L = maximum likelihood
K = no. of parameters

63
Q

Do you have to calculate an AIC for each competing model? How do you choose the best model using AIC calculations?

A

-yes
-choose the model with the lowest AIC = best model

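The AIC comparison can be sketched with hypothetical competing models (the model names and log-likelihood values are made up for illustration):

```python
# Hypothetical competing models: (name, maximum log-likelihood, no. of parameters)
models = [
    ("model_A", -52.0, 2),
    ("model_B", -49.0, 4),
]

def aic(lnL, K):
    # AIC = -2lnL + 2K: likelihood term plus a complexity penalty
    return -2.0 * lnL + 2.0 * K

# Calculate an AIC for each competing model, then pick the lowest
scores = {name: aic(lnL, K) for name, lnL, K in models}
best = min(scores, key=scores.get)
print(scores, best)  # model_B: its better fit is worth the extra parameters
```
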
64
Q

What function does the MLE minimise?

A

MLE minimises the reversed-sign, log-transformed likelihood function (-2lnL)