Neurocognitive Modelling Flashcards

(196 cards)

1
Q

What are neural models?

A
  1. Consider neural properties such as receptive fields and tuning curves
  2. Work for relatively small networks and simple tasks
2
Q

What are cognitive models?

A
  1. Consider latent parameters that allow us to make inferences about cognitive processes
  2. Very general, but need some constraints on structure
3
Q

What are normative models?

A
  1. Derive optimal solution for a task within constraints
  2. Independent of actual behaviour, but can be compared
  3. Theory driven
4
Q

What are behavioural models?

A
  1. Fit directly to actual behavioural data (process model)
  2. Very data intensive
  3. Needs data & theory
5
Q

What is the general equation for a cell? (e.g., a simple cell in the primary visual cortex)

A

response = function (stimulus)
r = f(s)

6
Q

What questions can we ask about a neuron's response function?

A
  1. What is f? (descriptive approach)
  2. How does f arise? (development / learning)
  3. What should f be? (Normative approach - efficient coding)
7
Q

What is the job of a neuron?

A

To transmit information

8
Q

Who is the father of Information theory?

A

Claude Shannon, for his 1948 paper “A Mathematical Theory of Communication”

9
Q

What is Information theory?

A

A field that studies how to measure, quantify and transmit information

10
Q

What is the key principle in Information theory?

A

The less we can predict something, the more information it gives us

Information as surprise

11
Q

How does surprise link to information?

A

Low surprise, low information
High surprise, high information
No surprise, no information

12
Q

How do we calculate information / surprise?

A

Negative logarithm of the probability of an event

13
Q

What log base do we normally use for calculating information / surprise?

A

Log base 2 - this means it is in bits

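Worked example: a minimal sketch of the surprise calculation from the two cards above (assuming numpy; the function name is just for illustration):

```python
import numpy as np

def surprise(p):
    """Surprise (self-information) of an event with probability p, in bits."""
    return -np.log2(p)

print(surprise(0.5))   # fair coin flip: 1 bit
print(surprise(1.0))   # certain event: 0 bits (no surprise, no information)
print(surprise(0.01))  # rare event: ~6.6 bits (high surprise)
```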
14
Q

If P(heads) is the probability of heads, what is the way to write the information / surprise of heads?

A

I(heads) = -log2(P(heads))

15
Q

What is surprise?

A

The measure of information for a specific outcome

16
Q

What is entropy?

A

How much information any one outcome gives us, on average

Weighted average of information

17
Q

How do we calculate the entropy of a system?

A

Entropy = p(x1)I(x1) + p(x2)I(x2) + …

18
Q

What is the general equation for Entropy?

A

Entropy = -Σ p(xi) log2(p(xi))

(the sum runs over all outcomes, i = 1 to n)

19
Q

What does more randomness lead to?

A

More surprise and therefore more information

20
Q

How can we represent randomness as a probability distribution?

A

A perfectly uniform distribution

21
Q

What does the equation for entropy simplify to for a uniform distribution?

A

log2(n), where n is the number of equally likely outcomes

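Worked example: a small sketch (assuming numpy) of the entropy formula, including the check that it simplifies to log2(n) for a uniform distribution:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H = -sum_i p(xi) log2(p(xi)), in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # outcomes with p = 0 contribute nothing
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))      # fair coin: 1 bit
print(entropy([0.9, 0.1]))      # biased coin: ~0.47 bits (less surprise on average)
print(entropy(np.ones(8) / 8))  # uniform over n = 8 outcomes: log2(8) = 3 bits
```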
22
Q

Where has information theory been used outside of neuroscience?

A
  1. Communications (original form)
  2. Language
23
Q

How was information theory used in language?

A

The English alphabet has 26 letters, so log2(26) ≈ 4.7 bits per letter, but in reality each letter transmits only ~2.3 bits of information, because letters are partly predictable from context

24
Q

What is Zipf’s law?

A

Frequency of a word is inversely proportional to its rank in the frequency table

All natural languages are inefficient

Looks the same for all natural languages

25
What does information theory allow us to do?
Quantify the amount of information in a channel (e.g., neurons)
26
What is another word for information?
Surprise
27
What is another word for average information?
Entropy
28
If we are transmitting a signal X to Y, and noise = 0, what is the representation of mutual information for X and Y?
X = Y, so I(X;Y) = H(X) = H(Y)
29
If we are transmitting a signal X to Y, and noise > 0, what is the representation of mutual information for X and Y?
I(X;Y) = H(Y) - H(Y|X)
30
What does H(Y) mean?
Entropy of Y - amount of information received, some due to X, some due to noise
31
What does H(Y | X) mean?
Entropy of Y conditional on X, also known as the noise entropy
32
What is noise entropy?
Information received when X is held constant, i.e. no information comes from X, so all received information is due to noise
33
How can mutual information be calculated?
It is symmetric, so I(X;Y) = H(Y) - H(Y|X) and I(X;Y) = H(X) - H(X|Y)
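Worked example: a sketch (assuming numpy) that computes I(X;Y) from a joint probability table, using the equivalent identity I(X;Y) = H(X) + H(Y) - H(X,Y); the toy channels are made up for illustration:

```python
import numpy as np

def H(p):
    """Entropy in bits of a probability array."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), from a joint probability table."""
    joint = np.asarray(joint)
    return H(joint.sum(axis=1)) + H(joint.sum(axis=0)) - H(joint)

# Noiseless channel: Y copies X, so I(X;Y) = H(X) = H(Y) = 1 bit
print(mutual_information([[0.5, 0.0],
                          [0.0, 0.5]]))

# Noisy channel: probability mass leaks off the diagonal, so I(X;Y) < 1 bit
print(mutual_information([[0.4, 0.1],
                          [0.1, 0.4]]))
```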
34
What are 4 different ways of looking at the neural code?
1. Number of spikes in a given window
2. Precise spike timing
3. Inter-spike intervals
4. Synchronous spikes with another neuron
35
How can we turn neural signals into a probability distribution?
Bin neural signals across time
36
How much information was a string of ten binary neural values (neural code) found to exhibit?
5 bits; the maximum achievable would be 10
37
Why is a neuron not giving maximal information?
Neurons are rate-limited due to the high energy cost of spiking. A neuron's information transmission rate will therefore be far from the maximal rate that could theoretically be achieved.
38
How is energy used in the brain?
45% goes to the neocortex; 13% is used for spiking, allowing 0.16 spikes/s for each neocortical neuron
39
Around what proportion of a neuron's entropy is noise? (fly visual neuron study)
50%
40
How can information theory help us to understand the neural code?
If a certain code results in much more information than another, we can use this as suggestive evidence for brain function
41
What did later studies (Butts et al 2007) show with regards to neural code?
The timescale of the input matters
42
What can we conclude from Butts et al?
1. Stimuli change on a certain timescale
2. Spikes need to be more precise on a shorter timescale than the stimulus to transmit information about the stimulus
3. We can use this to calculate how precise spiking needs to be (at a certain point, increasing precision will not increase information)
43
What is the issue with calculating mutual information from neuroscientific data?
1. We need to know the probabilities of certain events
2. We don't actually know the true probabilities; we estimate them from relative frequencies
3. Neuroscience typically has a small number of samples
4. The fewer samples we have, the more skewed the distribution and the lower the estimate of entropy
44
What is the overall bias effect from calculating mutual information off a small sample of data?
I(R;S) = H(R) - H(R|S)
H(R) is biased downwards
H(R|S) is biased downwards EVEN MORE, because it is calculated on a smaller sample
So mutual information is biased upwards!
45
What is the easiest solution to check if a measure of mutual information is biased?
1. Remove half your data and recalculate
2. See if the estimate stays similar
3. If it does, it is probably fine
4. If it doesn't, a more advanced solution is needed
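Worked example: a sketch of why the check works. Entropy estimated from relative frequencies (the plug-in estimate) is biased downwards, and more so with fewer samples; the distribution and sample sizes here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def plug_in_entropy(samples):
    """Naive entropy estimate (bits) from relative frequencies."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# True distribution: uniform over 16 symbols, so the true entropy is 4 bits
full = rng.integers(0, 16, size=200)
half = full[:100]

print(plug_in_entropy(full))  # somewhat below 4 bits
print(plug_in_entropy(half))  # typically lower still: bias grows as samples shrink
```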
46
What are some advanced methods for rectifying MI bias?
1. Extrapolation
2. Lower bounds
3. Analytical methods
4. Bayesian methods
47
What is the efficient coding hypothesis?
A group of neurons should encode as much information as possible or, equivalently, remove as much redundancy as possible
48
Who came up with the efficient coding hypothesis?
Barlow, 1961
49
How can we express the efficient coding hypothesis using information theory?
Maximise I(S; R), which becomes: maximise I(S; f(S))
50
What is I?
Mutual information
51
What is S?
Signal
52
What is R?
Neural response - to be optimised
53
What is f?
Tuning curve - to be optimised
54
If a signal transmits 1 bit of information to neuron a and b, what is the range of information of the system?
Between 1 and 2 bits.
1 bit is the worst case: both neurons encode the same thing.
2 bits is the best case: efficient, no overlap.
Calculate the MI between a and b; the goal is for it to be 0.
55
What do we need to know to pick f to maximise I(s; f(s)) in the example of the visual neuron?
1. What is the distribution of natural stimuli?
2. What functions can we implement within the limitations of neural architecture?
56
What are the subcategorisations of the image space?
1. Image space
2. Matching first order statistics (single pixels)
3. Matching second order statistics (pairs of pixels)
4. Higher level analysis (ICA)
5. Natural images
57
What are first order statistics?
Look at the probability distribution of different brightness values in a particular image, over our whole image space
58
What is the model of first order statistics?
p(x1, x2, x3, ...) = p(x1) p(x2) p(x3) ...
Pixels are treated as independent
59
How can we use the first order model?
Work from our distribution to randomly draw images
60
What is second order statistics?
Look at relationships between pairs of pixels
61
What are pixel correlations in second order statistics?
Looking at dependencies (does knowing 1 pixel value tell us about another)
62
How can we use pixel correlations to generate a model of natural images?
We can measure and quantify the relationship between pixels and use it in a probability distribution
63
What is power spectral density (Fourier domain) in second order statistics?
Treat images as functions of varying brightness values:
1. Take a single line of the image and define a function that traces its brightness values
2. Apply a Fourier decomposition to see how the brightness values change
3. Rapid change --> high frequency; gradual change --> low frequency
64
What is the power of different frequencies in natural images?
Higher power at low frequencies (gradual change), lower power at high frequencies
65
What can we use our second order statistics (power spectral density or pixel correlations) in?
A model such as the Gaussian image model
66
What does the Gaussian image model use?
A multivariate normal distribution
67
What is the definition of the Gaussian image model?
N(x | mu, sigma)
x is the vector of pixel brightness values
mu is the vector of mean pixel intensities
sigma is the covariance matrix describing correlations between pixel pairs
68
What is the purpose of using the Gaussian image model?
1. Pixels in natural images are correlated
2. The correlations can be captured in a simple Gaussian model
3. We can now maximise information, assuming that natural stimuli come from this simple distribution
69
What is our encoding model now we have modelled our natural stimuli?
r = Ws
r is the neural response
W is the neural filter (receptive field)
s is the pixel values from our modelled natural image
70
Why is decorrelation as efficient coding an effective method?
We want to minimise redundancy, which means minimising correlation: maximise variance, minimise correlation
71
Why does more variance increase information?
The distribution is more spread out - more uncertainty, and therefore more information
72
What is the goal with decorrelation as efficient coding?
Take correlated pixel inputs & transform these into decorrelated neural activity
73
What is whitening?
A transformation that removes correlations between signals and normalises their variances
74
What are the correlations and variances in a whitened image?
1. No correlations
2. Equal variance across all components
75
What is the first step of whitening?
Decorrelating the pixels:
1. Eigendecomposition of the covariance matrix (PCA)
2. Rotate the data
76
What is the second step of whitening?
Scale the axes to equalise the range, so that all components have equal variance
77
What is the end result of whitening?
The covariance matrix of the neural response r becomes the identity matrix
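Worked example: a sketch of the two whitening steps on toy two-pixel data (assuming numpy); real applications would use image patches instead:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pixel" data: two correlated dimensions
raw = rng.multivariate_normal([0, 0], [[2.0, 1.5], [1.5, 2.0]], size=10000)

# Step 1: decorrelate via eigendecomposition of the covariance matrix (PCA rotation)
eigvals, eigvecs = np.linalg.eigh(np.cov(raw, rowvar=False))
rotated = raw @ eigvecs

# Step 2: scale the axes so all variances are equal (to 1)
white = rotated / np.sqrt(eigvals)

# Check: the covariance of the whitened responses is (close to) the identity matrix
print(np.cov(white, rowvar=False).round(2))
```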
78
What receptive field W results in the decorrelated neural response?
Checkerboard receptive field - not seen in nature
79
What is a localised whitening basis function?
Has an additional constraint: it must be localised in space
80
What do the localised whitening basis functions do?
Simulate the receptive fields of retinal neurons and cells in the lateral geniculate nucleus
81
What is the next step after using the Gaussian model?
Using independent component analysis
82
Why is ICA needed?
After whitening, the components are decorrelated, but they can still have higher order dependencies such as kurtosis. ICA removes these by finding a rotation of the whitened data that makes the components as statistically independent as possible
83
How does ICA work?
1. Independent components cannot be Gaussian
2. Linear mixtures of non-Gaussian signals become more Gaussian (central limit theorem)
3. Therefore, find the directions in the data that are least Gaussian
84
What does ICA add to the modelling equation?
Instead of modelling s in r = Ws with a Gaussian distribution, model it as s = Ma
M = mixing matrix
a = independent non-Gaussian sources
85
What is the end result when ICA is applied to whitened patches of natural images?
The resulting filters:
- are localised in space
- are oriented and bandpass (like edges)
- look like Gabor filters, which resemble the receptive fields of V1 simple cells in the visual cortex
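One way to reproduce this computationally, sketched with scikit-learn's FastICA; the patch array, file name, and patch size are hypothetical placeholders:

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumes scikit-learn is installed

# Hypothetical input: (n_patches, 256) array of flattened 16x16 image patches
patches = np.load("image_patches.npy")

ica = FastICA(n_components=64, max_iter=1000)
ica.fit(patches)

# Each row of the unmixing matrix is a learned filter; reshaped to 16x16,
# these typically look like localised, oriented, Gabor-like receptive fields
filters = ica.components_.reshape(64, 16, 16)
```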
86
What does ICA imply?
Supports the idea that the visual system may be optimising for statistical independence, not just decorrelation. It extracts independent features that are efficient for encoding natural scenes.
87
How can we explain the response properties of retinal ganglion / thalamic visual neurons?
Decorrelation
88
Pair-wise correlations between pixels capture some of the statistics, but what do they not capture?
Statistics that are non-Gaussian, such as edges
89
What does recovering non-Gaussian components in the data (ICA) lead to?
Receptive fields that closely resemble Gabor patches, and therefore those of simple visual neurons in V1
90
Instead of modelling the stimulus distribution, what can we model instead?
How neurons should behave
91
What is sparse coding?
The idea that neural systems represent information using as few active neurons as possible at any given time
92
What are the advantages of sparse coding?
1. Maximises memory storage - fewer active neurons per pattern -> more patterns stored
2. Efficient energy usage - neurons that don't fire use less energy
93
What is sparsity?
Zero most of the time, but high values occasionally - such distributions are super-Gaussian
94
What are some examples of super Gaussian distributions?
1. Laplace distribution
2. Student's t distribution
3. Cauchy distribution
95
What is the issue with dense coding?
1. More energy
2. Less memory storage
3. Redundant
4. Inefficient
96
What are the issues with local codes?
1. Less robust
2. No representational flexibility
3. Limits generalisation
4. Harder to learn
97
What is the sparse coding model?
Minimise E = -[preserve information] + λ[sparseness]
The preserve-information term is the mean squared (reconstruction) error
The sparseness term is a function chosen to penalise non-zero values
98
What is E in the sparse coding function?
The energy we are trying to minimise - the cost function
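A minimal sketch of this cost function (assuming numpy), with the mean squared reconstruction error as the preserve-information term and an L1 penalty as one common choice of function that penalises non-zero activities:

```python
import numpy as np

def sparse_coding_energy(s, basis, a, lam):
    """E = [reconstruction error] + lambda * [sparseness penalty]."""
    reconstruction_error = np.mean((s - basis @ a) ** 2)  # preserve information
    sparseness_penalty = np.sum(np.abs(a))                # penalises non-zero values
    return reconstruction_error + lam * sparseness_penalty

# Toy usage with made-up sizes: a 64-pixel patch and 128 basis functions
rng = np.random.default_rng(0)
s = rng.normal(size=64)             # image patch
basis = rng.normal(size=(64, 128))  # dictionary of basis functions (columns)
a = np.zeros(128)                   # sparse activities: mostly zero
print(sparse_coding_energy(s, basis, a, lam=0.1))
```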
99
Summarise ICA
1. Images are mixtures of independent components with non-Gaussian statistics
2. The task of the brain is to demix the signal, i.e. recover the original components
3. Therefore, neural response properties will be non-Gaussian
100
Summarise Sparse coding
1. Super-Gaussian (sparse) response statistics are desirable given the constraints on the nervous system
2. Find filters that maximise sparseness
101
What do ICA and sparse coding lead to?
Both give localised, Gabor-like receptive fields, like those of V1 simple cells
102
Criticisms of sparse coding 1: What is the issue with the definitions being unclear?
Are neurons' responses supposed to be sparse across populations or over time?
103
Criticisms of sparse coding 2: We already know that neurons tend not to fire a lot - what is the issue with this?
We will find sparseness wherever we look
104
Criticisms of sparse coding 3: What brain complexity poses an issue for the model?
The brain is more complicated than a binary network, eg. excitatory and inhibitory neurons
105
Criticisms of sparse coding 4: Sparse coding optimises memory storage - what is a possible issue with this approach?
Is memory storage the limiting factor? Maybe generalisation / energy use is more important
106
What is reverse efficient coding?
Calculate optimal neural responses from stimulus statistics. Reverse this process and calculate the presumed stimulus statistics from known neural responses, assuming that the stimuli are coded efficiently
107
What is a steepness curve?
Some stimuli are more behaviourally relevant than others - the steep part of the function increases discriminability for those stimuli
108
What is Computational Neuroscience?
Understand the biologically plausible representations and algorithms governing neuronal activity patterns
109
What is Cognitive Science?
Decompose complex behavioural patterns and processes into computational components
110
What is Artificial Intelligence?
Combine component functions into computational models that can perform complex cognitive tasks
111
Explain the bottom up approach of computational neuroscience
Neuronal activity patterns --> biologically plausible representations and algorithms governing those patterns --> cognition
112
Explain the top down approach of cognitive science
Behavioural / cognitive patterns --> decompose complex cognitive processes into computational components --> brain correlates
113
What is the aim with cognitive science?
Understand human cognitive capacities and behaviour
114
What is the method with cognitive science?
Develop theories and models that can explain and predict those capacities and behaviours
115
What are data models?
Statistical techniques and models to describe data and/or establish relations between measured variables. They have no intrinsic psychological content; they just try to match the numbers.
116
Give an example of a data model?
Heathcote et al 2000: is the practice effect described better by a power function or an exponential function? (Exponential - learning continues by a constant fraction)
117
What do the symbols represent in a box and arrow model?
Box - processes
Arrow - flow of information / causal relationships
118
Give an example of a box and arrow model?
Baddeley & Hitch 1974 - modelling working memory
119
What is the structure of Baddeley and Hitch 1974?
Central Executive
Visuospatial Sketchpad | Episodic Buffer | Phonological Loop
Long Term Memory
120
What is computational modelling?
The process by which a verbal description is formalised to remove ambiguity, while also constraining the dimensions a theory can span
121
What does formalising a theory help us do?
1. Communicate theoretical ideas
2. Test theoretical hypotheses and predictions
3. Compare different plausible models statistically
122
What is the advantage to simple models?
Simplified abstracted models can capture broad trends by ignoring processes we are not interested in currently
123
What should good models be?
Precise and falsifiable
124
What does it mean to be precise?
The model makes clear, specific predictions
- Must be mathematically or logically defined
- Should say exactly what should happen under specific conditions
125
Why is it important for a model to be precise?
So you can actually test it, simulate it and compare it to data
126
What does it mean to be falsifiable?
The model must be testable and potentially disprovable
- There must be a way to prove the model wrong through experiment or observation
- If a model can never be wrong, it is not scientific
127
What is model fitting?
Trying to minimise the discrepancy between the values predicted by the model (predictions) and empirical data (observations) until they match as closely as possible
128
What does the discrepancy function do?
Expresses the deviation between predictions and observations in a single value (the cost function)
129
How can formal models help us?
Critically aid theory building
130
What are parameters?
Tuning knobs to adjust the values produced by the model, until they match (fit) observations
131
What are free parameters?
Flexibly adjusted until the difference between model estimated values and data is minimised
132
What are fixed parameters?
Set to specific, meaningful values that are invariant when fitting the model
133
What are the three popular model fitting frameworks?
1. Least squares estimation
2. Maximum likelihood estimation
3. Bayesian estimation
134
Summarise least squares estimation
Minimise the squared discrepancy between observations and predictions
135
Summarise maximum likelihood estimation
Find the parameter values that give the highest likelihood of the observed data
136
How does LSE work?
1. Fit a linear regression
2. Find the parameters that minimise the discrepancy function - via an optimisation algorithm
137
What is the Nelder Mead Simplex optimisation algorithm?
1. Compute the discrepancy for the starting values
2. Tumble down the error surface until it reaches its minimum
138
How does tumbling down the error surface work in Nelder Mead Simplex?
1. Reflection - remove the point with the largest discrepancy and flip it to the opposite side
2. Expansion - if reflection works, extend the flipped point further to take a larger step down
3. Contraction - if it didn't work, move the worst-fitting point towards the centre
4. Shrinking - if contraction fails, shrink the simplex by half towards the minimum
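Worked example: a sketch of least squares fitting with a Nelder-Mead optimiser via scipy (the data and linear model are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical data: a noisy linear relationship with slope 2 and intercept 1
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)

def sse(params):
    """Discrepancy function: sum of squared errors of a linear model."""
    slope, intercept = params
    return np.sum((y - (slope * x + intercept)) ** 2)

# Nelder-Mead tumbles down the error surface from the starting values
fit = minimize(sse, x0=[0.0, 0.0], method="Nelder-Mead")
print(fit.x)  # estimated slope and intercept, near [2, 1]
```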
139
How can we trust that our obtained parameters reflect the reality of the data?
Bootstrapping
140
What is bootstrapping?
Provides an indication of the variability around the model parameter estimates, by repeatedly sampling
141
What is parametric resampling?
1. Fit the model to the experimental data set
2. Simulate multiple data samples by running the model with the originally estimated parameters
3. Fit the model to the simulated samples
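A sketch of the parametric resampling loop; fit_model and simulate_model are hypothetical placeholders for a model's own fitting and simulation routines:

```python
import numpy as np

def parametric_bootstrap(data, fit_model, simulate_model, n_boot=1000):
    """Parametric resampling: refit the model to data simulated from itself."""
    params = fit_model(data)                      # 1. fit to the real data
    estimates = []
    for _ in range(n_boot):
        fake = simulate_model(params, len(data))  # 2. simulate with fitted params
        estimates.append(fit_model(fake))         # 3. refit to the simulated sample
    return np.array(estimates)                    # spread = variability of estimates
```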
142
Drawbacks of LSE
1. No known statistical properties
2. Cannot statistically compare models
3. Parameter estimates have no inherent statistical properties
143
What is probability?
Chance of the data given the model
144
What is likelihood?
Chance of the model given the data
145
What do probability functions measure?
The probability of all possible events predicted by the model
146
What probability function do we use for discrete events?
Probability mass function
147
What probability function do we use for continuous data?
1. Cumulative distribution function (CDF)
2. Probability density function (PDF) - the derivative of the CDF
148
How does MLE work?
Maximise the likelihood function so that the observations are most likely - find the highest peak
149
What are the steps in MLE?
1. Log-transform the likelihood function (makes it a nicer curve to handle)
2. Express it as deviance (flip it upside down)
3. Use a minimising optimiser such as Nelder-Mead Simplex on the sign-reversed log-likelihood
150
151
Why do we log transform in MLE? L -> log L
Easier interpretation and handling of the probability functions, and combining multiple observations becomes a sum instead of a product
152
Why do we express it as deviance in MLE? ln L -> -2 ln L
Easier assessment of model fit and model comparison (higher deviance - worse fit)
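Worked example: a sketch of MLE via deviance minimisation for a simple Gaussian model (scipy assumed; the data are made up):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # hypothetical observations

def deviance(params):
    """-2 ln L for a Gaussian model, so a minimiser can maximise likelihood."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf  # keep the optimiser in the valid parameter region
    return -2.0 * np.sum(norm.logpdf(data, loc=mu, scale=sigma))

fit = minimize(deviance, x0=[0.0, 1.0], method="Nelder-Mead")
print(fit.x)    # ML estimates of mu and sigma, near [5, 2]
print(fit.fun)  # deviance at the maximum likelihood solution
```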
153
What properties should a good model have (with regards to fitting)?
Be flexible - fit different patterns of data
Not overfit - not fit just any data
154
What determines model flexibility?
1. Number of free parameters
2. Functional form of the model
3. Extension of the parameter space
155
How does number of free parameters affect model flexibility?
More free parameters result in better fit
156
How does functional form of the model affect model flexibility?
Some models produce a wider variety of patterns based on their parameter values
157
How does extension of the parameter space affect model flexibility?
Bounds placed on the parameters can decrease model flexibility
158
How can we find the best and simplest model?
Nested models - likelihood ratio test
Non-nested models - Akaike information criterion
159
What is a nested model?
Simpler models (fewer parameters) vs same model with more complexity (more parameters)
160
What is a non nested model?
Different models with same complexity (same number of parameters)
161
How does likelihood ratio test work?
L_specific / L_general; as we use the deviance, the division becomes a subtraction
162
How does Akaike Information Criteria work?
AIC = -2 ln L + 2K
The likelihood of the model penalised by its complexity (K: number of parameters)
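Worked example: a sketch of both comparisons from fitted deviances (the deviance values and parameter counts are made up; scipy assumed for the chi-square test):

```python
from scipy.stats import chi2

# Deviances (-2 ln L) of two fitted models (hypothetical values)
dev_specific, k_specific = 412.0, 3  # simpler (nested) model
dev_general, k_general = 404.5, 5    # same model with two extra free parameters

# Likelihood ratio test: on the deviance scale the ratio becomes a difference
lr_stat = dev_specific - dev_general
p_value = chi2.sf(lr_stat, df=k_general - k_specific)
print(lr_stat, p_value)  # a small p favours keeping the extra parameters

# AIC for non-nested comparison: deviance plus the 2K complexity penalty
print(dev_specific + 2 * k_specific)  # lower AIC is preferred
print(dev_general + 2 * k_general)
```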
163
What is Visual Working Memory?
Active maintenance of visual information to serve the needs of ongoing tasks
164
What are examples of paradigms for testing VWM?
1. Change Detection Paradigm
2. Continuous Reproduction Paradigm
165
What is the Change Detection Paradigm?
1. Show an image briefly
2. Gap
3. Show another image and judge whether the two are the same or different
166
How does the Continuous Reproduction Paradigm work?
Coloured boxes are randomly arranged; report which colour one of them was, using a colour wheel
167
What is the capacity of VWM?
3-5 chunks of information at a time
168
What do we assume when modelling VWM?
The VWM capacity limit arises from a limited resource that is in some way distributed across items
169
Explain the Discrete Slots Model (Zhang and Luck 2008)
1. The resource is allocated to a limited number of discrete representations (slots)
2. No information is stored about additional items once the capacity limit is reached
170
Explain the Continuous Resource Model (Bays et al 2009)
1. Equal allocation of a continuous resource across all items
2. Less resource per item for larger set sizes
3. Representations lose precision as set size increases
171
When are mixture models useful?
When data are assumed to result from a mixture of two or more distributions, each representing different processes or populations
172
What are the distributions in the Two Components Mixture Model (Standard Model)?
1. Noisy target representation - von Mises 2. Random guessing - Uniform
173
What did the two components mixture model show?
Items exceeding maximum VWM capacity are not maintained
174
What are the distributions in the Three Components Mixture Model (Swap Model)?
1. Noisy target representation - von Mises 2. Random guessing - Uniform 3. Noisy non-target representation - von Mises
175
What did the three components mixture model show?
The precision of the representation of items decreases with increasing set size
176
What is a von Mises distribution?
A circular normal distribution
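A sketch of the three-component (swap) mixture likelihood from the cards above, using scipy's von Mises distribution; the function and argument names are illustrative, not the original authors' code:

```python
import numpy as np
from scipy.stats import vonmises

def swap_model_loglik(target_errors, nontarget_errors, p_t, p_n, kappa):
    """Log-likelihood of the three-component mixture model.

    target_errors    : response errors relative to the target (radians)
    nontarget_errors : (n_nontargets, n_trials) errors relative to non-targets
    p_t, p_n         : probabilities of reporting the target / a non-target
    kappa            : precision of the von Mises (circular normal) components
    """
    p_guess = 1.0 - p_t - p_n
    like = (p_t * vonmises.pdf(target_errors, kappa)
            + p_n * vonmises.pdf(nontarget_errors, kappa).mean(axis=0)
            + p_guess / (2 * np.pi))  # uniform guessing on the circle
    return np.sum(np.log(like))
```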
177
What are Explanatory Models?
- Fit models simultaneously to multiple conditions to explain differences between these conditions with a common set of constrained parameters
- Not particularly interested in the parameter values themselves
178
What are Measurement Models?
- Fit the model separately to different experimental conditions to assess how the conditions affect latent constructs, with most or all parameters varying freely
- Very interested in the parameter values
179
How can we interpret the mixture models as measurement models?
Capacity = pt
Precision = k
Swapping = pn
180
What is capacity / pt interpreted as?
The quantity of information that can be held
181
What is the precision / k interpreted as?
How well is information remembered
182
What is swapping / pn interpreted as?
Chance of swapping
183
Why is the swapping term quite relevant?
There is an idea that what makes working memory so special is that we are able to remember things at specific locations (colour and location)
184
What are some conclusions from VWM
1. Training improves precision, not capacity
2. Improvements in precision are highly stimulus and paradigm specific
3. Changes in response patterns suggest possible interactions between paradigms
185
How have cognitive models helped our understanding of how VWM develops?
Helped us understand the lack of changes
186
What are 5 researcher degrees of freedom?
1. Deciding when to stop collecting data
2. Excluding or including participants post hoc
3. Transforming variables or selecting among multiple outcomes
4. Trying different statistical methods until results are significant
5. Selectively reporting results
187
What are the four terms for increasing research quality and trust (analysis on the y axis, data on the x axis)?
1. Reproducible
2. Replicable
3. Robust
4. Generalisable
188
What is reproducible?
Same data, same analysis yields same result
189
What is replicable?
Different data, same analysis yields same result
190
What is robust?
Same data, different analysis yields same result
191
What is generalisable?
Different data, different analysis gives similar result
192
What are 6 open research principles?
1. Preregistration 2. Open materials and methods 3. Open data 4. Open source software 5. Open source code 6. Open access publications
193
What are 5 good modelling practices?
1. Keep a model logbook
2. Parameter recovery studies
3. Sensitivity (robustness and generalisation) studies
4. Quantify uncertainty in parameter estimates
5. Share model code, data and scripts
194
What should we be aware of in good modelling?
1. Preregistration may not always be useful or possible
2. Modelling is an iterative process and requires exploration
195
What should we do when sharing code?
Make it understandable and reproducible: comments, notebooks, change log
196
What is preregistration?
Publicly registering your research plan, including hypotheses, methods, and analysis plans, before data collection begins