week 5 - efficient coding III Flashcards
(33 cards)
What is the ECH (efficient coding hypothesis)? (two parts)
a group of neurons should encode as much information as possible OR remove as much redundancy as possible
What is the equation for ECH? Maximize…?
What does each component mean?
Maximize I(S;f(S))
I = mutual information
S = signal
f(S) = tuning curves to be optimized
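The objective above can be made concrete with a small NumPy sketch (my own toy example, not from the cards): for a discrete stimulus S and a deterministic, invertible code f(S), the mutual information I(S; f(S)) equals the entropy of S.

```python
# Discrete mutual information I(S; f(S)) from a joint probability table.
# The joint table here is a made-up toy example, not a fitted model.
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits, computed from a joint probability table."""
    px = joint.sum(axis=1, keepdims=True)   # marginal of X
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y
    nz = joint > 0                          # avoid log(0)
    return np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz]))

# A deterministic, invertible code: f(S) preserves all information in S,
# so I(S; f(S)) equals the entropy of S (2 bits for 4 equally likely stimuli).
joint = np.eye(4) / 4.0
print(mutual_information(joint))   # -> 2.0
```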
What are the efficient coding (EC) parts of Whitening and ICA?
Whitening
EC = decorrelating pixels/data
ICA
EC = demixing to recover independent components
What model does ICA build upon?
ICA builds on the whitening model
What are the steps in the process of whitening? (2 steps)
Whitening:
- Fit a Gaussian distribution (captures the correlations between neighbouring pixels)
- Decorrelate the pixels (EC part)
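The two whitening steps can be sketched with NumPy (a minimal illustration; the data and variable names are my own, not fitted image patches):

```python
# Whitening sketch: fit a Gaussian (estimate covariance), then decorrelate.
import numpy as np

rng = np.random.default_rng(0)
# Fake "image patch" data: 500 samples of 4 correlated pixels
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
X -= X.mean(axis=0)                       # centre the data

# Step 1: fit a Gaussian model -> estimate the pixel covariance
C = np.cov(X, rowvar=False)

# Step 2: decorrelate (the EC part) with the inverse matrix square
# root of the covariance (ZCA whitening)
eigvals, eigvecs = np.linalg.eigh(C)
W = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
X_white = X @ W

# After whitening the covariance is (numerically) the identity:
print(np.allclose(np.cov(X_white, rowvar=False), np.eye(4)))   # -> True
```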
What are the steps in the process of ICA? (brief)
- fit a more complex model (built from non-Gaussian components)
- mix the components (mixing makes them more Gaussian)
- demix to recover the independent components (EC part) -> now non-Gaussian again
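The ICA story can be illustrated with NumPy alone (my own toy example: the demixing matrix is the known inverse of the mixing matrix, not one learned by an ICA algorithm): mixing independent non-Gaussian sources makes the signals more Gaussian, and demixing makes them non-Gaussian again.

```python
# Mixing pushes independent non-Gaussian sources toward Gaussian
# (central limit theorem); demixing recovers them.
import numpy as np

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0   # 0 for a Gaussian

rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 100_000))    # independent super-Gaussian sources
A = np.array([[1.0, 0.7],
              [0.5, 1.0]])            # mixing matrix
X = A @ S                             # mixed signals: more Gaussian
S_rec = np.linalg.inv(A) @ X          # demixed: non-Gaussian again

print(excess_kurtosis(S[0]))          # roughly 3 (Laplacian source)
print(excess_kurtosis(X[0]))          # smaller: closer to Gaussian
print(np.allclose(S_rec, S))          # -> True: sources recovered
```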
What is the problem with local codes? (assign one neuron to one concept)
- if that neuron dies, you forget the concept (e.g. the "grandmother cell")
What is the problem with dense codes? (assign one concept to many neurons)
- very robust, but would cost a lot of energy
What is the solution to these two problems?
- use sparse, distributed codes
What are the benefits of sparse, distributed codes?
they maximise memory storage while also saving energy
What is kurtosis?
What does kurtosis describe?
a statistical term which describes the shape of a probability distribution curve
it describes the 'tailedness': the presence of outliers and the shape of the peak
What do probability distributions with positive kurtosis look like?
sharp peak
heavier tails/more outliers
What is positive kurtosis aka?
super-Gaussian
leptokurtic
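The super-Gaussian shape can be checked numerically (a small NumPy sketch of my own; excess kurtosis is 0 for a Gaussian and positive for a sharp-peaked, heavy-tailed distribution):

```python
# Excess kurtosis of a Gaussian vs a Laplacian (super-Gaussian) sample.
import numpy as np

def excess_kurtosis(x):
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2)**2 - 3.0

rng = np.random.default_rng(2)
gauss = rng.normal(size=200_000)
laplace = rng.laplace(size=200_000)    # sharp peak, heavy tails

print(excess_kurtosis(gauss))      # near 0
print(excess_kurtosis(laplace))    # near 3: super-Gaussian / leptokurtic
```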
What is the equation for the encoding model?
What does each component mean?
r = Ws
r=neural responses
W=weighted receptive fields
s=natural image pixels
What is the decoding model?
s = W⁻¹r
(W⁻¹ is the matrix inverse of W)
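The encoding/decoding pair from the two cards above can be sketched in NumPy (W and s here are small made-up examples, not fitted receptive fields):

```python
# Linear encoding r = W s and decoding s = W^{-1} r.
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(4, 4))       # receptive-field weights (assumed invertible)
s = rng.normal(size=4)            # "natural image" pixel values

r = W @ s                         # encoding model: neural responses
s_hat = np.linalg.inv(W) @ r      # decoding model: reconstructed pixels

print(np.allclose(s_hat, s))      # -> True: decoding inverts encoding
```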
What is the equation for the sparse coding model?
What does each term mean?
E = -[preserve information] - λ[sparseness]
preserve information = the error term
In the sparse coding model equation, what does the preserve information term mean mathematically?
preserve information = mean squared error
(this is the reconstruction error)
In the sparse coding model equation, what type of function represents the sparseness term?
sparseness = a function that penalizes NON-zero values
e.g. the absolute value, f(x) = |x| (makes any negative values positive)
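The two terms of the objective can be written out as a cost to minimise (a minimal sketch with made-up toy numbers; `Phi` is my stand-in name for the learned basis filters, and I use the equivalent "minimise error plus penalty" sign convention):

```python
# Sparse-coding cost: mean squared reconstruction error
# plus a lambda-weighted absolute-value penalty on the responses.
import numpy as np

def energy(s, Phi, r, lam):
    mse = np.mean((s - Phi @ r) ** 2)   # "preserve information" (error) term
    penalty = np.sum(np.abs(r))         # sparseness term: f(x) = |x|
    return mse + lam * penalty          # lower cost = better sparse code

rng = np.random.default_rng(4)
Phi = rng.normal(size=(8, 8))           # toy basis filters
s = rng.normal(size=8)                  # toy image patch
r_dense = np.linalg.inv(Phi) @ s        # perfect reconstruction, many non-zeros
r_sparse = np.where(np.abs(r_dense) > 1.0, r_dense, 0.0)  # zero small responses

print(energy(s, Phi, r_dense, lam=0.1))     # pays the full penalty
print(energy(s, Phi, r_sparse, lam=0.1))    # trades error for sparseness
```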
What do sparse filters look like?
What characteristics do they have?
like receptive fields in primary visual cortex V1
they are localised, orientation-specific and Gabor-like
What other type of filters do sparse filters look like?
like ICA filter
Which two filters look like V1 receptive fields?
sparse filters
ICA filters
What is the difference between the data/images used in ICA and in sparse coding?
ICA = mix of independent components with non-Gaussian stats
Sparse = has super-Gaussian response statistics
What does the sparse filter do?
maximise sparseness
Will the neural response properties of ICA be Gaussian?
NO! non-Gaussian (recover independent non-Gaussian components)
as it is the task of the brain/neural response to demix the signal
Why is it desirable to have super-Gaussian statistics for the neural response in sparse coding?
because super-Gaussian statistics maximise information transmission even under the energy constraints of the nervous system
What is a criticism of the definition for sparse coding?
it is a bit vague: Are neurons’ responses SPARSE across population or time?