Lecture 4 - Information Theory Flashcards

1
Q

Marr’s Vision (‘83) - Stages

A

1- Image = what your retina sees
-Color/Light intensity

2- Primal sketch: highlight important parts of the scenes (tracing out the shadows, the light, the contours…)
-Edges
-Blobs
-Groups
-Boundaries

Boundary (zero-crossing) detection:
Image, blow-up, and receptor output
If we zoom in: the zero crossing is where the edge is; it pinpoints exactly where the boundary lies
Take the first and second derivative of the intensity profile to find it (ask Aiden): the edge sits at the zero crossing of the second derivative (see the sketch at the end of this card)

3- 2.5D Sketch: not a full 3D representation yet
-Surface orientations (which way each surface is pointing… a surface-based representation… see image)

4- 3D model: how the parts are grouped together
-Hierarchical 3D geons (see example)
Overall: from a viewer-centric to an object-centric representation
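
A minimal sketch of the zero-crossing idea above, assuming a hypothetical 1D intensity profile (a dark region ramping up to a bright region) and NumPy; an illustration of the derivative trick, not the lecture's actual implementation:

```python
import numpy as np

# Hypothetical 1D intensity profile: dark pixels, a ramp, then bright pixels (one edge).
intensity = np.array([10, 10, 10, 10, 30, 70, 110, 120, 120, 120], dtype=float)

first_deriv = np.gradient(intensity)     # slope: large where intensity changes quickly
second_deriv = np.gradient(first_deriv)  # curvature: crosses zero in the middle of the edge

# A zero crossing is where the second derivative changes sign.
signs = np.sign(second_deriv)
crossings = np.where(np.diff(signs) != 0)[0]

# Keep only crossings where the slope is also large (a real edge, not a flat region).
edges = crossings[np.abs(first_deriv[crossings]) > 1.0]
print("edge located near index:", edges)  # middle of the ramp
```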

2
Q

Marr’s Vision - Criticism

A

● ✅ Provides implementation-level details
● ❌ Struggles with ambiguity, which requires context to resolve
(ex: are those lines on the mug or just shadows?)

3
Q

Information Theory

A

What is information?
● Information is the resolution of uncertainty (the more uncertainty there is to resolve, i.e. the more surprising the outcome, the more information it contains)
● Signal resolves uncertainty → signal contains information
● Information can be expressed as bits. Bit = binary digit, the basic container of information: true or false, yes or no, one of two values (like the toss of a coin)

Claude Shannon
Boole → Shannon
Quantify information → “Information Theory”

Information density
Which statement has more information (higher density): “Cat bites man” or “Man bites cat”?
-“Man bites cat”! There is more uncertainty to resolve
Which file has more information: a file 5,999 bytes (characters) long that just repeats “hello”, or a file 5,999 bytes long full of random words?
-The one with random words (more uncertainty to resolve = more information)… both are the same size raw, but the random one compresses far less, so it takes up much more storage (in bytes) once compressed (see the compression sketch at the end of this card)

● More surprise → more information
● Less frequent events → more information
● Observed information ∝ 1/(event frequency)
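
A rough sketch of the two-files comparison, using zlib to make “more information = harder to compress” concrete; random lowercase characters stand in for the “random words”, and 5,999 bytes is the length from the example:

```python
import random
import string
import zlib

n = 5_999  # byte length from the example above

# File 1: just "hello" repeated. File 2: random characters (a stand-in for random words).
repetitive = ("hello" * (n // 5 + 1))[:n].encode()
random.seed(0)
noisy = "".join(random.choices(string.ascii_lowercase + " ", k=n)).encode()

print("raw sizes:       ", len(repetitive), len(noisy))  # identical: 5999 and 5999
print("compressed sizes:", len(zlib.compress(repetitive)), len(zlib.compress(noisy)))
# The repetitive file shrinks to a few dozen bytes; the random one stays in the thousands.
```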

4
Q

Self-Information / Information Content

A

See formula and explanation: I(A) = -log2( p(A) ) (know that it is a logarithmic formula)

Self-Information example
● Event likelihood: 100%
○ → p=1.0
○ → I(1.0) = -log2(1.0) = 0 bits of information
(no uncertainty, so no information)
● Event likelihood: 50% 🪙 (coin)
○ → p=0.5
○ → I(0.5) = -log2(0.5) = 1 bit of info
More concrete example:
Is it heads?
-If no, it’s tails
-If yes, it’s heads
Here we asked 1 yes-or-no question, so there is 1 bit of information
● Event likelihood: “1 in 6” aka 16.67% 🎲
○ → p = ⅙
○ → I(⅙) = -log2(⅙) ≈ 2.58 bits of information
(more uncertainty, so more information)

An 8-sided die is 3 bits
Why?
1 in 8 = 12.5% = 0.125
-log2(0.125) = 3
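
A tiny helper that just evaluates the self-information formula I(p) = -log2(p) for the examples on this card:

```python
from math import log2

def self_information(p: float) -> float:
    """Bits of information in an event that occurs with probability p."""
    return -log2(p)

print(self_information(1.0))    # certain event          -> 0.0 bits
print(self_information(0.5))    # coin flip              -> 1.0 bit
print(self_information(1 / 6))  # roll of a 6-sided die  -> ~2.58 bits
print(self_information(1 / 8))  # roll of an 8-sided die -> 3.0 bits
```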

5
Q

Quick Binary

A

How do we represent numbers in just 0s and 1s?
0000= 0
0001= 1
0010= 2
0011= 3 (2=0010 + 1=0001)
0100= 4
0101= 5 (4=0100 + 1=0001)
0110= 6 (4=0100 + 2=0010)
0111= 7
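
The same table can be generated with a couple of lines of Python (a small aside, not lecture material):

```python
# Print each number 0-7 as a 4-digit binary string.
for n in range(8):
    print(f"{n:04b} = {n}")
```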

6
Q

Useful machines

A

Machine A → ABABBAABAAAABABBABBABBAABBBBABABAAABAB
p(A) = 0.5
p(B) = 0.5
Machine B → AAAAAABAAAAABABAAAAAAAAAAAAAAABAAAAAAAB
p(A) = 0.75
p(B) = 0.25

Which one would have more information in theory? Machine A, because it has more uncertainty (since the probabilities are equal, it is harder to predict which symbol comes next)

How do we know for sure?
For machine A:
Calculate the number of bits for each choice
50% chance of getting A = -log2(0.5) = 1 bit
50% chance of getting B = -log2(0.5) = 1 bit
Then take the weighted average: 0.5×1 + 0.5×1 = 1 bit
For machine B: same thing
75% chance of getting A = -log2(0.75) ≈ 0.41 bits
25% chance of getting B = -log2(0.25) = 2 bits (makes sense, lower probability = more uncertainty so more information)
Then take the weighted average: 0.75×0.41 + 0.25×2 ≈ 0.81 bits
Answer: Machine A has more information (1 bit per symbol) and Machine B has less information (0.81 bits per symbol)

Is there a formula to calculate which machine has more information faster? Yes
See slide 37
The formula is called Shannon entropy / information entropy
Entropy = H: H(X) = -∑ p(x) · log2( p(x) )
Goal? It quantifies the average amount of information per symbol from a source
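
The weighted average above is exactly Shannon entropy. A short sketch wrapping it into a function and checking both machines:

```python
from math import log2

def entropy(probs):
    # H(X) = -sum(p * log2(p)): the weighted average of each outcome's
    # self-information, in bits per symbol.
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # Machine A -> 1.0 bit per symbol
print(entropy([0.75, 0.25]))  # Machine B -> ~0.81 bits per symbol
```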

7
Q

Entropy

A

Entropy (H) is a measure of disorder
High entropy = chaotic, with outcomes equally likely (like the 50/50 machine: since both symbols are equally likely, we are less able to predict which one comes next). More uncertainty = more information, therefore high entropy = more information

Applications:
-We can measure how much information gets transmitted through sensory neurons (neuron bandwidth)
From the firing rates and how often they occur, we can calculate entropy and find that muscle cells carry around 12 bits of information per second!
-We can quantify user experience (UX)
ex: which TV remote has higher entropy (more disorder, more information)?
Use Hick-Hyman Law (Hick’s Law for short):
RT = a + b*H
a: task-specific offset
b: rate of information gain (task-specific)
H: entropy (transmitted information)
Therefore, plugging each remote’s entropy into the formula, we can show that the reaction time for the more complex TV remote is higher than for the simpler one (sketched below)
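
A toy sketch of the Hick-Hyman comparison; the button counts and the a, b constants are made-up illustration values, not numbers from the lecture:

```python
from math import log2

def reaction_time(H, a=0.2, b=0.15):
    # RT = a + b*H with task-specific constants a and b (values invented for illustration).
    return a + b * H

simple_remote = log2(8)    # 8 equally likely buttons  -> H = 3 bits
complex_remote = log2(64)  # 64 equally likely buttons -> H = 6 bits

print(f"simple remote:  RT = {reaction_time(simple_remote):.2f} s")   # 0.65 s
print(f"complex remote: RT = {reaction_time(complex_remote):.2f} s")  # 1.10 s
```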

8
Q

Summary

A

● Marrʼs Vision Hypothesis
○ Image → Primal Sketch (edges, blobs)
○ Primal Sketch → 2.5D Sketch (surfaces + their orientation)
○ 2.5D Sketch → 3D model (object-centric repr)
● Information theory → quantify information
○ Encoded as bits
○ More surprise = more information
○ Self-Information: I(A) = -log2( p(A) )
○ Entropy: H(X) = - ∑ (p * log2( p ))
○ Entropy applications: UX, neuron bandwidth
