Lecture 4 - Information Theory Flashcards

1
Q

Marr’s Vision (‘83) - Stages

A

1- Image = what your retina sees
-Color/Light intensity

2- Primal sketch: highlight important parts of the scenes (tracing out the shadows, the light, the contours…)
-Edges
-Blobs
-Groups
-Boundaries

Boundary (zero-crossing) detection:
Image, blow-up, and receptor output
If we zoom in: the zero crossing is where the edge is; it pinpoints exactly where the boundary lies
Take the first and second derivative of the intensity profile to find it (ask Aiden): the edge sits at the zero crossing of the second derivative (see the sketch at the end of this card)

3- 2.5D Sketch: not a full 3D representation yet
-Surface orientations (which way each surface is pointing… a surface-based representation… see image)

4- 3D model: how the parts are grouped together
-Hierarchical 3D geons (see example)
Overall: from a viewer-centric to an object-centric representation
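
A minimal sketch of the zero-crossing idea above, assuming a hypothetical 1D intensity profile (a dark region ramping up to a bright region) and NumPy; an illustration of the derivative trick, not the lecture's actual implementation:

```python
import numpy as np

# Hypothetical 1D intensity profile: dark pixels, a ramp, then bright pixels (one edge).
intensity = np.array([10, 10, 10, 10, 30, 70, 110, 120, 120, 120], dtype=float)

first_deriv = np.gradient(intensity)     # slope: large where intensity changes quickly
second_deriv = np.gradient(first_deriv)  # curvature: crosses zero in the middle of the edge

# A zero crossing is where the second derivative changes sign.
signs = np.sign(second_deriv)
crossings = np.where(np.diff(signs) != 0)[0]

# Keep only crossings where the slope is also large (a real edge, not a flat region).
edges = crossings[np.abs(first_deriv[crossings]) > 1.0]
print("edge located near index:", edges)  # middle of the ramp
```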

2
Q

Marr’s Vision - Criticism

A

● ✅ Provides implementation-level details
● ❌ Struggles with ambiguity, which requires context to resolve
(ex: are those lines on the mug or just shadows?)

3
Q

Information Theory

A

What is information?
● Information is the resolution of uncertainty (the more uncertainty there is to resolve, i.e. the more surprising the outcome, the more information it contains)
● Signal resolves uncertainty → signal contains information
● Information can be expressed as bits. Bit = binary digit, the basic container of information: true or false, yes or no, one of two values (like the toss of a coin)

Claude Shannon
Boole → Shannon
Quantify information → “Information Theory”

Information density
Which statement has more information (higher density): “Cat bites man” or “Man bites cat”?
-“Man bites cat”! There is more uncertainty to resolve
Which file has more information: a file 5,999 bytes (characters) long that just repeats “hello”, or a file 5,999 bytes long full of random words?
-The one with random words (more uncertainty to resolve = more information)… both are the same size raw, but the random one compresses far less, so it takes up much more storage (in bytes) once compressed (see the compression sketch at the end of this card)

● More surprise → more information
● Less frequent events → more information
● Observed information ∝ 1/(event frequency)
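
A rough sketch of the two-files comparison, using zlib to make “more information = harder to compress” concrete; random lowercase characters stand in for the “random words”, and 5,999 bytes is the length from the example:

```python
import random
import string
import zlib

n = 5_999  # byte length from the example above

# File 1: just "hello" repeated. File 2: random characters (a stand-in for random words).
repetitive = ("hello" * (n // 5 + 1))[:n].encode()
random.seed(0)
noisy = "".join(random.choices(string.ascii_lowercase + " ", k=n)).encode()

print("raw sizes:       ", len(repetitive), len(noisy))  # identical: 5999 and 5999
print("compressed sizes:", len(zlib.compress(repetitive)), len(zlib.compress(noisy)))
# The repetitive file shrinks to a few dozen bytes; the random one stays in the thousands.
```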

4
Q

Self-Information / Information Content

A

See formula and explanation: I(A) = -log2( p(A) ) (know that it is a logarithmic formula)

Self-Information example
● Event likelihood: 100%
○ → p=1.0
○ → I(1.0) = -log2(1.0) = 0 bits of information
(no uncertainty, so no information)
● Event likelihood: 50% 🪙 (coin)
○ → p=0.5
○ → I(0.5) = -log2(0.5) = 1 bit of info
More concrete example:
Is it heads?
-If no, it’s tails
-If yes, it’s heads
Here we asked 1 yes-or-no question, so there is 1 bit of information
● Event likelihood: “1 in 6” aka 16.67% 🎲
○ → p = ⅙
○ → I(⅙) = -log2(⅙) ≈ 2.58 bits of information
(more uncertainty, so more information)

An 8-sided die is 3 bits
Why?
1 in 8 = 12.5% = 0.125
-log2(0.125) = 3
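
A tiny helper that just evaluates the self-information formula I(p) = -log2(p) for the examples on this card:

```python
from math import log2

def self_information(p: float) -> float:
    """Bits of information in an event that occurs with probability p."""
    return -log2(p)

print(self_information(1.0))    # certain event          -> 0.0 bits
print(self_information(0.5))    # coin flip              -> 1.0 bit
print(self_information(1 / 6))  # roll of a 6-sided die  -> ~2.58 bits
print(self_information(1 / 8))  # roll of an 8-sided die -> 3.0 bits
```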

5
Q

Quick Binary

A

How do we represent numbers in just 0s and 1s?
0000= 0
0001= 1
0010= 2
0011= 3 (2=0010 + 1=0001)
0100= 4
0101= 5 (4=0100 + 1=0001)
0110= 6 (4=0100 + 2=0010)
0111= 7
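
The same table can be generated with a couple of lines of Python (a small aside, not lecture material):

```python
# Print each number 0-7 as a 4-digit binary string.
for n in range(8):
    print(f"{n:04b} = {n}")
```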

6
Q

Useful machines

A

Machine A → ABABBAABAAAABABBABBABBAABBBBABABAAABAB
p(A) = 0.5
p(B) = 0.5
Machine B → AAAAAABAAAAABABAAAAAAAAAAAAAAABAAAAAAAB
p(A) = 0.75
p(B) = 0.25

Which one would have more information in theory? Machine A, because it has more uncertainty (since the probabilities are equal, it is harder to predict which symbol comes next)

How do we know for sure?
For machine A:
Calculate the number of bits for each choice
50% chance of getting A = -log2(0.5) = 1 bit
50% chance of getting B = -log2(0.5) = 1 bit
Then take the weighted average: 0.5×1 + 0.5×1 = 1 bit
For machine B: same thing
75% chance of getting A = -log2(0.75) ≈ 0.41 bits
25% chance of getting B = -log2(0.25) = 2 bits (makes sense, lower probability = more uncertainty so more information)
Then take the weighted average: 0.75×0.41 + 0.25×2 ≈ 0.81 bits
Answer: Machine A has more information (1 bit per symbol) and Machine B has less information (0.81 bits per symbol)

Is there a formula to calculate which machine has more information faster? Yes
See slide 37
The formula is called Shannon entropy / information entropy
Entropy = H: H(X) = -∑ p(x) · log2( p(x) )
Goal? It quantifies the average amount of information per symbol from a source
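
The weighted average above is exactly Shannon entropy. A short sketch wrapping it into a function and checking both machines:

```python
from math import log2

def entropy(probs):
    # H(X) = -sum(p * log2(p)): the weighted average of each outcome's
    # self-information, in bits per symbol.
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # Machine A -> 1.0 bit per symbol
print(entropy([0.75, 0.25]))  # Machine B -> ~0.81 bits per symbol
```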

7
Q

Entropy

A

Entropy (H) is a measure of disorder
High entropy = chaotic, with outcomes equally likely (like the 50/50 machine: since both symbols are equally likely, we are less able to predict which one comes next). More uncertainty = more information, therefore high entropy = more information

Applications:
-We can measure how much information gets transmitted through sensory neurons (neuron bandwidth)
From the firing rates and how often they occur, we can calculate entropy and find that muscle cells carry around 12 bits of information per second!
-We can quantify user experience (UX)
ex: which TV remote has higher entropy (more disorder, more information)?
Use Hick-Hyman Law (Hick’s Law for short):
RT = a + b*H
a: task-specific offset
b: rate of information gain (task-specific)
H: entropy (transmitted information)
Therefore, plugging each remote’s entropy into the formula, we can show that the reaction time for the more complex TV remote is higher than for the simpler one (sketched below)
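
A toy sketch of the Hick-Hyman comparison; the button counts and the a, b constants are made-up illustration values, not numbers from the lecture:

```python
from math import log2

def reaction_time(H, a=0.2, b=0.15):
    # RT = a + b*H with task-specific constants a and b (values invented for illustration).
    return a + b * H

simple_remote = log2(8)    # 8 equally likely buttons  -> H = 3 bits
complex_remote = log2(64)  # 64 equally likely buttons -> H = 6 bits

print(f"simple remote:  RT = {reaction_time(simple_remote):.2f} s")   # 0.65 s
print(f"complex remote: RT = {reaction_time(complex_remote):.2f} s")  # 1.10 s
```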

8
Q

Summary

A

● Marrʼs Vision Hypothesis
○ Image → Primal Sketch (edges, blobs)
○ Primal Sketch → 2.5D Sketch (surfaces + their orientation)
○ 2.5D Sketch → 3D model (object-centric repr)
● Information theory → quantify information
○ Encoded as bits
○ More surprise = more information
○ Self-Information: I(A) = -log2( p(A) )
○ Entropy: H(X) = - ∑ (p * log2( p ))
○ Entropy applications: UX, neuron bandwidth
