Lecture 17 - Blake Richards Flashcards

1
Q

The Neuron Doctrine

A

Parenthesis (terminology):
Artificial NNs = in AI
Natural NNs = in animal brains

The founding idea in modern neuroscience is the neuron doctrine. It
was formulated in its current form by the neuroanatomist Santiago
Ramon y Cajal (around the end of the 19th century), based on his
extensive observations of Golgi-stained sections of brain tissue.
-only a subset of cells incorporate the dye
-neurons were really separate cells

The neuron doctrine says, roughly:
The functions of the brain are supported by distinct units,
literally distinct cells, called neurons.
Computation in the brain is a result of communication between
neurons, which occurs via synaptic connections.
This was in contrast to “reticular theory”, which posited that the brain
was one gigantic, undifferentiated network without distinct units.

Neurons receive inputs from other neurons’ axons (usually via
synapses on their dendrites or cell body), and integrate the inputs
according to basic principles of biophysics (more on this next week).
Some inputs excite the neurons (increasing activity), while others
inhibit them (decreasing activity).

Some animals have only a few hundred neurons, but some species
have evolved to have a crazy number of neurons (humans
≈ 8.0 × 10^10), and an even crazier number of synapses (humans
≈ 10^14, more than the number of stars in our galaxy).

More neurons = more capacity

2
Q

Slow hardware

A

Neurons are the computational unit of the brain, and they are slow.
A single neuron can only transmit pulses of information (via an
action potential) at a maximum rate of a few hundred Hz.

Thus, despite being made up of some very slow pieces of hardware,
your brain manages to operate in real time in the real world.

Brains fire much more slowly than digital computers,
but are still more capable than any computer system at many real-world tasks.
How? Massively parallel processing.
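A rough back-of-envelope illustration (my own numbers, just for scale): even if each of the brain's ≈ 8 × 10^10 neurons fired only a few spikes per second, far below the few-hundred-Hz maximum, that would still be on the order of 10^11 to 10^12 spikes per second being processed across ≈ 10^14 synapses in parallel.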

3
Q

Parallel computation

A

At any moment in time, your entire brain is active, with billions of
neurons working in parallel to compute.
That ten percent of your brain thing is fake news…

Each neuron in your brain connects to thousands of other neurons.
As such, they can integrate information from many sources at once,
and this can happen across billions of neurons simultaneously.

4
Q

Distributed representations

A

Localist representation: one concept = one neuron (e.g. the “grandmother cell”)… this is FALSE
Epilepsy patients and celebrity photos

Any stimulus or behaviour involves the activation of millions or
billions of neurons in your brain. Despite the way it was reported,
there are no individual cells for celebrities in your brain!

Thus, the representation of information in the brain is distributed
widely across a huge number of neurons.
This is why we see graceful degradation! If you get a bump on the
head, you still have millions of neurons capturing other aspects of
the information.

-sampling problem: those experiments recorded from only ~100 cells (out of billions)

5
Q

Synaptic plasticity

A

Given the Neuron Doctrine, the following principle becomes clear:
If you change the synaptic connections between neurons, you
change the way in which information flows between neurons,
thereby changing the computations in the network.

In 1949, Donald Hebb (a neuroscientist here at McGill) proposed that
changes to synapses may occur as a result of activity in the pre- and
postsynaptic neurons, and that this may be how we learn new things.
“When an axon of cell A is near enough to excite cell B and
repeatedly or persistently takes part in firing it, some growth
process or metabolic change takes place in one or both cells
such that A’s efficiency, as one of the cells firing B, is
increased.”

Bliss & Lømo demonstrated Hebb was right in 1973, and decades of
research since have supported this conclusion: we learn by modifying
the synaptic connections in our brains.
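
As a minimal illustration (my own sketch, not from the lecture), Hebb's postulate is often written as a rate-based weight change proportional to the product of pre- and postsynaptic activity; the learning rate eta here is an arbitrary choice:

    import numpy as np

    def hebbian_update(w, pre, post, eta=0.01):
        """Hebb's postulate as a simple rate-based rule:
        a synapse strengthens when pre- and postsynaptic activity coincide
        (delta_w = eta * post * pre)."""
        return w + eta * np.outer(post, pre)

    # Toy usage: 3 presynaptic inputs onto 2 postsynaptic neurons.
    w = np.zeros((2, 3))
    pre = np.array([1.0, 0.0, 1.0])   # presynaptic firing rates
    post = np.array([0.5, 1.0])       # postsynaptic firing rates
    w = hebbian_update(w, pre, post)
    print(w)  # weights grow only where pre and post are co-active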

6
Q

Summary so far

A

The Neuron Doctrine serves as the foundation for modern
neuroscience. It stipulates that neurons are individual cells, and they
form the basic computational units of the brain. They work together
thanks to synaptic connections, which allow them to communicate
with one another. These networks of neurons in our brain have the
following interesting properties:
* They integrate inputs from many other neurons
* They are relatively slow to signal
* They operate in parallel and integrate multiple inputs
* They represent information in a distributed manner
* They learn by changing their synaptic connections

7
Q

Biophysics in neurons

A

People often like to say that we have no idea how the brain works.
But, that’s a lie.
At the single neuron level, we have a really good understanding of
electrophysiology. It’s arguably one of the most successful models in
science.

All cells are composed of a cell membrane, made of a lipid bilayer,
that encapsulates long strings of amino acids (proteins).
The cell membrane is an electrical insulator, i.e. charged particles
cannot cross it.
However, some of the proteins in the bilayer (known as ion channels)
do allow specific ion species to cross.
Because ions can’t cross the membrane, it acts as a capacitor,
storing positive and negative charges on either side.
In contrast, because ion channels allow ions to pass (though not
completely freely) they act as resistors.

Some of the unique biophysics of neurons:

Neurons have ion channels and pumps that ensure that their resting
voltage is roughly -65 mV.

When the voltage V(t) in the axon hillock reaches threshold, additional Na+
and K+ channels open that generate an all-or-none action potential,
which propagates down the axon rapidly to all terminals.

When an action potential reaches a terminal, neurotransmitter is
released. Postsynaptic neurons have ion channels that open in
response.

The long tree-like structures, dendrites, are where most synaptic
inputs arrive. They are effectively a tree of resistor-capacitor circuits,
which allows the synaptic currents from multiple sources to be
integrated.
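
Putting these pieces together, here is a rough leaky integrate-and-fire sketch in Python (a deliberate simplification with illustrative parameter values, not the lecture's equations): the membrane is treated as a capacitor in parallel with a leak resistance, and the all-or-none action potential is replaced by a simple threshold-and-reset.

    import numpy as np

    # Leaky integrate-and-fire sketch; parameter values are illustrative only.
    C = 200e-12       # membrane capacitance (farads)
    R = 100e6         # leak (membrane) resistance (ohms)
    V_rest = -65e-3   # resting potential (volts)
    V_th = -50e-3     # spike threshold (volts)
    V_reset = -65e-3  # reset value after a spike (volts)

    dt = 0.1e-3                  # time step (seconds)
    steps = int(0.2 / dt)        # simulate 200 ms
    I = 0.2e-9 * np.ones(steps)  # constant input current (amps)

    V = V_rest
    spike_times = []
    for t in range(steps):
        # RC membrane equation: C * dV/dt = -(V - V_rest)/R + I(t)
        V += dt * (-(V - V_rest) / R + I[t]) / C
        if V >= V_th:            # threshold crossing stands in for the action potential
            spike_times.append(t * dt)
            V = V_reset
    print(f"{len(spike_times)} spikes in 200 ms")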

8
Q

Multi-compartment models

A

By simulating every segment of every dendrite and axon in a neuron
using equations that describe ion flow, computational
neuroscientists are capable of reproducing numerous aspects of
neural activity in microcircuits.

So, great, we can simulate the brain, right? Markram thought so…
Two problems:
First, the equations are not analytically solvable, and you have to
numerically integrate over time to calculate the voltage in every
compartment of every neuron. This requires stupid amounts of
compute.
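
To illustrate the first point, here is a toy two-compartment version of the kind of calculation involved (my own crude sketch with made-up passive parameters; real multi-compartment models couple thousands of compartments and include active channels):

    import numpy as np

    # Passive two-compartment model (soma + one dendrite), forward-Euler integration.
    C = np.array([200e-12, 100e-12])  # capacitance of each compartment (F)
    g_L = np.array([10e-9, 5e-9])     # leak conductance of each compartment (S)
    g_c = 2e-9                        # axial conductance coupling the compartments (S)
    E_L = -65e-3                      # leak reversal potential (V)

    dt = 0.025e-3
    V = np.array([E_L, E_L])          # voltages: [soma, dendrite]
    I_ext = np.array([0.0, 0.1e-9])   # inject current into the dendrite only (A)

    for step in range(int(0.1 / dt)):   # 100 ms of simulated time
        coupling = g_c * (V[::-1] - V)  # current flowing in from the neighbouring compartment
        dVdt = (-g_L * (V - E_L) + coupling + I_ext) / C
        V = V + dt * dVdt               # no closed-form solution in general, so step forward in time
    print(V * 1e3)  # final voltages in mV; the dendritic input also depolarizes the soma

Every added compartment adds another coupled equation to step through time, which is part of why simulating whole brains at this level of detail is so expensive.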

Second, these simulations are too complex to learn anything
interesting. All the synaptic connections must be set randomly and
updated according to overly simple rules that we know aren’t correct.

9
Q

Summary again

A
  • Cell membranes act like tiny resistor-capacitor circuits
  • Neurons are normally at a voltage of -65 mV at rest, but
    neurotransmitter inputs from other neurons can raise or lower
    this.
  • Inputs from other neurons are summed across the dendritic tree
    to alter the membrane voltage, per the equations for
    resistor-capacitor circuits.
  • If enough excitatory inputs are received to pass threshold, it
    leads to an action potential, which induces neurotransmitter
    release.
  • We can simulate this in exquisite detail, but the resulting
    equations are very complex and must be integrated numerically.
10
Q

The early history of neural networks

A

Important to simplify:
To actually make progress, and use neural models to generate
algorithms, we need to simplify our models. People realized this
long ago. Most of the principles I outlined above for you were
worked out by the 1950s/60s!

In the 1940s, a pair of unlikely friends, Walter Pitts and Warren
McCulloch, proposed that we could simplify the biological details
above in order to achieve basic Boolean logic:
The idea was to simplify the integration of synaptic inputs down to a
single linear sum, then simulate action potentials with a threshold
function. Different synapses could take either positive or negative
weights (corresponding to excitatory or inhibitory inputs). Put it all
together, and you could do Boolean operations.
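
As a sketch of the idea (a toy implementation of my own, not McCulloch & Pitts' original notation): each unit takes a weighted sum of binary inputs and fires if the sum reaches a threshold, which is enough to implement Boolean gates.

    def mp_unit(inputs, weights, threshold):
        """McCulloch & Pitts-style unit: linear sum of inputs, then a hard threshold."""
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0

    # Boolean gates as threshold units (weights and thresholds chosen by hand).
    AND = lambda a, b: mp_unit([a, b], [1, 1], threshold=2)
    OR  = lambda a, b: mp_unit([a, b], [1, 1], threshold=1)
    NOT = lambda a:    mp_unit([a],    [-1],   threshold=0)  # negative weight = inhibitory input

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))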

Frank Rosenblatt extended the McCulloch & Pitts model, developing
perceptrons.
-The basic perceptron unit is like the McCulloch & Pitts unit but with
scalar weights that can change with learning.
Simple perceptrons are made up of m units with outputs oi, which
result from integrating n inputs xj, in a manner similar in principle
to how real neurons integrate the current from multiple dendritic
and synaptic sources to generate an action potential if a threshold, θi,
is passed:
*see formulas but don’t memorize
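(For reference, the standard form of this computation, written in the same notation, is
oi = f( Σj Wij xj − θi ), for i = 1, . . . , m,
where Wij is the weight from input j to unit i and f is a hard threshold (step) function; the slide’s exact formula is not reproduced here, but it should be equivalent.)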

Given this, simple perceptrons are really just a linear transformation
followed by a non-linear function:
(matrix multiplications… dot products)

Perceptrons learn to associate specific inputs with specific outputs,
using a simple learning rule called the delta rule. Specifically, if we
receive a target output y = [y1, . . . , ym]T for input x = [x1, . . . , xn]T,
each weight is adjusted in proportion to the error between the target
and the actual output (see the sketch below).
It can be proven that if a perceptron can implement a given function,
then training with the delta rule (using the correct outputs) will
converge to this function.
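
A minimal sketch of delta-rule training in Python (my own illustration; the update ΔWij = η(yi − oi)xj is the standard single-layer form, and the AND task is just a convenient example):

    import numpy as np

    def step(z):
        return (z > 0).astype(float)  # hard threshold standing in for the spike non-linearity

    def train_perceptron(X, Y, eta=0.1, epochs=20):
        """Delta rule: after each example, dW_ij = eta * (y_i - o_i) * x_j."""
        n, m = X.shape[1], Y.shape[1]
        W = np.zeros((m, n))
        b = np.zeros(m)               # bias plays the role of -theta
        for _ in range(epochs):
            for x, y in zip(X, Y):
                o = step(W @ x + b)
                W += eta * np.outer(y - o, x)
                b += eta * (y - o)
        return W, b

    # Learns AND (linearly separable); the same procedure cannot learn XOR,
    # which is exactly the limitation discussed below.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y_and = np.array([[0], [0], [0], [1]], dtype=float)
    W, b = train_perceptron(X, Y_and)
    print(step(X @ W.T + b).ravel())  # -> [0. 0. 0. 1.]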

The problem with perceptrons is that they cannot implement
functions that are not linearly separable (e.g. XOR)! Minsky & Papert
published a book on perceptrons in 1969 where they proved this. This
realization led to a near-total freeze in perceptron research for a decade.

(perceptrons and their descendants were later renamed Artificial Neural Networks, ANNs)

11
Q

Summary again

A
  • The early work on brain-inspired AI models adopted an
    approach of simplifying the basic equations of neural
    integration to a single linear step.
  • Action potentials were then simulated with a single non-linear
    threshold function.
  • The earliest model was the McCulloch & Pitts unit (1943), which
    implemented Boolean logic.
  • This was extended by perceptrons, championed by Frank
    Rosenblatt, which could learn associations via the Delta rule.
  • Simple perceptrons composed of a single layer of units cannot
    solve problems that are not linearly separable, so they were
    largely abandoned in the 1970s.
12
Q

Parallel Distributed Processing (PDP)

A

In 1986, Rumelhart, McClelland, and the PDP Research Group
published their magnum opus on the topic, and called the framework
they had developed “Parallel Distributed Processing” (PDP).

PDP models extended the basic ideas of people like McCulloch &
Pitts and Rosenblatt, by proposing a general framework for setting
up more complicated models with a variety of learning algorithms.
The basic framework for a PDP model requires one to define:
* A set of processing units: {1, …, n}
* A state of activation at time t for each unit:
a(t) = [a1(t), . . . , an(t)]T
* An output function for each unit:
o(t) = [o1(t) = f(a1(t)), . . . , on(t) = f(an(t))]T
* A pattern of connectivity between all of the units:
W = [ W11  W12  ...  W1n
      W21  W22  ...  W2n
      ...   ...  ...  ...
      Wn1  Wn2  ...  Wnn ]
* A propagation rule for sending activity between units:
net(t) = [net1(t) = G(W1, o(t)), . . . , netn(t) = G(Wn, o(t))]T
* An activation rule for combining the propagated activity with
current activity:
a(t) = [a1(t) = F(a1(t), net1(t)), . . . , an(t) = F(an(t), netn(t))]T

  • A learning rule that stipulates how the connections change
    based on current weights, activity, and any external inputs, y:
    ∆W = Ψ(η, W, a(t), o(t), y)
  • An environment that provides inputs to some subsets of the
    units, depending on a stochastic function of both the network
    activity and latent variables that the model does not have
    access to.

With these eight components we can define a vast number of
different models.
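
As a concrete (and heavily simplified) sketch of how the eight components fit together, here is a hypothetical toy network in Python; the specific choices of f, G, F, the Hebbian-style learning rule, and the environment are my own placeholders, not the PDP book's:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5                                    # 1. processing units {1, ..., n}
    a = np.zeros(n)                          # 2. state of activation a(t)
    f = np.tanh                              # 3. output function: o(t) = f(a(t))
    W = 0.1 * rng.standard_normal((n, n))    # 4. pattern of connectivity
    G = lambda W, o: W @ o                   # 5. propagation rule: net(t) = G(W, o(t))
    F = lambda a, net: 0.9 * a + net         # 6. activation rule: a(t+1) = F(a(t), net(t))

    def learn(W, o, eta=0.01):               # 7. learning rule Psi (Hebbian-style placeholder)
        return W + eta * np.outer(o, o)

    def environment(t):                      # 8. environment: external input to a subset of units
        x = np.zeros(n)
        x[0] = np.sin(0.1 * t)               # drive unit 0 with a simple time-varying signal
        return x

    for t in range(100):
        o = f(a)
        a = F(a, G(W, o) + environment(t))
        W = learn(W, o)
    print(np.round(o, 3))                    # final outputs of the 5 units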
Nowadays, we refer to these models as artificial neural networks
(ANNs).
You can achieve many, many different functions within this
framework, including the functions that humans find easy but which
are hard to compute using traditional algorithms based on strings of
symbols and predicate logic.
How many?

Turing Completeness:
It can be proven that ANNs with multiple layers are universal
function approximators, and that ANNs with recurrent connections
are Turing complete.
So, any computable function can be done with an ANN. See for
example:
* Hornik (1991) Neural Networks, 4:251
* Siegelmann & Sontag (1995) Journal of Computer and System
Sciences, 50:132

13
Q

Summary

A
  • The groundwork laid by earlier models in the 1940s-1960s was
    expanded on by the PDP framework that really came to the fore
    in the 1980s.
  • The PDP framework uses eight components to identify ANNs:
    1. Processing units
    2. State of activations
    3. Set of output functions
    4. Pattern of connectivity defined by weight matrices
    5. Set of propagation rules
    6. Set of activation rules
    7. Learning rules
    8. An environment
  • ANNs with multiple layers are universal function approximators,
    and ANNs with recurrent connections are Turing complete.
14
Q

Differences between natural and artificial neural networks

A

One of the most obvious ways that ANNs are different from real
brains is that ANNs don’t model the biophysics of neurons, as we’ve
discussed.
But, the fact that there’s some abstraction is not really an interesting
difference. The same can be said for all models in science.

One way we can frame the abstraction question is that we can
understand computers (whether brains, laptops, or gear systems) at
different levels of analysis.
We can either take a high-level view of the computation performed
by a computer as an abstract operation, or we can zoom down to the
physical operations of the machine itself.

The computational neuroscientist David Marr articulated this idea
concretely in his book on vision.
He identified three levels of analysis for a computer:
1. Computational
-The computational level is concerned with analysing the
input-output functions of a computer. Put another way, it is
concerned with the observable behaviour of the computer.
Example: a cash register can take in a sale price and money received,
and output the change due.
2. Algorithmic
-The algorithmic level is concerned with analysing the steps actually
taken by the computer to perform the computation, i.e. the
algorithm it runs.
Example: a cash register uses binary addition and subtraction to
calculate the change due.
3. Implementation
-The implementation level is concerned with analysing the hardware
that is used to implement the algorithm.
Example: a cash register uses a series of digital transistors to
perform its binary subtractions.

Marr’s point was this:
The same computation can be performed by different algorithms,
and the same algorithm can be implemented with
different hardware. Therefore, we can model the brain at the
computational and algorithmic levels without fully capturing
the physical reality of the brain, and it is still a valid model.

The levels are not totally separable. For example, different pieces of
hardware run different algorithms more or less efficiently. But, it
means that even if we use models that don’t capture all of the
realities of the brain’s “wetware”, we can still capture its
fundamental algorithms and computations.

ANNs are intended to be an algorithmic-level model of the brain!!!
Thus, when we ask about the differences between ANNs and real
brains, we don’t care about the fact that a lot of the biological details
are abstracted out (that’s as intended).
The real question is: are there algorithmic-level differences that
could matter?

Differences:
1. Cell types
2. Dale's law
3. Weight asymmetry
4. Architectures with many loops

15
Q

Summary

A
  • ANNs capture the core algorithmic motif of natural neural
    networks, e.g.:
      • Parallel processing
      • Distributed representations
      • The neuron doctrine
      • Learning via synaptic plasticity
  • Most ANNs do not capture some of the core algorithmic features
    of the brain, e.g.:
      • Specialized cell types
      • Dale's law
      • Asymmetric weights
      • Loop-based architectures
    Note though: the PDP framework is very flexible, and some ANN
    models do capture these features!
16
Q

Summary by the prof

A

● Blake: ANNs capture:
○ Parallel processing
○ Distributed representations
○ Neuron Doctrine
○ Synaptic plasticity learning
● Blake: ANNs do NOT capture:
○ Specialized cell types
○ Daleʼs law (neurotransmitters)
○ Asymmetric weights
○ Loop-based architectures