Lecture 9 Flashcards

Convolutional Neural Network (CNN)

1
Q

Convolutional Neural Network (CNN)

A

A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is particularly well-suited for image recognition and processing tasks. It is made up of multiple layers, including convolutional layers, pooling layers, and fully connected layers.

2
Q

How do we process and recognize images?

A

For visual perception, different neuronal cells are tuned to different
orientations. For example, some respond to vertical edges, some to
horizontal, some to diagonal, etc. These neuronal cells are organized in a
columnar architecture and function together to fulfill visual perception tasks.

3
Q

Key Insights from Mammalian Vision

A
  • An image is not processed, perceived or understood in one huge lump
  • The vision system considers small chunks of the visual field and
    extracts key features from each
  • Features are combined at later stages of processing into something
    recognizable as an object
  • This insight suggests that at the lowest level we can slide a small
    “receptive window” over input data – convolution – to process small
    chunks of input
4
Q

What is Happening in a Convolutional Layer?

A

Filters are composed of two parts:
* A set of weights
* An activation function

5
Q

Convolution

A

Convolution is the summation of the element-wise product of two matrices.
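
A minimal NumPy sketch of this definition (the image and kernel values are illustrative, not from the lecture):

import numpy as np

def convolve_step(patch, kernel):
    # Summation of the element-wise product of two matrices
    return np.sum(patch * kernel)

image = np.array([[1, 2, 0],
                  [3, 1, 2],
                  [0, 1, 1]])
kernel = np.array([[1, 0],
                   [0, 1]])

# Slide the 2x2 kernel over the 3x3 image (stride 1) -> 2x2 output
out = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        out[i, j] = convolve_step(image[i:i+2, j:j+2], kernel)
print(out)  # [[2. 4.]
            #  [4. 2.]]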

6
Q

Sets of Layers in Typical Sequences

A

The convolution, non-linear, and pooling layers are typically used as a set. Multiple sets of the above three layers can appear in a CNN design.

7
Q

Sets of Layers in Typical Sequences

A

Input -> Conv. -> Non-linear -> Pooling -> Conv. -> Non-linear -> Pooling -> … -> Output

8
Q

Sets of Layers in Typical Sequences

A

After a few sets, the output is typically sent to one or two fully
connected (dense) hidden layers.
* A fully connected layer is an ordinary neural network layer, as in other neural networks.
* The typical activation function is the sigmoid function.
* The output is typically a class (classification) or a real number (regression).
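
A minimal Keras sketch of this layer pattern (the layer sizes and input shape are illustrative assumptions, not from the lecture):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),           # e.g., grayscale images
    layers.Conv2D(32, 3, activation="relu"),  # Conv. + non-linear
    layers.MaxPooling2D(2),                   # Pooling
    layers.Conv2D(64, 3, activation="relu"),  # second set
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="sigmoid"),   # fully connected hidden layer
    layers.Dense(10, activation="softmax"),   # class output
])
model.summary()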

9
Q

Keras/TensorFlow in Python

A

Many different software platforms support neural network analysis generally, and CNNs in particular. Python was used to build some of the earliest tools, but as an interpreted language
Python is far too slow to actually fit neural models at scale. Instead, we use a “front end”/“back
end” arrangement to take advantage of the efficiency of languages like C++ and CUDA (a GPU language). Here, we are using the Keras package as the “front end” for setting up our model and
data, and then Keras passes this to the TensorFlow back end to do the actual model fitting.

10
Q

Two Keras Model Types

A

  • Sequential
  • (Functional) Model

11
Q

Sequential

A
  • Simplest approach and used in the majority of examples
  • Allows for one “input tensor” and
    one “output tensor”
  • Each successive layer of the model is “stacked” on the previous layer
  • The layers are connected in order
    of how they are invoked and the
    connections between layers are
    made automatically
12
Q

(Functional) Model

A
  • More complex and flexible
    approach – addresses difficult
    “non-standard” computing
    problems
  • Allows for more than one “input
    tensor” and more than one
    “output tensor”
  • The output of a layer can be
    connected to more than one
    subsequent layer (think of this like
    parallel branches)
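
A hedged sketch of the Functional API with parallel branches (the layer sizes are illustrative; contrast this with the stacked Sequential example earlier):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(64,))
x = layers.Dense(32, activation="relu")(inputs)

# The output of one layer feeds two parallel branches
branch_a = layers.Dense(16, activation="relu")(x)
branch_b = layers.Dense(16, activation="relu")(x)
merged = layers.concatenate([branch_a, branch_b])

outputs = layers.Dense(1)(merged)
model = keras.Model(inputs=inputs, outputs=outputs)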
13
Q

What is a Tensor?

A
  • A tensor is a multi-dimensional data structure
    A first-rank tensor can be a vector
    A second-rank tensor can be a matrix

Is a matrix the same as a second-rank tensor?

“All squares are rectangles, but not all rectangles are squares.”

Tensors obey specific transformation rules as part of the structure they have,
but matrices do not necessarily have this.
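
A small NumPy illustration of tensor ranks (purely illustrative):

import numpy as np

vector = np.array([1.0, 2.0, 3.0])      # first-rank tensor (1-D)
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])         # second-rank tensor (2-D)
batch = np.zeros((32, 28, 28, 1))       # fourth-rank: a batch of images

print(vector.ndim, matrix.ndim, batch.ndim)  # 1 2 4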

14
Q

Many Types of Layers Supported

A
  • Each layer has a particular
    architectural configuration meant
    to accomplish a particular kind of
    task
  • For example, we know that pooling layers do data reduction while highlighting strong features
  • Each layer has options for size,
    initialization, and activation
    function
15
Q

Many Types of Layers Supported

A
  • Partial list:
  • Preprocessing layers (e.g., text)
  • Core layers (basic types, e.g., “Dense”)
  • Convolution layers (1D, 2D, and 3D)
  • Pooling layers (1D, 2D, and 3D; max or
    average)
  • Recurrent layers (e.g., LSTM)
  • Normalization and regularization layers
  • Attention layers (multi-head)
  • Reshaping/merging
  • Activation layers
16
Q

Activation Function Reminder

A
  • The “secret sauce” of neural
    networks is non-linear activation
    functions
  • Linear functions model linear
    phenomena; anything more
    complex and we get predictions
    that only work in a narrow range
  • After the inputs to a neural node
    are summed, the activation
    function produces an output value
    (Y) based on the sum of the input
    values (X) according to non-linear
    curves such as the sigmoid or ReLU
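
A minimal sketch of two such curves, assuming the lecture slide showed sigmoid and ReLU (both appear elsewhere in these cards):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes the summed inputs into (0, 1)

def relu(x):
    return np.maximum(0.0, x)        # zero for negative sums, linear above

x = np.array([-2.0, 0.0, 2.0])       # summed inputs (X) to a node
print(sigmoid(x))                    # [0.119 0.5   0.881]
print(relu(x))                       # [0. 0. 2.]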
17
Q

Loss Function, Optimizer, Metrics

A
  • A loss function (AKA “cost” or “error” function) is an expression
    that produces a value for “how wrong we are” with a set of
    predictions
  • There are two big groups of loss functions, one for classification
    tasks (probabilistic losses) and one for metric prediction tasks
    (regression losses)
  • The most well known (and widely used) regression loss is “mean
    squared error” – the mean of the squared differences between
    predicted and actual y values
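
Mean squared error as a one-line NumPy sketch (the values are illustrative):

import numpy as np

y_true = np.array([3.0, 5.0, 2.0])  # actual y values
y_pred = np.array([2.5, 5.0, 4.0])  # predicted y values

mse = np.mean((y_true - y_pred) ** 2)  # mean of squared differences
print(mse)  # (0.25 + 0.0 + 4.0) / 3 ≈ 1.417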
18
Q

Loss Function, Optimizer, Metrics

A
  • Optimizers (in Keras) control
    the practicalities of how
    model fitting pursues the loss
    function
  • “stochastic gradient descent”
    – imagine a skier making small
    random turns to go downhill
    as quickly as possible
  • The AdaDelta optimizer can adjust
    the learning rate dynamically
    to make model fitting more
    efficient
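
A hedged example of selecting these optimizers in Keras (the commented compile call is a placeholder for a real model):

from tensorflow import keras

# Plain stochastic gradient descent with a fixed learning rate
sgd = keras.optimizers.SGD(learning_rate=0.01)

# AdaDelta adapts the learning rate dynamically during fitting
adadelta = keras.optimizers.Adadelta()

# model.compile(optimizer=adadelta, loss="mse", metrics=["mae"])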
19
Q

Embedding Layer -
Tweet Matrix

A
  • Each tweet t_i consists of a sequence of tokens w_1, w_2, …, w_{n_i}. L1 is the maximum tweet length. Short tweets are padded using zero padding.
  • Every word is represented as a d-dimensional word vector
  • The publicly available pre-trained GloVe word vectors for Twitter
    (Pennington et al., 2014) are used; see the sketch below.
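
A minimal sketch of building the Tweet Matrix, assuming a glove dictionary mapping tokens to d-dimensional vectors has already been loaded (the names, sizes, and random range for unknown tokens are assumptions):

import numpy as np

L1, d = 30, 100  # maximum tweet length, word-vector dimension (assumed values)
glove = {}       # token -> d-dimensional GloVe vector, loaded elsewhere

def tweet_matrix(tokens):
    # One row per token; short tweets are zero-padded up to L1 rows
    M = np.zeros((L1, d))
    for i, w in enumerate(tokens[:L1]):
        M[i] = glove.get(w, np.random.uniform(-0.25, 0.25, d))  # random if unknown
    return M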
20
Q

Embedding Layer -
Hash-Emo Matrix

A
  • Hashtags, emoticons, and emojis
  • For each tweet t_i, we extract hashtags h_1, h_2, … and emoticons/emojis e_1, e_2, … and concatenate the hashtag and emoticon/emoji vectors
  • L2 is the height of the Hash-Emo Matrix. Tweets with fewer than L2
    hash-emo features are padded with zeros, while tweets with more
    hash-emo features than L2 are truncated (see the sketch below).
  • d-dimensional word vectors from GloVe
  • Random initialization is used when no word vector is found for a particular word or emoticon
  • For emojis, we first map each one to something descriptive and then generate random word vectors
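
A minimal sketch of the padding/truncation step, assuming each hash-emo feature has already been mapped to a d-dimensional vector:

import numpy as np

L2, d = 10, 100  # Hash-Emo Matrix height, vector dimension (assumed values)

def hash_emo_matrix(vectors):
    # Truncate if more than L2 features; zero-pad if fewer
    M = np.zeros((L2, d))
    for i, v in enumerate(vectors[:L2]):
        M[i] = v
    return M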
21
Q

Convolutional Layer

A
  • Apply m filters of varying window sizes over the Tweet Matrix from
    the embedding layer
  • The window size (k) refers to the number of adjacent word vectors in the Tweet Matrix that are filtered together (when k > 1); see the sketch below
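
A hedged Keras sketch of m filters with several window sizes applied to the Tweet Matrix (the sizes and window choices are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

L1, d, m = 30, 100, 64
tweet_matrix = keras.Input(shape=(L1, d))  # output of the embedding layer

# One Conv1D branch per window size k: each filter spans k adjacent word vectors
branches = [layers.Conv1D(m, k, activation="relu")(tweet_matrix)
            for k in (2, 3, 4)]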
22
Q

Dropout and Max Pooling Layer

A
  • ReLU is applied before the dropout layer
  • Dropout is used as a regularization strategy to avoid overfitting
  • Max-pooling extracts the maximum value for each filter
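
A minimal Keras sketch of this step (the dropout rate and shapes are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

conv_out = keras.Input(shape=(28, 64))         # e.g., ReLU-activated conv output
dropped = layers.Dropout(0.5)(conv_out)        # regularization against overfitting
pooled = layers.GlobalMaxPooling1D()(dropped)  # the maximum value for each filter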
23
Q

Dropout and Max Pooling Layer

A

1 3 2 1 3
2 9 1 1 5        9 9 5
1 3 2 3 2   ->   9 9 5
8 3 5 1 0        8 6 9
5 6 1 2 9

A 3x3 max-pooling window slides over the 5x5 input (stride 1), extracting the highest value at each position.
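
The example can be reproduced with a short NumPy sketch:

import numpy as np

x = np.array([[1, 3, 2, 1, 3],
              [2, 9, 1, 1, 5],
              [1, 3, 2, 3, 2],
              [8, 3, 5, 1, 0],
              [5, 6, 1, 2, 9]])

# 3x3 max-pooling window, stride 1 -> 3x3 output
out = np.array([[x[i:i+3, j:j+3].max() for j in range(3)]
                for i in range(3)])
print(out)  # [[9 9 5]
            #  [9 9 5]
            #  [8 6 9]]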

24
Q

Fully Connected Layer

A
  • Maps the inputs to a number of outputs corresponding to the
    number of classes we have.
  • Emotion recognition: a multi-class classification task
    • Softmax as the activation function and categorical cross-entropy as the loss function
    • The output of the softmax function is equivalent to a categorical probability
      distribution, which indicates the probability of each class being true
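
A hedged sketch of this output layer and the matching compile step (the number of classes and feature size are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

num_classes = 6  # e.g., six emotion classes (assumed)

model = keras.Sequential([
    keras.Input(shape=(128,)),  # pooled feature vector (assumed size)
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])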