College 1 Flashcards

1
Q

What are the types of artificial neurons?

A
  • perceptron
  • sigmoid neuron

2
Q

Finish the sentence:

The perceptron takes several ..(1).. inputs and produces a single ..(2).. output.

A
  1. binary
  2. binary

3
Q

What determines whether the perceptron neuron’s output is 0 or 1?

A

The neuron’s output, 0 or 1, is determined by whether the weighted sum of its inputs is less than or greater than some threshold value.
Writing the bias as b = −threshold: output = 1 if w ⋅ x + b > 0, and output = 0 otherwise.
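
As a minimal sketch of this rule, the function below computes the weighted sum and applies the threshold. The weights and bias are hypothetical values chosen for illustration; with them the perceptron happens to behave like a NAND gate.

```python
def perceptron(x, w, b):
    """Return 1 if the weighted sum w . x + b is positive, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum + b > 0 else 0

# Hypothetical weights and bias (not from the flashcards).
w, b = [-2, -2], 3

# Evaluate all four binary input pairs: (0,0), (0,1), (1,0), (1,1).
outputs = [perceptron([x1, x2], w, b) for x1 in (0, 1) for x2 in (0, 1)]
print(outputs)  # [1, 1, 1, 0]
```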

4
Q

What’s the difference between a perceptron and a sigmoid neuron?

A
  • With perceptrons, a small change in the weights or bias of any single perceptron in the network can sometimes cause the output of that perceptron to completely flip, say from 0 to 1. Sigmoid neurons are modified so that small changes in their weights and bias cause only a small change in their output.
  • Just like a perceptron, the sigmoid neuron has inputs, x1, x2, … But these inputs can also take on any values between 0 and 1.
  • The output is not 0 or 1. Instead, it’s σ(w ⋅ x + b), where σ is called the sigmoid function or the logistic function. (If you want a binary output, you can for example interpret outputs below 0.5 as 0 and all other outputs as 1.)
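
The contrast can be sketched numerically. The inputs, weights and the size of the bias change below are assumptions picked so that the weighted sum sits right at the threshold; they are not taken from the flashcards.

```python
import math

def perceptron(x, w, b):
    # Hard threshold: the output jumps between 0 and 1.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def sigmoid_neuron(x, w, b):
    # Smooth output sigma(w . x + b), strictly between 0 and 1.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

x, w = [1.0, 1.0], [0.5, -0.5]   # hypothetical inputs and weights
b_before, b_after = -0.01, 0.01  # a small nudge to the bias

flip = (perceptron(x, w, b_before), perceptron(x, w, b_after))
shift = sigmoid_neuron(x, w, b_after) - sigmoid_neuron(x, w, b_before)
print(flip)   # (0, 1): the perceptron output flips completely
print(shift)  # roughly 0.005: the sigmoid output barely moves
```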
5
Q

Define: multilayer perceptron (MLP)

A

Networks with an input layer, multiple hidden layers and an output layer are sometimes called multilayer perceptrons (MLPs), despite being made up of sigmoid neurons, not perceptrons.

6
Q

Define: feedforward neural networks

A

Neural networks in which the output from one layer is used as input to the next layer. There are no loops: information is always fed forward, never fed back.

7
Q

Define: recurrent neural network

A
  • Artificial neural networks in which feedback loops are possible.
  • The idea in these models is to have neurons which fire for some limited duration of time, before becoming inactive. That firing can stimulate other neurons, which may fire a little while later, also for a limited duration.
  • That causes still more neurons to fire, and so over time we get a cascade of neurons firing.
  • Loops don’t cause problems in such a model, since a neuron’s output only affects its input at some later time, not instantaneously.
8
Q

Define: cost / loss / objective function

A

A cost / loss / objective function quantifies how well a given choice of weights and biases makes the output from the network approximate y(x) for all training inputs x.

9
Q

How does gradient descent work?

A
  • You want to find a point where the cost function C achieves its global minimum.
  • We try this by randomly choosing a starting point and computing derivatives. In practice we compute the gradients separately for every training example and average them.
  • We decide the direction of the step by choosing the direction which will lead to the largest immediate decrease of C. This is the direction of the negative gradient −∇C, where the gradient ∇C is the vector of partial derivatives of C.
  • The size of the step depends on the learning rate.
  • We take the step and start computing derivatives again.
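
The loop described above can be sketched on a toy cost function. The quadratic cost, the fixed starting point (used instead of a random one so the run is reproducible), the learning rate and the step count are all assumptions made for illustration.

```python
def cost(v):
    # Toy cost with its global minimum at (3, -1); not a real network cost.
    return (v[0] - 3.0) ** 2 + (v[1] + 1.0) ** 2

def gradient(v):
    # Vector of partial derivatives of the toy cost.
    return [2.0 * (v[0] - 3.0), 2.0 * (v[1] + 1.0)]

def gradient_descent(start, learning_rate=0.1, steps=100):
    v = list(start)
    for _ in range(steps):
        g = gradient(v)
        # Step against the gradient; the learning rate scales the step size.
        v = [vi - learning_rate * gi for vi, gi in zip(v, g)]
    return v

v_min = gradient_descent([0.0, 0.0])
print(v_min)  # close to [3.0, -1.0]
```

On a convex toy cost like this one the procedure reaches the global minimum; for a real network cost it may only reach a local minimum.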
10
Q

What is the difference between plain gradient descent and stochastic gradient descent?

A
  • Plain gradient descent computes the gradient over all training inputs before taking a single step. Stochastic gradient descent (SGD) can speed up learning by estimating that gradient from a small sample.
  • SGD picks out a randomly chosen mini-batch of training inputs.
  • The true gradient ∇C is estimated by computing the gradient for each input in the mini-batch and averaging over this small sample.
  • This is repeated with new mini-batches until all training inputs are exhausted, which is said to complete an epoch of training. Then we start another epoch.
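
A minimal sketch of mini-batch SGD, assuming a one-parameter model y = w * x with a squared-error cost. The data, batch size, learning rate and seed are hypothetical choices for illustration.

```python
import random

def sgd(data, batch_size=2, learning_rate=0.05, epochs=200, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        # One epoch: shuffle, then walk through all inputs in mini-batches.
        shuffled = data[:]
        rng.shuffle(shuffled)
        for i in range(0, len(shuffled), batch_size):
            batch = shuffled[i:i + batch_size]
            # Gradient of 0.5 * (w*x - y)**2 w.r.t. w is (w*x - y) * x,
            # averaged over the mini-batch to estimate the true gradient.
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad
    return w

# Noise-free toy data generated from y = 2x.
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
w = sgd(data)
print(w)  # close to 2.0
```

Calling `sgd(data, batch_size=1)` updates the weight after every single training input.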
11
Q

Define: online / incremental learning

A

SGD with a minibatch of size 1.

12
Q

What does the backpropagation algorithm do?

A

The backpropagation algorithm is a fast way of computing the gradient of the cost function.
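
As a rough illustration, the sketch below backpropagates through a chain of two sigmoid neurons with a quadratic cost and compares the result with a slow finite-difference estimate. The tiny network, the parameter values and the cost are assumptions for illustration, not the general algorithm for arbitrary networks.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)   # hidden neuron
    a2 = sigmoid(w2 * a1 + b2)  # output neuron
    return a1, a2

def cost(params, x, y):
    _, a2 = forward(params, x)
    return 0.5 * (a2 - y) ** 2

def backprop(params, x, y):
    """Gradient [dC/dw1, dC/db1, dC/dw2, dC/db2] via the chain rule."""
    w1, b1, w2, b2 = params
    a1, a2 = forward(params, x)
    delta2 = (a2 - y) * a2 * (1.0 - a2)     # error at the output neuron
    delta1 = delta2 * w2 * a1 * (1.0 - a1)  # error propagated backwards
    return [delta1 * x, delta1, delta2 * a1, delta2]

def numerical_gradient(params, x, y, eps=1e-6):
    # Central differences: two cost evaluations per parameter.
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((cost(up, x, y) - cost(down, x, y)) / (2 * eps))
    return grads

params, x, y = [0.5, -0.3, 0.8, 0.1], 1.5, 1.0  # hypothetical values
analytic = backprop(params, x, y)
numeric = numerical_gradient(params, x, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff)  # tiny: both methods agree
```

Backpropagation needs one forward and one backward pass, whereas the finite-difference check needs two cost evaluations per parameter, which is why it is only used here as a sanity check.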

13
Q

Explain the relation between:

  • deep learning
  • representation learning
  • machine learning
  • AI
A

Deep learning is a kind of representation learning, which is in turn a kind of machine learning, which is used for many but not all approaches to AI.

14
Q

Define: knowledge base approach to AI

A

Achieve AI by hard-coding knowledge about the world in formal languages. A computer can reason automatically about statements in these formal languages using logical inference rules.

15
Q

Define: Machine learning

A

The ability of AI systems to acquire their own knowledge by extracting patterns from raw data. Simple machine learning algorithms depend heavily on the representation of the data they are given. Each piece of information included in the representation (for example, of a patient in a medical task) is known as a feature. Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm.
- E.g. logistic regression, naïve Bayes

16
Q

Define: representation learning

A

An approach that uses machine learning to discover not only the mapping from representation to output but also the representation itself.
- E.g. shallow autoencoders

17
Q

Define: shallow autoencoders

A

An autoencoder is the combination of an encoder function, which converts the input data into a different representation, and a decoder function, which converts the new representation back into the original format.

18
Q

Define: Deep learning

A

Deep learning represents the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones. It is the study of models that involve a greater amount of composition of either learned functions or learned concepts than traditional machine learning does.
Deep learning resolves a problem by breaking the desired complicated mapping into a series of nested simple mappings, each described by a different layer of the model. The input is presented at the visible layer, so named because it contains the variables that we are able to observe. Then a series of hidden layers extracts increasingly abstract features from the input. These layers are called “hidden” because their values are not given in the data; instead the model must determine which concepts are useful for explaining the relationships in the observed data.
- E.g. MLPs

19
Q

Define: feedforward deep network, or multilayer perceptron (MLP).

A

A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.

20
Q

Define: Depth of a neural network

A
  1. The depth of the computational graph:
    The first view is based on the number of sequential instructions that must be executed to evaluate the architecture. We can think of this as the length of the longest path through a flow chart that describes how to compute each of the model’s outputs given its inputs.
  2. The depth of the probabilistic modeling graph:
    The second view is based on the depth of the graph describing how concepts are related to each other.
  • Both depend on the choice of the set of smallest elements from which the graphs are constructed.
21
Q

What are the three historical periods in deep learning?

A
  1. Cybernetics (1940s – 1960s)
  2. Connectionism / parallel distributed processing (1980s – 1990s)
  3. Deep Learning (2006 – now)
22
Q

What are the characteristics of the Cybernetics period?

A
  • Development of theories of biological learning
  • Simple linear models that take a set of input values, learn a set of weights and compute their outputs.
  • Models based on the f(x, w) used by the perceptron and ADALINE are called linear models. Linear models have many limitations. Most famously, they cannot learn the XOR function, where f([0,1], w) = 1 and f([1,0], w) = 1 but f([1,1], w) = 0 and f([0,0], w) = 0.
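
The XOR limitation can be illustrated with a small search. The sketch below assumes a thresholded linear model with two weights and a bias and tries every combination on a coarse grid; the grid is an arbitrary choice, and an empty search result is only an illustration, not a proof.

```python
from itertools import product

def perceptron(x, w, b):
    # Thresholded linear model with two inputs.
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def find_linear_fit(targets, grid):
    """Search a grid of (w1, w2, b) for a model matching all targets."""
    for w1, w2, b in product(grid, repeat=3):
        if all(perceptron(x, (w1, w2), b) == t for x, t in targets.items()):
            return (w1, w2, b)
    return None

grid = [i / 2 for i in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5
xor_targets = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
and_targets = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

print(find_linear_fit(and_targets, grid) is not None)  # True: AND is linearly separable
print(find_linear_fit(xor_targets, grid))              # None: no setting reproduces XOR
```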
23
Q

What are the highlights of the Cybernetics period?

A

  • McCulloch-Pitts neuron (1943)
    - This linear model could recognize two different categories of inputs by testing whether f(x, w) is positive or negative. Weights were set by a human operator.
  • Perceptron (1958)
    - The first model that could learn the weights that defined the categories given examples of inputs from each category.
  • The adaptive linear element (ADALINE) (1960)
    - Simply returned the value of f(x) itself to predict a real number and could also learn to predict these numbers from data.
    - The training algorithm used to adapt the weights of the ADALINE was a special case of an algorithm called stochastic gradient descent.

24
Q

What is the central idea of connectionism / parallel distributed processing?

A
  • Central idea = a large number of simple computational units can achieve intelligent behavior when networked together. This insight applies equally to neurons in biological nervous systems as it does to hidden units in computational models.
25
Q

What are the highlights of connectionism / parallel distributed processing?

A

  • Distributed representation (1986)
    - This is the idea that each input to a system should be represented by many features, and each feature should be involved in the representation of many possible inputs.
  • Backpropagation (1986)
    - Used to train a neural network with one or two hidden layers.
  • Long short-term memory (LSTM) network (1997)
    - A recurrent neural network designed to resolve mathematical difficulties in modeling long sequences. Such networks are now used to model relationships between sequences and other sequences rather than just fixed inputs.

26
Q

What are the highlights of Deep Learning?

A

  • Deep belief network (2006)
    - Showed that neural networks can be trained efficiently using greedy layer-wise pretraining.
  • Neural Turing machines (2014)
    - Neural networks that learn to read from memory cells and write arbitrary content to memory cells, and that can learn simple programs from examples of desired behavior.
  • Reinforcement learning
    - An autonomous agent must learn to perform a task by trial and error, without any guidance from the human operator.

27
Q

Define: unsupervised learning

A

Unsupervised learning is modeling the underlying or hidden structure or distribution in the data in order to learn more about the data. Unsupervised learning is where you only have input data and no corresponding output variables.

28
Q

Define: supervised learning

A

Supervised learning is the process of an algorithm learning from a labeled training dataset. You have input variables and an output variable, and you use an algorithm to learn the mapping function from the input to the output.

29
Q

Define: accuracy

A

Accuracy = Number of correct predictions / Total number of predictions

30
Q

Define: precision

A

Precision = TP / (TP + FP)

31
Q

Define: recall

A

Recall = TP / (TP + FN)
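
A small worked example of accuracy, precision and recall for binary labels, where TP, FP, FN and TN are the counts of true positives, false positives, false negatives and true negatives. The label lists are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn

# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, tn)  # 2 1 2 3
print(accuracy)        # 0.625
print(recall)          # 0.5
```

Here precision is 2/3: of the three inputs predicted positive, two really are positive.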

32
Q

Define: MAE

A

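Mean absolute error (MAE): the average of all absolute errors, MAE = (1/n) Σ |y_i − ŷ_i|, where y_i is the true value and ŷ_i the prediction.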
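
A short sketch of the calculation, with made-up true values and predictions:

```python
def mean_absolute_error(y_true, y_pred):
    # Average of |true - predicted| over all examples.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical values; the absolute errors are 1, 0, 2 and 1.
mae = mean_absolute_error([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
print(mae)  # 1.0
```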

33
Q

When was the perceptron invented and by whom?

A

1958, Frank Rosenblatt

34
Q

When was backpropagation invented and by whom?

A

1982, Paul Werbos
1986, David Rumelhart, Geoffrey Hinton and Ronald Williams
1989, Yann LeCun

35
Q

What were important advances in big data?

A

‘The cat experiment’ - 2012, Andrew Ng
Train a 9-layered NN with 1 billion connections on a large dataset of 10 million images. The NN is trained using model parallelism on a cluster with 1000 machines (16000 cores)

ImageNet - since 2009, Fei-Fei Li
An image database organised according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds and thousands of images.
Training on image and video processing at large scale.

36
Q

What is AlexNet?

A

A deep convolutional neural network trained on ImageNet using GPUs (2012)

37
Q

Explain the difference between a CPU and a GPU

A

CPU (multiple cores)

  • Core 1,2,3,4
  • Cache
  • System memory

GPU (Hundreds of Cores)

  • Cores
  • Device Memory
38
Q

Name some DL frameworks

A
  • Theano
  • TensorFlow
  • Keras
  • Torch + Caffe2 = PyTorch
39
Q

What are the shapes of:

  • a scalar
  • a vector
  • a matrix
  • a tensor
A
  • scalar: 1 x 1
  • vector: 1 x n
  • matrix: n x m
  • tensor: n x m x c x……
40
Q

Multiply matrix:
A 1,1 - A 1,2 - A 1,3
A 2,1 - A 2,2 - A 2,3

with vector:
X 1
X 2
X 3

A
y1 = A 1,1 X 1 + A 1,2 X 2 + A 1,3 X 3
y2 = A 2,1 X 1 + A 2,2 X 2 + A 2,3 X 3
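
The same computation as a short sketch, with each output entry being the dot product of one row of A with the vector. The numeric values are arbitrary.

```python
def matvec(A, x):
    # y_i = sum over j of A[i][j] * x[j]
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]      # shape (2, 3)
x = [1, 0, -1]       # 3 entries
y = matvec(A, x)
print(y)  # [-2, -2]
```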
41
Q
Transpose:
v1
v2
v3
v4
A

v1, v2, v3, v4

42
Q

Transpose:
A 1,1 - A 1,2 - A 1,3 - A 1,4
A 2,1 - A 2,2 - A 2,3 - A 2,4
A 3,1 - A 3,2 - A 3,3 - A 3,4

A

A 1,1 - A 2,1 - A 3,1
A 1,2 - A 2,2 - A 3,2
A 1,3 - A 2,3 - A 3,3
A 1,4 - A 2,4 - A 3,4
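
A minimal sketch of transposition on nested lists, where the entries encode their own row and column (11 stands for A 1,1) so the swap is easy to follow:

```python
def transpose(M):
    # Rows become columns: result[j][i] == M[i][j].
    return [list(column) for column in zip(*M)]

A = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34]]           # shape (3, 4)
At = transpose(A)                # shape (4, 3)
print(At)  # [[11, 21, 31], [12, 22, 32], [13, 23, 33], [14, 24, 34]]

column_vector = [[1], [2], [3], [4]]
print(transpose(column_vector))  # [[1, 2, 3, 4]]
```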

43
Q

Add:
A 1,1 - A 1,2
A 2,1 - A 2,2

to:
B 1,1 - B 1,2
B 2,1 - B 2,2

A

A 1,1 + B 1,1 - A 1,2 + B 1,2

A 2,1 + B 2,1 - A 2,2 + B 2,2

44
Q

If you multiply a matrix of shape (4, 3) by a matrix of shape (3, 2), what is the resulting shape?

A

(4,2)

45
Q

What’s the requirement for matrix multiplication?

A

The number of columns in matrix A must be equal to the number of rows in matrix B.
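
A sketch of the shape rule on nested lists: an (n, k) matrix times a (k, m) matrix gives an (n, m) matrix, and any other inner dimensions are rejected. The helper names are illustrative.

```python
def shape(M):
    return (len(M), len(M[0]))

def matmul(A, B):
    n, k = shape(A)
    k2, m = shape(B)
    if k != k2:
        raise ValueError(f"cannot multiply shapes {(n, k)} and {(k2, m)}")
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1] * 3 for _ in range(4)]  # shape (4, 3)
B = [[1] * 2 for _ in range(3)]  # shape (3, 2)
print(shape(matmul(A, B)))       # (4, 2)

try:
    matmul(B, A)                 # (3, 2) times (4, 3): 2 != 4
except ValueError as error:
    print(error)
```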

46
Q

What are the properties of matrix multiplication?

A

● A(B + C) = AB + AC (distributive)
● A(BC) = (AB)C (associative)
● In general, AB is not equal to BA (not commutative)
● (AB)^T = B^T A^T
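
These properties can be spot-checked on small integer matrices. The matrices below are arbitrary examples, and a check on one example illustrates the rules without proving them.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]

distributive = matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C))
associative = matmul(A, matmul(B, C)) == matmul(matmul(A, B), C)
commutative = matmul(A, B) == matmul(B, A)
transpose_rule = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
print(distributive, associative, commutative, transpose_rule)
# True True False True
```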

47
Q

Element-wise product:
What is the Hadamard product of two matrices,
C = A ⊙ B?

A

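C_ij = A_ij · B_ij (each entry of C is the product of the corresponding entries of A and B, which must have the same shape)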
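
A minimal sketch of the element-wise product on nested lists, with arbitrary example values:

```python
def hadamard(A, B):
    # Both matrices must have the same shape.
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("Hadamard product needs matrices of the same shape")
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = hadamard(A, B)
print(C)  # [[5, 12], [21, 32]]
```

Unlike the matrix product, this multiplies matching entries only, so no row-by-column sums are involved.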