Neural Networks Intro Flashcards

1
Q

What does creating a machine learning algorithm mean?

A

It means building a model that outputs correct information from provided input data

2
Q

What is the training process?

A

The process through which the model learns to make sense of the input data.

3
Q

What are the 4 ingredients of training an algorithm?

A
  1. Data
  2. Model
  3. Objective Function
  4. Optimization Algorithm
4
Q

Which of the following is NOT a building block of a machine learning algorithm?

Data

Variable

Objective function

Optimization algorithm

A

Variable

5
Q

Training the model is:

A pure trial and error process

A kind of trial-and-error process with some feedback

The process of giving guidelines to the computer to find patterns

The process of watching people or a machine perform an activity and replicating it

A

A kind of trial-and-error process with some feedback

6
Q

Self-driving cars learn by:

Driving many hours before learning how to do it safely and efficiently

A very strict set of rules that Elon Musk and the others are programming day and night

“Watching” thousands of hours of footage of real people driving

Breaking the rules (e.g., go on red light, go over the speed limit) and waiting to get punished for it

A

“Watching” thousands of hours of footage of real people driving

7
Q

What are the 3 types of machine learning?

A
  1. Supervised
  2. Unsupervised
  3. Reinforcement
8
Q

What are the two subtypes of supervised learning?

A
  1. Regression
  2. Classification
9
Q

The linear model is given by:

y = xᵀw + b

y = wx + b

y = wᵀx + b

All of the above

A

All of the above

10
Q

The linear model for multiple inputs is given by:

y = xw + b

y = x1w1 + x2w2 + b

y = x1w2 + x2w1 + b

y = x2w2 + x2w2 + b

A

y = xw + b

11
Q

You have y = xw + b, where w = [1.2, -3], while b = [7]. If x = [2 , 3], what is the value of y?

0.4

-13.6

4.6

9.4

A

0.4

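The arithmetic in card 11 can be checked with a short Python sketch (the helper name `linear_model` is illustrative, not from the course):

```python
# Linear model y = xw + b for one input vector x and weight vector w:
# the dot product of inputs and weights, plus the bias.
def linear_model(x, w, b):
    return sum(xi * wi for xi, wi in zip(x, w)) + b

# Values from the flashcard: x = [2, 3], w = [1.2, -3], b = 7.
y = linear_model([2, 3], [1.2, -3], 7)
print(round(y, 4))  # 2*1.2 + 3*(-3) + 7 = 0.4
```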
12
Q

What is the Objective Function?

A

The measure used to evaluate how well the model’s outputs match the desired correct values (the targets)

13
Q

What are the two Objective Function types?

A

Loss - minimizing the error

Reward - maximizing the reward; used in reinforcement learning (e.g., the Super Mario high-score example)

14
Q

In supervised learning, we are dealing with:

lost functions

loss functions

reward functions

reinforcement functions

A

loss functions

15
Q

Reward functions are NOT:

functions we are trying to maximize

functions used in reinforcement learning

functions we are trying to minimize

functions

A

functions we are trying to minimize

16
Q

What is the common Loss Function when dealing with Regression?

A

L2-norm
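A minimal sketch of the L2-norm loss (the function name `l2_loss` is mine): the sum of squared differences between outputs and targets, so a perfect match gives zero.

```python
# L2-norm loss: sum of squared differences between outputs y and targets t.
def l2_loss(outputs, targets):
    return sum((y - t) ** 2 for y, t in zip(outputs, targets))

print(l2_loss([1.0, 2.0], [1.0, 2.0]))  # perfect match -> 0.0
print(l2_loss([1.5, 2.0], [1.0, 2.0]))  # off by 0.5 on one point -> 0.25
```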

17
Q

What is the common Loss Function when dealing with Classification?

A

Cross-entropy

18
Q

What is a Target?

A

The desired outcome. We want the output Y to be as close to the target T as possible.

19
Q

What are the outputs of a regression?

A

continuous numbers

20
Q

A target is:

The correct value at which we are aiming

A synonym for output

A part of the model

Always bigger than 0

A

The correct value at which we are aiming

21
Q

The objective function measures:

how well the targets match our model’s outputs

how well our model’s outputs match the targets

the model’s parameters

linearity of the data

A

how well our model’s outputs match the targets

22
Q

The L2-norm loss is used for:

k-means clustering

classification

regression

hierarchical clustering

A

regression

23
Q

Cross-entropy loss is used for:

k-means clustering

classification

regression

hierarchical clustering

A

classification

24
Q

Which cross-entropy points to the best match between outputs and targets?

L = 12.41

L = 0.78

L = 0.44

L = 0.77

A

L = 0.44
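To see why the smallest loss wins in card 24, here is a minimal cross-entropy sketch (assuming one-hot targets; the names are mine):

```python
import math

# Cross-entropy for one observation: L = -sum(t_i * ln(y_i)),
# where y are predicted probabilities and t is a one-hot target.
def cross_entropy(outputs, targets):
    return -sum(t * math.log(y) for y, t in zip(outputs, targets))

good = cross_entropy([0.9, 0.1], [1, 0])  # confident and correct -> ~0.105
bad = cross_entropy([0.3, 0.7], [1, 0])   # mostly wrong          -> ~1.204
print(good < bad)  # lower cross-entropy means a better match
```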

25
Q

The cross-entropy loss divided by 10 cannot be used for machine learning

True

False

A

False

26
Q

The cross-entropy loss MULTIPLIED by 10 cannot be used for machine learning

True

False

A

False

27
Q

The gradient is:

a generalization of the derivative concept

a generalization of the integral concept

a generalization of the optimization algorithm

a generalization of the objective function

A

a generalization of the derivative concept

28
Q

The gradient descent is a type of:

data

model

objective function

optimization algorithm

A

optimization algorithm

29
Q

The learning rate is denoted by which Greek letter?

alpha

sigma

nabla

eta

A

eta

30
Q

A high learning rate:

is better, as you learn faster

is faster, but may not reach the minimum

is slower, but sure to reach the minimum

is faster, and always reaches the minimum

A

is faster, but may not reach the minimum
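The trade-off in card 30 shows up in a tiny 1-parameter gradient-descent sketch (the function f(w) = (w − 3)² and the name `descend` are my choices):

```python
# Gradient descent on f(w) = (w - 3)**2, whose minimum is at w = 3.
# The gradient is f'(w) = 2 * (w - 3); eta is the learning rate.
def descend(eta, steps=50, w=0.0):
    for _ in range(steps):
        w -= eta * 2 * (w - 3)
    return w

print(descend(eta=0.1))   # moderate eta: converges close to 3
print(descend(eta=1.05))  # eta too high: overshoots and diverges
```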

31
Q

How is the Loss Function denoted?

A

L(y,t) - loss
C(y,t) - cost
E(y,t) - error

32
Q

N-parameter gradient descent differs from the 1-parameter gradient descent as it deals with:

many weights and biases

many input variables

many output variables

many targets

A

many weights and biases

33
Q

We use the delta to denote:

difference in models

difference between outputs and inputs

difference between outputs and targets

difference between methodologies

A

difference between outputs and targets

34
Q

The weights and biases:

Are the same thing

Have different update rules

Are rarely updated

Have update rules following the same logic

A

Have update rules following the same logic
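A sketch of both update rules for a 1-input linear model with loss L = (y − t)²/2 (the names and the ½ factor are my choices, made so the gradients come out clean):

```python
# One gradient-descent step for y = x*w + b with loss L = (y - t)**2 / 2.
# Both parameters follow the same logic: param -= eta * gradient.
def update(w, b, x, t, eta=0.01):
    y = x * w + b
    delta = y - t           # difference between output and target
    w -= eta * delta * x    # dL/dw = delta * x
    b -= eta * delta        # dL/db = delta
    return w, b

w, b = update(w=0.5, b=0.0, x=2.0, t=3.0)
print(w, b)  # both parameters nudged toward reducing the loss
```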

35
Q

What determines a model uniquely?

A

The weights (w), biases (b), and model architecture

36
Q

Training in a deep network is done by adjusting the parameters in the direction of minimizing the loss. How do you find that direction?

A

By calculating the gradient of the loss with respect to the weights

37
Q

Why is the loss sometimes divided by the number of observations/data points?

A

To be able to use the same learning rate for datasets of different sizes
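A quick sketch of the idea (names are mine): the summed loss grows with dataset size, but the mean stays comparable, so one learning rate works across datasets.

```python
# Mean L2 loss: dividing by the number of observations keeps the loss
# (and hence the gradient magnitude) comparable across dataset sizes.
def mean_l2_loss(outputs, targets):
    n = len(outputs)
    return sum((y - t) ** 2 for y, t in zip(outputs, targets)) / n

small = mean_l2_loss([1.5] * 10, [1.0] * 10)      # 10 observations
large = mean_l2_loss([1.5] * 1000, [1.0] * 1000)  # 1000 observations
print(small == large)  # same per-point error -> same mean loss
```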