ML exam 1 Flashcards
(23 cards)
What is supervised learning?
to learn a model from labeled training data that allows us to make predictions about unseen or future data
Rosenblatt perceptron
- binary classification task
- positive class (1) vs negative class (-1)
- computes the net input as a dot product of the inputs and the weights
step function
1 if z >= theta
-1 otherwise
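A minimal NumPy sketch of this decision function (the names net_input and step, and passing theta as a parameter, are illustrative choices, not from the cards):

```python
import numpy as np

def net_input(x, w):
    # z is the linear combination (dot product) of the inputs and weights
    return np.dot(x, w)

def step(z, theta=0.0):
    # unit step function: predict class 1 if z >= theta, else -1
    return np.where(z >= theta, 1, -1)
```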
what does z equal
the net input, i.e. the linear combination of the inputs and weights: z = wᵀx = Σ_j w_j x_j
rosenblatt perceptron algorithm
- initialize the weights to 0 or small random numbers
- for each training sample x(i):
  a. compute ŷ(i), the output value
  b. update the weights
weight update rule
w_j := w_j + Δw_j
perceptron learning rule
Δw_j = η(y(i) − ŷ(i)) x_j(i), where η is the learning rate
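A short sketch of this rule in NumPy, assuming the threshold θ is 0 (e.g. folded into a bias weight); the function name perceptron_update and its parameter names are illustrative:

```python
import numpy as np

def perceptron_update(w, x, y, eta=0.01):
    """One weight update for a single training sample.

    w : weight vector, x : feature vector, y : true label (+1 or -1),
    eta : learning rate. Threshold theta is taken as 0 for brevity.
    """
    y_hat = 1 if np.dot(x, w) >= 0 else -1   # predicted class label
    delta_w = eta * (y - y_hat) * x           # Δw_j = η(y − ŷ) x_j
    return w + delta_w                        # w_j := w_j + Δw_j
```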
linear separability
a line (or hyperplane) can be drawn that separates the negative class from the positive class
convergence
convergence is guaranteed if the two classes are linearly separable and the learning rate is sufficiently small
if classes cannot be separated,
Set a maximum number of passes over the training dataset (epochs)
Set a threshold for the number of tolerated misclassifications
Otherwise, it will never stop updating the weights (never converge)
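A self-contained sketch of a training loop using both stopping criteria (function and parameter names are illustrative, and the threshold θ is again taken as 0):

```python
import numpy as np

def train_perceptron(X, y, eta=0.01, max_epochs=50, tol_errors=0):
    """Fit a perceptron with the two stopping criteria above.

    Stops after max_epochs passes over the data, or earlier if the number of
    misclassifications in an epoch drops to tol_errors.
    """
    w = np.zeros(X.shape[1])                       # initialize weights to 0
    for _ in range(max_epochs):                    # cap on passes over the data
        errors = 0
        for xi, yi in zip(X, y):                   # update after each sample
            y_hat = 1 if np.dot(xi, w) >= 0 else -1
            if y_hat != yi:
                w = w + eta * (yi - y_hat) * xi    # perceptron learning rule
                errors += 1
        if errors <= tol_errors:                   # tolerated misclassifications
            break
    return w
```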
diagram of Rosenblatt perceptron
see pic
Adaline
Weights updated based on a linear activation function
Remember that the perceptron used a unit step function
φ(z) is simply the identity function of the net input: φ(wᵀx) = wᵀx
Adaline diagram
see pic
adaline vs rosenblatt
In Adaline, the weight update is computed based on all samples in the training set
The perceptron, in contrast, updates the weights incrementally after each sample
The Adaline approach is known as “batch” gradient descent
cost function and equation
ML algorithms often define an objective function
This function is optimized during learning
It is often a cost function we want to minimize
Adaline uses a cost function J(·) to learn the weights, defined as the sum of squared errors (SSE) between the computed outputs and the true class labels:
J(w) = ½ Σ_i (y(i) − φ(z(i)))²
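A minimal sketch of this cost in NumPy (the name sse_cost is illustrative):

```python
import numpy as np

def sse_cost(w, X, y):
    """SSE cost J(w) for Adaline."""
    output = X.dot(w)                 # linear (identity) activation: φ(z) = z
    errors = y - output               # y(i) − φ(z(i)) for every sample
    return 0.5 * np.sum(errors ** 2)  # J(w) = ½ Σ_i (y(i) − φ(z(i)))²
```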
advantages of adaline cost function
The linear activation function is differentiable
Unlike the unit step function
Why derivatives?
We need to know how much each variable affects the output!
It is convex
Can use gradient descent to learn the weights
gradient descent
- finds a (local) minimum of a given function by repeatedly stepping in the direction opposite to the gradient
- more precisely, the gradient points in the direction of the greatest rate of increase of the function, and its magnitude is the slope of the graph in that direction
gradient computation
To compute the gradient of the cost function, we need to compute the partial derivative of the cost function with respect to each weight w_j:
∂J/∂w_j = −Σ_i (y(i) − φ(z(i))) x_j(i)
We update all weights simultaneously, so the Adaline learning rule becomes
w := w + ∆w, with ∆w = −η∇J(w)
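A sketch of one such batch update in NumPy (the name adaline_batch_step and its parameters are illustrative):

```python
import numpy as np

def adaline_batch_step(w, X, y, eta=0.01):
    """One batch gradient-descent step for Adaline.

    The gradient is computed from all training samples at once, and every
    weight is updated simultaneously: w := w + Δw with Δw = −η∇J(w).
    """
    output = X.dot(w)                 # φ(z) = z, the identity activation
    errors = y - output               # y(i) − φ(z(i)) for all samples
    delta_w = eta * X.T.dot(errors)   # η Σ_i (y(i) − φ(z(i))) x_j(i) = −η ∂J/∂w_j
    return w + delta_w
```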
adaline vs rosenblatt
Looks (almost) identical. What is the difference?
φ(z(i)), with z(i) = wᵀx(i), is a real number
and not an integer class label as in the perceptron
The weight update is done based on all samples in the training set
The perceptron updates weights incrementally after each sample
This approach is known as “batch” gradient descent
if the learning rate is too high
the error becomes larger in each epoch because the updates overshoot the global minimum
if the learning rate is too low
takes too many epochs to converge
stochastic gradient descent
an optimization algorithm often used in machine learning applications to find the model parameters that correspond to the best fit between predicted and actual outputs; unlike batch gradient descent, it updates the weights incrementally after each training sample
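A sketch of one stochastic-gradient-descent pass for Adaline (the name adaline_sgd_epoch and its parameters are illustrative):

```python
import numpy as np

def adaline_sgd_epoch(w, X, y, eta=0.01):
    """One epoch of stochastic gradient descent for Adaline.

    In contrast to batch gradient descent, the weights are updated
    incrementally after each training sample.
    """
    for xi, yi in zip(X, y):
        output = np.dot(xi, w)        # linear activation for one sample
        error = yi - output
        w = w + eta * error * xi      # per-sample weight update
    return w
```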