ML-Final Flashcards by tianye wang

What is a threshold logic unit?

Simple model of a neuron

Each input value is multiplied with the corresponding weight value, and these weighted values are then summed.

If the weighted sum of the inputs is larger than a certain threshold value, the output is set to one, and to zero otherwise.
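
The description above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name, the example weights, and the threshold of 1.5 are assumptions chosen so that the unit behaves like a logical AND.

```python
def threshold_logic_unit(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0


# Hypothetical parameters: with these values the unit fires only for input (1, 1).
and_weights = [1.0, 1.0]
and_threshold = 1.5
outputs = [threshold_logic_unit([a, b], and_weights, and_threshold)
           for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```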

What is a weight parameter?

A value representing the ‘strength’ of a connection.

What is a perceptron?

A model of a neuron in which the output is calculated from the weighted summed input with an activation function (also called a gain function, transfer function, or output function).

Give examples of gain functions (also called transfer functions, output functions, or activation functions).

Sigmoid (logistic function), tanh

Why do we add a bias term to a perceptron?

A bias lets the perceptron shift its decision boundary away from the origin, so that its predictions can fit the data better.

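Similarities of SVM and perceptron?

A linear SVM is a special case of a perceptron: both use a linear decision boundary, and the SVM additionally chooses the boundary that maximizes the margin.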

What is the difference between Deep Learning and SVM?

An SVM solves the optimization problem with a specific, fixed transformation of the feature space.

Deep learning aims at learning the appropriate transformations.

What is the delta term (delta rule)?

δ = (y^(i) − y) y (1 − y), where y^(i) is the target for example i and y is the output of a sigmoid unit

The delta rule is a gradient descent learning rule for updating the weights of the inputs.
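
As a rough sketch of how the delta term drives a weight update, the snippet below applies it to a single sigmoid unit. The function names, the learning rate of 0.5, and the toy input are assumptions made for illustration; the second input is fixed at 1.0 so that its weight acts like a bias.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def delta_rule_step(weights, inputs, target, learning_rate=0.5):
    """One gradient-descent update for a single sigmoid unit with squared error."""
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (target - y) * y * (1.0 - y)  # the delta term from the card
    return [w + learning_rate * delta * x for w, x in zip(weights, inputs)]


def squared_error(weights, inputs, target):
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    return 0.5 * (target - y) ** 2


weights = [0.0, 0.0]
inputs = [1.0, 1.0]  # hypothetical example; the last input plays the role of a bias
target = 1.0
error_before = squared_error(weights, inputs, target)
for _ in range(100):
    weights = delta_rule_step(weights, inputs, target)
error_after = squared_error(weights, inputs, target)
print(error_before, error_after)  # the error shrinks as the weights are updated
```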

Why is a multilayer feedforward network a universal function approximator?

For any continuous function f, there is guaranteed to be a neural network such that, for every possible input x, the network outputs a close approximation of f(x).

Given enough hidden nodes, any such function can be approximated with arbitrary precision by these networks.

What is error backpropagation (backpropagation)?

The calculation of the gradient proceeds backwards through the network: the gradient of the final layer of weights is calculated first, and the gradient of the first layer of weights is calculated last.
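
The backward order can be illustrated on a deliberately tiny network with one hidden sigmoid unit and one output sigmoid unit, each with a single scalar weight. This is only a sketch under those assumptions (the squared-error loss and all names are illustrative), not a general implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    h = sigmoid(w1 * x)  # hidden activation (first layer)
    y = sigmoid(w2 * h)  # network output (final layer)
    return h, y


def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (target - y) ** 2


def backward(x, target, w1, w2):
    """Gradients of the loss, computed from the output layer back to the input layer."""
    h, y = forward(x, w1, w2)
    # Final layer first. Note the sign: (y - target) is the gradient of the loss,
    # i.e. the negative of the delta term used for the weight update.
    delta_out = (y - target) * y * (1.0 - y)
    grad_w2 = delta_out * h
    # Then the error is propagated back to the first layer.
    delta_hidden = delta_out * w2 * h * (1.0 - h)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2


grad_w1, grad_w2 = backward(x=1.5, target=1.0, w1=0.3, w2=-0.2)
print(grad_w1, grad_w2)
```

A quick way to check such a sketch is to compare the returned gradients with finite differences of `loss`.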

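Batch, mini-batch and online learning

Online learning (one example per update) helps to avoid local minima; mini-batch learning is suited to large datasets; batch learning (the whole dataset per update) needs a lot of memory.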

“No free lunch” theorem

There is no one model that works best for every problem.

The assumptions of a great model for one problem may not hold for another problem.

What is cross entropy (the negative log probability)? What is it used for?

The expected negative log probability that the current model (a probability distribution) assigns to the given labels.

H(p, q) = − sum_y [ p(y) log q(y) ]

p: true nature of the data
q: the neural network model, which represents the probability q(y|x; w)

It is used to derive the learning rule.

KL-divergence: what is it equivalent to? What is it related to? What is it used for in neural networks?

It is related to cross entropy:
H(p, q) = H(p) + KL(p||q)

Since H(p) is fixed when only q changes, minimizing the cross entropy is equivalent to minimizing the KL-divergence.

Both are closely related to the maximum (log) likelihood principle.

Used to derive the learning rule?
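
The identity H(p, q) = H(p) + KL(p||q) can be checked numerically. The sketch below assumes discrete distributions given as lists of probabilities, with q non-zero wherever p is non-zero; the two example distributions are made up for illustration.

```python
import math


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over three classes.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs)  # the two values agree up to rounding
```

Because `entropy(p)` does not involve q, lowering the cross entropy by changing q lowers the KL-divergence by the same amount.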

What is the softmax function, and why and where is it used in a neural network?

The softmax function is a generalization of the logistic function that “squeezes” the outputs into the range (0, 1) so that they sum to one.

It is used to highlight the largest values and suppress values which are significantly below the maximum value in a neural network.

It is used in the final layer of a neural network.
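
A minimal softmax sketch follows. Subtracting the maximum score before exponentiating is a common numerical-stability trick and an addition here, not something the card states; the example scores are arbitrary.

```python
import math


def softmax(scores):
    """Map arbitrary real scores to probabilities in (0, 1) that sum to one."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical final-layer scores
print(probs)  # the largest score receives the largest probability
```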

How are neural networks related to probabilistic regression?

Through cross entropy and KL-divergence

What is the relationship between maximizing the log probability and cross entropy?

We want to maximize the log probability of the labels given the data. Since the cross entropy is the negative of this, maximizing the log probability of the labels given the data is equivalent to minimizing the cross entropy.

What is deep learning?

Deep learning basically refers to neural networks with many layers.

What is a filter in CNN?

It is a vector describing a pattern

What is convolution?

Convolution is the operation of multiplying and adding while shifting the filter

How well did you know this?

Not at all

Perfectly

What is a stride in CNN?

It is how many steps you shift the filter in each iteration of the convolution
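
Convolution and stride can be sketched together in one dimension. The function below is an illustrative assumption (name, signal, and filter are made up); like most CNN libraries it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve_1d(signal, kernel, stride=1):
    """Slide the filter over the signal, multiplying and adding at each position."""
    outputs = []
    for start in range(0, len(signal) - len(kernel) + 1, stride):
        window = signal[start:start + len(kernel)]
        outputs.append(sum(x * k for x, k in zip(window, kernel)))
    return outputs


signal = [1, 2, 4, 7, 11, 16]
difference_filter = [-1, 1]  # hypothetical filter that responds to increases
print(convolve_1d(signal, difference_filter, stride=1))  # [1, 2, 3, 4, 5]
print(convolve_1d(signal, difference_filter, stride=2))  # [1, 3, 5]
```

With a stride of 2 the filter is shifted two steps at a time, so only every second position is evaluated and the output is shorter.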

What is a pooling operation in convolutional neural networks and why is this operation important?

Pooling takes the average or the maximum of the previous output over a certain area of the filtered image.

It compresses the image into a smaller, higher-level representation.

This is usually called downsampling. The operation is important because it reduces the dimensionality of the features and the computational cost.

It also helps to prevent overfitting.
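
A one-dimensional sketch of max and average pooling is shown below. Non-overlapping windows and the `mode` parameter are assumptions made to keep the example short.

```python
def pool_1d(values, size, mode="max"):
    """Downsample by taking the max or the average of each non-overlapping window."""
    pooled = []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        pooled.append(max(window) if mode == "max" else sum(window) / size)
    return pooled


feature_map = [1, 3, 2, 8, 5, 4]  # hypothetical filter output
print(pool_1d(feature_map, size=2, mode="max"))      # [3, 8, 5]
print(pool_1d(feature_map, size=2, mode="average"))  # [2.0, 5.0, 4.5]
```

Six values are reduced to three, which is the dimensionality reduction the card refers to.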

Briefly explain ‘dropout’ and why it is used in deep networks.

Dropout: randomly (e.g. with p = 0.5) ignoring hidden nodes for a specific input during learning. The ignored nodes are temporarily turned off.

The reason we use it is that it is a regularization technique that helps to prevent overfitting.
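
A minimal sketch of dropout applied to a list of hidden activations is given below. Rescaling the surviving activations by 1 / (1 − p) (often called inverted dropout) is an assumption added here so that the expected activation stays the same; the card itself only describes turning nodes off.

```python
import random


def dropout(activations, drop_prob=0.5, rng=random):
    """Randomly zero each activation with probability drop_prob (training time only)."""
    keep_scale = 1.0 / (1.0 - drop_prob)  # assumption: inverted-dropout rescaling
    return [0.0 if rng.random() < drop_prob else a * keep_scale
            for a in activations]


rng = random.Random(0)  # seeded generator so the run is repeatable
hidden = [1.0, 2.0, 3.0, 4.0]  # hypothetical hidden-layer activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
print(dropped)  # some entries are 0.0, the rest are doubled
```

At test time the function would simply not be applied, so all nodes are active.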

Sparse representation, comprised representation and fully distributed representation.

???

What is an autoencoder? Why is it useful?

An autoencoder is a neural network that tries to reconstruct its input. It is a feature extraction algorithm: it helps us find a representation for our data.

What is the relation between Ridge regression and a Gaussian prior?

Ridge regression uses L2 regularization, and L2 regularization is equivalent to placing a zero-mean Gaussian prior on the weights.

What is batch normalization?

Batch normalization: normalize the input to each hidden layer over each mini-batch.
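
For a single feature, the normalization step might look like the sketch below. It assumes the usual zero-mean, unit-variance normalization with a small `epsilon` for stability, and it leaves out the learned scale and shift parameters that full batch normalization layers also apply.

```python
import math


def batch_normalize(batch, epsilon=1e-5):
    """Normalize one feature over a mini-batch to zero mean and (almost) unit variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in batch]


mini_batch = [2.0, 4.0, 6.0, 8.0]  # hypothetical inputs to one hidden unit
normalized = batch_normalize(mini_batch)
print(normalized)
```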

What are skip connections in a neural network?

Connections that skip over one or more layers (e.g. convolutional layers) in the network.

What is a Recurrent Neural Network, and where does the term ‘recurrent’ come from? What is it used for?

Recurrent Neural Networks perform the same task for every element of a sequence, which is where the term ‘recurrent’ comes from. They are used for sequence processing, e.g. for machine translation.

Explain what backpropagation-through-time is in an RNN.

????

What is a Gated Neural Network?

A gated recurrent network has an extra memory state that is carried from the current step to the next step. A forgetting gate and a write gate can modify its value (hence ‘gated’). Examples of such networks are LSTM (Long Short-Term Memory) networks and gated recurrent units (GRUs).

What is a Boltzmann machine? What is its challenge?

A special form of recurrent network in which the connections between nodes are symmetric. The challenge is finding practical training rules.

What is reinforcement learning (RL)?

A learning system with action and reward.

In reinforcement learning, what is a policy?  

A policy in reinforcement learning is used to determine the action to take in each state.

What are the RL challenges?

1. Credit assignment
2. Exploration versus exploitation trade-off

What is the Markov condition, or the Markov Decision Process? (same as transition function in RL)

The transition function depends only on the previous state and the intended action from the previous state.

What is the reward function in RL?

r_{t+1} = ρ(s_t, a_t)

It returns the value of the reward when the agent enters state s_{t+1} by taking action a_t from state s_t.

What is a policy in RL?

A policy in reinforcement learning is used to determine the action to take in each state. Policy: a_t = π(s_t)

Value function and Optimal Value function

Based on the reward and the discounted reward, this function tells us how good action a is in state s.

```
Value function (state-action): Q^π(s, a)
Value function (state): V^π(s) = Q^π(s, π(s))
Optimal value function: V*(s) = max_a Q*(s, a)
```

Optimal policy

Optimal policy: π*(s) = arg max_a Q*(s, a)
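
For a small tabular problem, the max and arg max over actions can be read directly off a table of Q-values. The table below is entirely made up for illustration, and the nested-dictionary layout is just one possible representation.

```python
# Hypothetical optimal Q-values: q_star[state][action]
q_star = {
    "s0": {"left": 0.2, "right": 0.9},
    "s1": {"left": 0.5, "right": 0.1},
}


def optimal_value(q, state):
    """V*(s): the largest Q-value over all actions in this state."""
    return max(q[state].values())


def optimal_policy(q, state):
    """pi*(s): the action with the largest Q-value in this state."""
    return max(q[state], key=q[state].get)


print(optimal_policy(q_star, "s0"), optimal_value(q_star, "s0"))  # right 0.9
print(optimal_policy(q_star, "s1"), optimal_value(q_star, "s1"))  # left 0.5
```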

What is Model-based Reinforcement Learning?

We assume that the agent has a model of the environment and its behaviour, i.e. it knows the reward function ρ(s, a) and the transfer function τ(s, a).

What is Model-free Reinforcement Learning ?

???

SARSA

Q-learning

Explain the difference between the SARSA and Q-Learning algorithms.

SARSA is an on-policy approach to RL. Its update uses the term γ Q(s_{t+1}, a_{t+1}), where the next action a_{t+1} is the one selected by the policy that is currently being followed; hence the name State-Action-Reward-State-Action. Q-learning is an off-policy approach to RL. Its update uses the term γ max_a′ Q(s_{t+1}, a′), which is the part that differs from SARSA. Here we do not restrict how the next action is selected, which means the policy learned by Q-learning does not depend on the policy that was followed.
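
The difference between the two targets can be made concrete with a small tabular sketch. The Q-table, the reward, the discount factor γ = 0.9, and the learning rate α = 0.1 are all assumptions chosen for illustration.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy target: uses the action the agent actually takes next."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.9):
    """Off-policy target: uses the best next action, whatever the agent does next."""
    return reward + gamma * max(q[next_state].values())


def td_update(q, state, action, target, alpha=0.1):
    """Move Q(state, action) a small step towards the target."""
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical Q-table and transition: the agent explores and picks "left" in s1,
# although "right" currently has the higher value.
q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 2.0}}
target_sarsa = sarsa_target(q, reward=1.0, next_state="s1", next_action="left")
target_q = q_learning_target(q, reward=1.0, next_state="s1")
print(target_sarsa, target_q)  # SARSA's target is lower because of the exploratory action
td_update(q, "s0", "right", target_q)
```

In this example the exploratory action lowers the SARSA target (about 1.9) but leaves the Q-learning target (about 2.8) unaffected.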

epsilon-greedy policy ?

What is the difference between on-policy and off-policy?

????

basic Bellman equation?

???

What can we learn about SARSA and Q-Learning?

SARSA learns to avoid mistakes caused by exploration, while Q-learning is still able to learn when a different exploration policy is used.

What is the reward function in RL? What is the transfer function?

The reward function ρ(s, a) gives the reward for taking action a in state s, and the transfer function τ(s, a) gives the state that results from it.

What is the non-Markovian condition?

The non-Markovian condition is the case in which the next state depends on a series of previous states and actions.

Temporal difference?

???

(52 cards)