Week 5 Flashcards

1
Q

What is Likelihood?

A

Probability of observing data given a particular model.

2
Q

What are the likelihood functions for discrete and for continuous Xi's?

A

Where the vector X is a sample from a distribution with parameter theta, and x is the observed value:

Discrete: L(theta) = PX(x; theta), the joint PMF evaluated at the observed x.

Continuous: L(theta) = fX(x; theta), the joint PDF evaluated at the observed x.
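
A minimal Python sketch of both cases, assuming a Bernoulli(theta) model for the discrete sample and a Normal(theta, 1) model for the continuous one (both models and the example data are illustrative assumptions, not from the cards):

import numpy as np
from scipy.stats import norm

# Discrete case: i.i.d. Bernoulli(theta) sample; the likelihood is the joint PMF
# PX(x; theta) evaluated at the observed x, viewed as a function of theta.
def bernoulli_likelihood(x, theta):
    x = np.asarray(x)
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

# Continuous case: i.i.d. Normal(theta, 1) sample; the likelihood is the joint PDF
# fX(x; theta) evaluated at the observed x.
def normal_likelihood(x, theta):
    return np.prod(norm.pdf(x, loc=theta, scale=1.0))

print(bernoulli_likelihood([1, 0, 1, 1, 0], 0.6))   # L(0.6) for a discrete sample
print(normal_likelihood([0.3, -0.1, 0.7], 0.2))     # L(0.2) for a continuous sample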

3
Q

What is a Maximum Likelihood Estimate?

A

A MLE of theta is a value of theta that maximises the likelihood function.

4
Q

What are two possible options to find the maximum likelihood?

A
1) Search: exhaustive (practical only in low dimensions) or grid search.

2) Optimization algorithms.

Both options are sketched below.
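
A minimal sketch of both options on hypothetical Bernoulli data (the sample, the grid, and the use of scipy's minimiser are illustrative assumptions):

import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1, 0, 1, 1, 0, 1, 1])        # hypothetical Bernoulli sample

def likelihood(theta):
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

# Option 1: search - evaluate L(theta) on a grid and keep the best point.
grid = np.linspace(0.01, 0.99, 99)
theta_grid = grid[np.argmax([likelihood(t) for t in grid])]

# Option 2: optimization algorithm - hand -L(theta) to an off-the-shelf minimiser.
result = minimize_scalar(lambda t: -likelihood(t), bounds=(0.01, 0.99), method="bounded")

print(theta_grid, result.x, x.mean())      # all close to 5/7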

5
Q

What is a Cost Function?

A

Maps a set of events into a number that represents the "cost" of the event occurring.

Also known as the loss or objective function.
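
As a toy illustration (squared error is just one common choice, not something the card specifies), a cost function maps an event, here a prediction error, to a single number:

# A cost (loss/objective) function maps an event to a number representing its "cost".
# Here the "event" is predicting y_hat when the true value is y.
def squared_error_cost(y, y_hat):
    return (y - y_hat) ** 2

print(squared_error_cost(3.0, 2.5))   # small error -> small cost (0.25)
print(squared_error_cost(3.0, 0.0))   # large error -> large cost (9.0)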

6
Q

What is the cost function for likelihood, and why is it used?

A

J(theta, D) = -log(L(theta, D))

Convention: optimization problems are conventionally posed as minimization, so we minimise the negative log-likelihood.
Convenience: the log turns a product of probabilities into a sum, which is easier to work with and differentiate.
Numerical stability: the product of many probabilities quickly underflows to zero, while the sum of their logs does not.
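
A sketch of the numerical-stability point on hypothetical Bernoulli data: the raw product of thousands of probabilities underflows to zero, while the negative log-likelihood stays well behaved and has the same minimiser:

import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=10_000)        # hypothetical sample of 10,000 coin flips
theta = 0.5

# Raw likelihood: a product of 10,000 numbers in (0, 1) underflows to 0.0.
raw_likelihood = np.prod(theta ** x * (1 - theta) ** (1 - x))

# Cost J(theta, D) = -log L(theta, D): a sum of logs, numerically stable.
nll = -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

print(raw_likelihood)   # 0.0 due to floating-point underflow
print(nll)              # about 10000 * log(2), roughly 6931.5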

7
Q

What are optimization problems, and what is the procedure for solving them?

A

Finding the best solution among the feasible ones.

  • Construct a model.
  • Determine the problem type.
  • Select an algorithm (see the sketch below).
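
A minimal sketch of the three steps on a hypothetical problem (the cost model and the choice of BFGS are illustrative assumptions):

from scipy.optimize import minimize

# 1) Construct a model: a cost to minimise, here J(theta) = (theta1 - 3)^2 + (theta2 + 1)^2.
def cost(theta):
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2

# 2) Determine the problem type: smooth, unconstrained, continuous minimisation.
# 3) Select an algorithm suited to that type, e.g. a quasi-Newton method.
result = minimize(cost, x0=[0.0, 0.0], method="BFGS")
print(result.x)   # close to [3, -1]
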
8
Q

What is the difference between supervised and unsupervised machine learning?

A

Supervised: given labelled training data (inputs with known outputs), train a model to predict the output for new inputs.

Unsupervised: given unlabelled samples, find structure in the data, e.g. divide the samples into groups.

9
Q

What is Gradient Descent?

A

A first-order iterative algorithm for finding a local minimum of a differentiable cost function.

Employ the negative gradient at each step to decrease the cost function.

Two ingredients - direction and magnitude (step size).
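
A minimal gradient descent sketch on a hypothetical one-dimensional cost J(theta) = (theta - 2)^2, showing both ingredients, the direction (negative gradient) and the magnitude (step size):

def grad(theta):
    # Gradient of the hypothetical cost J(theta) = (theta - 2)^2
    return 2.0 * (theta - 2.0)

theta = 10.0          # initial guess
step_size = 0.1       # magnitude of each update (learning rate)

for _ in range(100):
    theta = theta - step_size * grad(theta)   # move along the negative gradient

print(theta)          # converges towards the minimiser at theta = 2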

10
Q

What is Classification?

A

Determining the most likely class that an input pattern belongs to.

11
Q

What is Logistic Regression?

A

A regression model where the dependent variable is categorical.

The goal is to predict the probability that a given example belongs to the "1" class versus the probability that it belongs to the "0" class.

Also known as logit regression.

12
Q

How does Logistic Regression work?

A

Model the logarithm of the odds (log-odds) of the "1" class as a linear combination of the independent variables.

Then use the logistic (sigmoid) function to convert the log-odds into a probability.
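
A sketch with hypothetical weights and a single input example, showing the two steps: the linear combination gives the log-odds, and the logistic function converts them into a probability of the "1" class:

import numpy as np

def sigmoid(z):
    # Logistic function: maps log-odds z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -1.2])   # hypothetical weights for two independent variables
b = 0.5                     # hypothetical intercept
x = np.array([2.0, 1.0])    # one input example

log_odds = w @ x + b                 # linear combination = log(p / (1 - p))
p_class_1 = sigmoid(log_odds)        # probability the example belongs to the "1" class

print(log_odds, p_class_1)           # 0.9 and about 0.71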
