Kullback-Leibler (KL) Divergence Flashcards

1
Q

Kullback-Leibler (KL) Divergence, also known as relative entropy

A

Kullback-Leibler (KL) Divergence, also known as relative entropy, is a measure of how one probability distribution diverges from a second, expected probability distribution. It is commonly used in machine learning, particularly in tasks involving probabilistic models.

2
Q
  1. Definition
A

KL Divergence is a measure of the difference between two probability distributions. It quantifies how far one distribution is from the other in a statistical sense, although it is not a true distance metric.

3
Q
  2. Non-Symmetric
A

It is important to note that KL Divergence is not symmetric. That is, the KL divergence of distribution Q from distribution P, D_KL(P || Q), is in general not the same as the KL divergence of distribution P from distribution Q, D_KL(Q || P).
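
A minimal numerical check of this asymmetry, assuming SciPy is available (scipy.stats.entropy computes the KL divergence when given two distributions); the two example distributions are made up for illustration:

from scipy.stats import entropy  # entropy(p, q) computes sum(p * log(p / q))

p = [0.9, 0.1]
q = [0.5, 0.5]

print(entropy(p, q, base=2))  # D_KL(P || Q), about 0.53 bits
print(entropy(q, p, base=2))  # D_KL(Q || P), about 0.74 bits -- a different value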

4
Q
  3. Non-Negativity
A

KL Divergence is always non-negative and equals zero if and only if the two distributions are identical.

5
Q
  4. Mathematical Formulation
A

If P and Q are discrete probability distributions, the KL Divergence of Q from P is

D_KL(P || Q) = Σ_x P(x) * log2( P(x) / Q(x) ),

that is, the sum over all events x of P(x) times the base-2 logarithm of the ratio P(x) / Q(x).
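
A minimal sketch of this formula in Python, assuming NumPy; the helper name kl_divergence and the example distributions are made up for illustration, and terms with P(x) = 0 are skipped, following the usual 0 * log 0 = 0 convention:

import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum over events x of P(x) * log2(P(x) / Q(x)), in bits."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # events with P(x) = 0 contribute nothing
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl_divergence(p, q))  # about 0.53 bits, matching the SciPy check above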

6
Q
  5. Use Cases in Machine Learning
A

KL Divergence is used in many areas of machine learning, including the training of probabilistic models (such as variational autoencoders), model selection (by comparing how well different models fit the data), feature selection (by comparing how a feature's distribution differs across classes or outcomes), and many other tasks.
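
As one concrete example of the variational-autoencoder use, the KL term between a diagonal Gaussian approximate posterior N(mu, sigma^2) and a standard normal prior N(0, 1) has a closed form, 0.5 * sum(mu^2 + sigma^2 - 1 - ln sigma^2); this form uses natural logarithms, so the value is in nats rather than bits. A minimal sketch, assuming NumPy; the function name and example numbers are made up for illustration:

import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over dimensions, in nats."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)  # log_var = ln(sigma^2)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Example latent statistics for a single 3-dimensional encoding (made up).
print(gaussian_kl_to_standard_normal(mu=[0.2, -0.1, 0.0], log_var=[-0.5, 0.1, 0.0]))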

7
Q
  6. In Information Theory
A

KL Divergence is also a key concept in information theory. In this context, it measures the expected number of extra bits needed to code samples from probability distribution P when using a code optimized for probability distribution Q instead of a code optimized for P itself.
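
This "extra bits" reading can be checked numerically: D_KL(P || Q) equals the cross-entropy of P under Q minus the entropy of P (both in bits when using base-2 logarithms). A minimal sketch, assuming NumPy, with made-up distributions:

import numpy as np

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

entropy_p = -np.sum(p * np.log2(p))      # average bits with a code matched to P
cross_entropy = -np.sum(p * np.log2(q))  # average bits when the code is matched to Q
kl = np.sum(p * np.log2(p / q))          # D_KL(P || Q)

print(cross_entropy - entropy_p, kl)     # the two values agree: the "extra bits"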

8
Q
  7. Limitations
A

While KL Divergence is a powerful tool, it does have limitations. For instance, it becomes infinite (and is often treated as undefined) wherever P(x) is nonzero and Q(x) is zero, which can make practical computation tricky. It also assumes that P and Q are defined on the same probability space.
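
A sketch of the zero-support issue and one common workaround, additive smoothing (mixing a little uniform mass into Q); the kl_divergence helper from the earlier sketch is repeated so the snippet runs on its own, and the epsilon value is an arbitrary illustrative choice:

import numpy as np

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([0.6, 0.4, 0.0])
q = np.array([0.5, 0.0, 0.5])  # Q assigns zero probability to an event P can produce

print(kl_divergence(p, q))     # inf (NumPy also warns about the division by zero)

eps = 1e-6                                  # arbitrary smoothing constant
q_smoothed = (q + eps) / np.sum(q + eps)    # mix a little mass into every event
print(kl_divergence(p, q_smoothed))         # finite, but sensitive to the choice of eps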
