Week 10 - NeRFs, Gaussian Splatting and Stable Diffusion Flashcards

(13 cards)

1
Q

How do you train a diffusion model?

A

Training has two parts. A fixed forward process gradually destroys the image data by adding noise over a number of timesteps.
A neural network then learns the reverse process, which re-creates the image by removing that noise step by step.

2
Q

How can you generate new samples from a trained diffusion model?

A

At inference time, you start from freshly sampled random noise and run the reverse process. Each new noise sample produces a new sample from the learned distribution.

3
Q

How does the Forward Process in Diffusion models work?

A
  • Noise is added in a Markov Chain
  • Each noised image depends only on the image from the previous step
  • This is a fixed process, containing no learned parameters.
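
As an illustration of the points above, here is a minimal NumPy sketch of one forward step applied repeatedly. The update rule `sqrt(1 - beta) * x + sqrt(beta) * noise` is the common DDPM-style formulation; the function name `forward_step`, the 8x8 stand-in image, and the schedule values are assumptions made for this example, not something the card specifies.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """One Markov step: the result depends only on x_prev and fresh noise."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(8, 8))   # stand-in for an image scaled to [-1, 1]
betas = np.linspace(1e-4, 0.02, 1000)     # assumed noise levels, fixed in advance

# Nothing here is learned: the loop only applies the fixed noising rule.
for beta_t in betas:
    x = forward_step(x, beta_t, rng)
```

After enough steps, `x` is statistically close to pure Gaussian noise, which is why inference can start from random noise.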
4
Q

What is the Schedule in terms of Diffusion Models?

A

The schedule is the part of the diffusion process that sets how much noise is added to the image at each timestep.
Smaller steps are taken early on, when the image has little noise, and larger steps are taken later, when the image is mostly noise.
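
A schedule of this kind can be sketched as a linearly increasing list of per-step noise variances. The linear shape and the endpoint values `1e-4` and `0.02` are assumptions for illustration (they are a commonly used choice), and the names `betas` and `alphas_bar` are introduced here.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # small steps first, larger steps later
alphas_bar = np.cumprod(1.0 - betas)   # share of the original signal left after each step

print(alphas_bar[0], alphas_bar[-1])   # close to 1 at the start, close to 0 at the end
```

The cumulative product `alphas_bar` is useful because it lets you jump straight from the clean image to any timestep without looping through the intermediate steps.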

5
Q

How do you perform the Reverse step in a Diffusion Model?

A

You use a UNet-style model that learns the reverse diffusion process by predicting the noise in the image, given the noisy image and the current timestep.
Removing the predicted noise from the image gives an approximation of the original image.
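
The "remove the predicted noise" step can be written out as a short sketch. It assumes the DDPM-style relation `x_t = sqrt(a) * x0 + sqrt(1 - a) * eps`, where `a` is the cumulative signal fraction at timestep t; the helper name `estimate_x0` is hypothetical, and the true noise is passed in place of a UNet prediction so the example runs on its own.

```python
import numpy as np

def estimate_x0(x_t, eps_pred, alpha_bar_t):
    """Invert the noising equation using the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(4, 4))   # stand-in for the original image
eps = rng.standard_normal(x0.shape)        # the noise that was actually added
alpha_bar_t = 0.5                          # assumed signal fraction at this timestep
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# A real model only approximates eps, so its estimate of x0 is approximate too.
x0_hat = estimate_x0(x_t, eps, alpha_bar_t)
```

With a perfect noise prediction the original image is recovered exactly; in practice the estimate improves as the image becomes less noisy.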

6
Q

What are the main components of the Training Loop in Diffusion Models?

A

Loop until converged:
- Get next sample image from the dataset
- Sample a random timestep (t) between 0 and T
- Using the forward process, add noise to the image to get the noised image at timestep t, keeping the noise that was added as the ground truth
- Predict the added noise using the denoising UNet
- Apply an MSE loss between the predicted and ground truth noise
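
The steps above can be sketched as a single loss computation. This is a sketch under assumptions: `training_loss` and `zero_model` are hypothetical names, `zero_model` is a placeholder that stands in for the denoising UNet, and the gradient update that a real training loop would perform after computing the loss is left out.

```python
import numpy as np

def training_loss(x0, predict_noise, alphas_bar, rng):
    t = rng.integers(0, len(alphas_bar))        # random timestep
    eps = rng.standard_normal(x0.shape)         # ground-truth noise
    # Forward process in closed form: jump straight to the noised image at t.
    x_t = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    eps_pred = predict_noise(x_t, t)            # the UNet would be called here
    return float(np.mean((eps_pred - eps) ** 2))  # MSE between predicted and true noise

alphas_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))        # stand-in for a dataset image

def zero_model(x_t, t):
    """Placeholder predictor that always guesses 'no noise'."""
    return np.zeros_like(x_t)

loss = training_loss(x0, zero_model, alphas_bar, rng)
```

In a real setup the loss would be backpropagated through the UNet and the loop repeated over the dataset until convergence.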

7
Q

What are the main components of the Inference Loop in Diffusion Models?

A
  • Sample X(T), i.e. random Gaussian noise
  • For each timestep, t, from T down to 0
    – Predict the noise in the current image
    – Remove the predicted noise to get an approximation of x0, i.e. the original image
    – Add noise back to that approximation to get X(t-1)
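
The loop above can be sketched as follows. This is a simplified sampler that re-noises the x0 estimate with fresh noise at each step, mirroring the card; published samplers such as DDPM use a somewhat different update. The names `sample` and `oracle` are hypothetical, and `oracle` is a stand-in for a trained UNet that happens to know the target image, so the example can run without a model.

```python
import numpy as np

def sample(predict_noise, alphas_bar, shape, rng):
    x = rng.standard_normal(shape)                 # X(T): pure Gaussian noise
    for t in range(len(alphas_bar) - 1, -1, -1):   # walk the timesteps backwards
        eps_pred = predict_noise(x, t)             # predict the noise in x
        # Remove the predicted noise to estimate the original image.
        x0_hat = (x - np.sqrt(1.0 - alphas_bar[t]) * eps_pred) / np.sqrt(alphas_bar[t])
        if t > 0:
            # Add noise back at the (lower) level of the previous timestep.
            fresh = rng.standard_normal(shape)
            x = np.sqrt(alphas_bar[t - 1]) * x0_hat + np.sqrt(1.0 - alphas_bar[t - 1]) * fresh
    return x0_hat

alphas_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.2, 50))  # assumed short schedule
target = np.full((4, 4), 0.5)                              # image the oracle "knows"

def oracle(x_t, t):
    """Returns the exact noise that would turn `target` into x_t at timestep t."""
    return (x_t - np.sqrt(alphas_bar[t]) * target) / np.sqrt(1.0 - alphas_bar[t])

result = sample(oracle, alphas_bar, target.shape, np.random.default_rng(0))
```

Because the oracle's predictions are exact, `result` matches `target`; a trained model would instead land on a plausible image from the learned distribution.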
8
Q

What are some comparison points between Diffusion Models and GANs/VAEs?

A
  • Diffusion models can now generate higher-quality images than either GANs or VAEs
  • They are much easier to train, and train more consistently, than GANs
9
Q

What are some ethical concerns in using Generative AI?

A
  • Data usage
  • Misinformation
  • Privacy & Consent
  • Impact on the creative industry
  • Impact on wider society
10
Q

How do text prompts work in the context of Diffusion Models?

A
  • Text prompts rely on learning a mapping between text and image embeddings
  • A point in that embedding space can be used to condition a diffusion model to produce semantically relevant images
  • A model called a prior learns to convert text embeddings into image embeddings, which can then condition the diffusion model
11
Q

What does Spatial Conditioning do to prompting?

A

It injects additional conditioning into the denoising UNet, which is used alongside the prompt to guide the image that is generated.

12
Q

What is one method that can make Diffusion Models more efficient?

A

You can perform the diffusion process in a latent space: an autoencoder, similar to a Variational AutoEncoder (VAE), compresses the image, and the diffusion process takes place in that lower-dimensional space.
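
To get a feel for the saving, here is a small arithmetic sketch. The shapes are assumptions chosen for illustration: a 512x512 RGB image and a 64x64 latent with 4 channels, which is the kind of compression used by Stable Diffusion v1-style models.

```python
import math

pixel_shape = (512, 512, 3)   # assumed image size in pixel space
latent_shape = (64, 64, 4)    # assumed size of the compressed latent

pixel_values = math.prod(pixel_shape)
latent_values = math.prod(latent_shape)
ratio = pixel_values / latent_values

print(pixel_values, latent_values, ratio)
```

Under these assumptions the denoising network handles 48 times fewer values per step, which is where most of the efficiency gain comes from.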

13
Q

What are some challenges with Video Diffusion models?

A
  • Temporal Consistency:
    – Frame to Frame consistency
    – Object permanence
  • Realism:
    – Interactions between objects
    – Physics and fluid dynamics
  • Technical Challenges:
    – Dataset availability
    – Compute cost