Week 10 - NeRFs, Gaussian Splatting and Stable Diffusion Flashcards

Question 1

Q

How do you train a diffusion model?

Answer

A

Training a diffusion model has two parts. A fixed forward process gradually destroys the image data by adding noise over a number of timesteps
A neural network then learns the reverse process, which lets us re-create the image by removing that noise

Question 2

Q

How can you generate new samples once a diffusion model has been trained?

Answer

A

At inference time, new random noise can be used to generate new samples from the learned distribution

Question 3

Q

How does the Forward Process in Diffusion models work?

Answer

A

Gaussian noise is added step by step in a Markov chain
Each noisy image depends only on the image from the previous step
This is a fixed process, containing no learned parameters.
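
As a rough illustration of the Markov chain above, here is a minimal sketch in plain Python. It assumes the commonly used update x_t = sqrt(1 - beta) * x_(t-1) + sqrt(beta) * noise, which these cards do not state explicitly. The names forward_step and forward_process, the toy 4-pixel "image" and the constant beta are all hypothetical.

```python
import math
import random

def forward_step(x_prev, beta, rng):
    """One Markov step: scale the previous image down and mix in fresh Gaussian noise."""
    return [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for v in x_prev]

def forward_process(x0, betas, rng):
    """Apply every step in turn and return the whole trajectory x_0 .. x_T."""
    trajectory = [list(x0)]
    for beta in betas:
        # Each new image is computed from the previous one only (Markov property).
        trajectory.append(forward_step(trajectory[-1], beta, rng))
    return trajectory

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]   # toy 4-pixel "image" (assumption)
betas = [0.02] * 10          # constant noise rate, for illustration only
trajectory = forward_process(x0, betas, rng)
```

Nothing in this sketch is learned: the only inputs are the image, the noise rates and a random number generator.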

Question 4

Q

What is the Schedule in the context of Diffusion Models?

Answer

A

The schedule determines the rate at which noise is added to the image at each timestep. Like the rest of the forward process, it is usually fixed in advance rather than learned.
Smaller steps are taken when there is little noise, and larger steps are taken when the image is mostly noise
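
One possible schedule with this shape is a linear one, sketched below. The start and end values (1e-4 and 0.02) and the 1000 steps are commonly quoted defaults, not values taken from these cards, and the function names are hypothetical.

```python
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise rate per step, growing linearly from beta_start to beta_end."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar_t: product of (1 - beta_s) for all steps up to and including t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)
# betas grows over time (small steps first, larger steps later), while
# alpha_bars shrinks towards 0, meaning almost no original signal is left.
```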

Question 5

Q

How do you perform the Reverse step in a Diffusion Model?

Answer

A

You use a UNet-style model that learns the reverse diffusion process. It does this by predicting the noise in the image, given the noisy image and the current timestep.
By removing the predicted noise from the image, we get an approximation of the original image.
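
The "remove the predicted noise" step can be sketched as simple arithmetic. This assumes the usual closed form x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise, which is not spelled out in these cards. The function names are hypothetical, and the true noise stands in for the UNet's prediction.

```python
import math

def add_noise(x0, noise, alpha_bar):
    """Jump straight from the clean image to a noisy one at signal level alpha_bar."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * n
            for v, n in zip(x0, noise)]

def estimate_x0(x_t, predicted_noise, alpha_bar):
    """Invert add_noise using the predicted noise to approximate the clean image."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(x_t, predicted_noise)]

x0 = [1.0, -1.0, 0.5, 0.0]       # toy 4-pixel "image" (assumption)
noise = [0.3, -0.2, 1.1, -0.7]   # stand-in for a perfect noise prediction
x_t = add_noise(x0, noise, 0.5)
recovered = estimate_x0(x_t, noise, 0.5)
```

With a perfect prediction the original image is recovered up to rounding error. A real model's prediction is imperfect, which is why the result is only an approximation.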

Question 6

Q

What are the main components of the Training Loop in Diffusion Models?

Answer

A

Loop until converged:
- Get next sample image from the dataset
- Get a random timestep (t) between 0 and T
- Using the forward process, add noise to the image to get the noisy image at timestep t, keeping the noise that was added as the ground truth
- Predict the added noise using the denoising UNet
- Apply an MSE loss between the predicted and ground truth noise
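
The loss for one pass through this loop might look like the sketch below. It is a simplification under stated assumptions: predict_noise is a hypothetical callable standing in for the denoising UNet, the alpha_bars values are made up, and the gradient update that would follow the loss is left to whichever framework is used.

```python
import math
import random

def mse(a, b):
    """Mean squared error between two equally sized lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def training_loss(x0, alpha_bars, predict_noise, rng):
    """Loss for one training step on one image."""
    t = rng.randrange(len(alpha_bars))           # random timestep
    noise = [rng.gauss(0.0, 1.0) for _ in x0]    # ground-truth noise
    a = alpha_bars[t]
    # Forward process in closed form: noisy image at timestep t.
    x_t = [math.sqrt(a) * v + math.sqrt(1.0 - a) * n for v, n in zip(x0, noise)]
    predicted = predict_noise(x_t, t)            # the UNet would go here
    return mse(predicted, noise)

def zero_model(x_t, t):
    """Untrained placeholder (assumption): always predicts 'no noise'."""
    return [0.0] * len(x_t)

alpha_bars = [0.9, 0.7, 0.4, 0.1]   # toy 4-step schedule (assumption)
x0 = [1.0, -1.0, 0.5, 0.0]          # toy 4-pixel "image" (assumption)
loss = training_loss(x0, alpha_bars, zero_model, random.Random(0))
```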

Question 7

Q

What are the main components of the Inference Loop in Diffusion Models?

Answer

A

Sample x_T, i.e. random Gaussian noise
For each timestep t, counting down from T to 0:
– Predict all the noise in the image
– Remove the predicted noise to get an approximate value of x_0, i.e. the clean image
– Add noise back to the image to get x_(t-1)
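
The loop above can be sketched as follows. This is one simple reading of the steps, in which fresh Gaussian noise is added back at each step; real samplers differ in how they do this. The names sample and oracle and the toy values are hypothetical, and oracle is a stand-in for a trained UNet on a dataset that only ever contains TARGET.

```python
import math
import random

def sample(predict_noise, alpha_bars, size, rng):
    """Start from pure noise and alternate: predict noise, estimate x_0, re-noise."""
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]   # x_T: random Gaussian noise
    for t in range(len(alpha_bars) - 1, -1, -1):     # count down through timesteps
        a = alpha_bars[t]
        eps = predict_noise(x, t)
        # Remove the predicted noise to approximate the clean image.
        x0_hat = [(v - math.sqrt(1.0 - a) * e) / math.sqrt(a) for v, e in zip(x, eps)]
        if t == 0:
            return x0_hat
        # Add (less) noise back to land on the previous timestep.
        a_prev = alpha_bars[t - 1]
        x = [math.sqrt(a_prev) * v + math.sqrt(1.0 - a_prev) * rng.gauss(0.0, 1.0)
             for v in x0_hat]
    return x

ALPHA_BARS = [0.9, 0.7, 0.4, 0.1]   # toy 4-step schedule (assumption)
TARGET = [0.5, -0.5, 0.25]          # the only "image" in the toy dataset

def oracle(x_t, t):
    """Exact noise predictor for a dataset containing only TARGET."""
    a = ALPHA_BARS[t]
    return [(x - math.sqrt(a) * v) / math.sqrt(1.0 - a) for x, v in zip(x_t, TARGET)]

generated = sample(oracle, ALPHA_BARS, len(TARGET), random.Random(0))
```

Because the stand-in predictor is exact, the loop ends on TARGET up to rounding error, whatever noise it started from.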

Question 8

Q

What are some comparison points between Diffusion Models and GANs/VAEs?

Answer

A

Diffusion models can now generate higher quality images than either GANs or VAEs
They are also much easier to train, and train more consistently, than GANs

Question 9

Q

What are some ethical concerns in using Generative AI?

Answer

A

Data usage
Misinformation
Privacy & Consent
Impact on the creative industry
Impact on wider society

Question 10

Q

How do text prompts work in the context of Diffusion Models?

Answer

A

Text prompts rely on learning a mapping between text and image embeddings
A point in that embedding space can be used to condition a diffusion model to produce semantically relevant images
A model called a prior learns to convert a text embedding into an image embedding, which can then condition the diffusion model

Question 11

Q

What does Spatial Conditioning do to prompting?

Answer

A

It injects additional spatial conditioning (for example an edge map, depth map or pose) into the denoising UNet, which then guides the structure of the generated image alongside the text prompt.

Question 12

Q

What is one method that can make Diffusion Models more efficient?

Answer

A

You can perform the diffusion process in a latent space. An autoencoder, similar to a Variational AutoEncoder (VAE), compresses the image, and the diffusion process takes place in that lower-dimensional space.
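
To give a feel for the saving, the arithmetic below compares the number of values the denoiser has to process per step. The shapes are assumptions based on figures commonly quoted for Stable Diffusion v1 (a 512x512 RGB image and a 64x64 latent with 4 channels); they are not taken from these cards.

```python
def num_values(height, width, channels):
    """Number of scalar values in an image or latent of the given shape."""
    return height * width * channels

pixel_values = num_values(512, 512, 3)    # image in pixel space
latent_values = num_values(64, 64, 4)     # same image in latent space
reduction = pixel_values / latent_values  # how many times fewer values
```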

Question 13

Q

What are some challenges with Video Diffusion models?

Answer

A

Temporal Consistency:
– Frame to Frame consistency
– Object permanence
Realism:
– Interactions between objects
– Physics and fluid dynamics
Technical Challenges:
– Dataset availability
– Compute cost

(13 cards)