Image Generation Explanations Flashcards

Question 1

Q

What is the primary goal of a generator in a GAN?

Answer

A

To generate images that can deceive the discriminator
The generator tries to fool the discriminator into thinking fake images are real.
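
As a rough illustration of "fooling the discriminator", the two objectives can be written as binary cross-entropy losses on the discriminator's output probability. This is a minimal sketch, assuming the common non-saturating generator loss; the function names are hypothetical and not taken from any library.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    # The discriminator wants D(real) -> 1 and D(fake) -> 0.
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    # Non-saturating form (an assumption): the generator wants D(fake) -> 1.
    return bce(d_fake, 1)

# The generator's loss is lower when the discriminator is fooled.
loss_when_fooled = generator_loss(0.9)
loss_when_caught = generator_loss(0.1)
```

Here `d_real` and `d_fake` stand for the discriminator's probability that a real or generated image is real.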

Question 2

Q

Which architecture is commonly used as the generator in Pix2Pix GAN?

Answer

A

U-Net
U-Net preserves spatial structure through skip connections.

Question 3

Q

In Conditional GAN (CGAN), how is the label information used?

Answer

A

Added to both the generator and discriminator
Labels condition both models to control the output.
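
One simple way to condition both networks is to append a one-hot label to each network's input. The sketch below assumes that concatenation scheme (other schemes, such as learned label embeddings, also exist); the sizes and function names are hypothetical.

```python
import random

NUM_CLASSES = 10  # hypothetical number of classes
NOISE_DIM = 4     # hypothetical noise vector size

def one_hot(label, num_classes=NUM_CLASSES):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(noise, label):
    # Generator sees noise plus the label it should render.
    return noise + one_hot(label)

def discriminator_input(image_pixels, label):
    # Discriminator sees the image plus the label it is claimed to match.
    return image_pixels + one_hot(label)

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
g_in = generator_input(z, label=3)
d_in = discriminator_input([0.0] * 16, label=3)  # 16 = flattened toy 4x4 image
```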

Question 4

Q

What is the purpose of progressive growing in GANs?

Answer

A

To start with low-resolution images and gradually grow to high resolution
It stabilizes training and allows for high-res output.
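
The growth schedule can be sketched as a doubling sequence of resolutions, with each new layer blended in gradually. The 4 and 1024 defaults are assumptions based on the commonly described progressive-growing setup, and `fade_in` is a hypothetical helper name.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited during training, doubling at each stage."""
    res = start
    schedule = [res]
    while res < final:
        res *= 2
        schedule.append(res)
    return schedule

def fade_in(old_upsampled, new_output, alpha):
    """Blend the upsampled old output with the new layer's output.

    alpha ramps from 0 to 1 while the new, higher-resolution layer is introduced.
    """
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(old_upsampled, new_output)]

stages = resolution_schedule()
blended = fade_in([0.0, 2.0], [1.0, 4.0], alpha=0.5)
```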

Question 5

Q

In the diffusion model, what happens during the forward process?

Answer

A

Noise is added gradually to the image
Images are incrementally corrupted by noise.
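
The forward process is often written in closed form: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * noise, where abar_t is the running product of (1 - beta). The sketch below assumes a linear beta schedule with 1000 steps, and treats a short list of floats as a stand-in for an image; these values are illustrative assumptions.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): how much signal survives by step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def q_sample(x0, t, abar, rng):
    """Jump straight to step t by mixing the clean input with Gaussian noise."""
    signal = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * n for x, n in zip(x0, noise)]
    return xt, noise

rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
x0 = [1.0, -1.0, 0.5, 0.0]             # toy "image"
x_early, _ = q_sample(x0, 10, abar, rng)   # still close to x0
x_late, _ = q_sample(x0, 999, abar, rng)   # almost pure noise
```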

Question 6

Q

What is the role of CLIP in DALL-E 2?

Answer

A

To encode images and text into a shared embedding space
CLIP aligns image and text features in the same space.
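
Once images and text live in the same space, matching reduces to comparing vectors, typically with cosine similarity. The embeddings below are made-up toy values, not real CLIP outputs; real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings for illustration only.
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.0, 0.1, 0.9],
}

# Pick the caption whose embedding points in the most similar direction.
best_caption = max(
    text_embeddings,
    key=lambda caption: cosine_similarity(image_embedding, text_embeddings[caption]),
)
```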

Question 7

Q

Which component of Stable Diffusion is responsible for decoding latent embeddings back into images?

Answer

A

Variational Autoencoder (Decoder part)
The decoder reconstructs high-res images from latent space.

Question 8

Q

What is the purpose of the U-Net in diffusion models?

Answer

A

To predict noise at each timestep
U-Net estimates the noise to help reconstruct clean images.
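
To see why a noise estimate is enough to recover the image, the forward mixing formula can be inverted. In this sketch the true noise stands in for a perfect U-Net prediction (an assumption for illustration); `alpha_bar` is the fraction of signal remaining at the chosen timestep.

```python
import math
import random

def add_noise(x0, noise, alpha_bar):
    """Forward mixing: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * noise."""
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
            for x, n in zip(x0, noise)]

def estimate_x0(xt, predicted_noise, alpha_bar):
    """Invert the mixing using the predicted noise."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_noise)]

rng = random.Random(0)
x0 = [0.2, -0.7, 1.0]                       # toy "image"
noise = [rng.gauss(0.0, 1.0) for _ in x0]
alpha_bar = 0.3                             # hypothetical mid-schedule value

xt = add_noise(x0, noise, alpha_bar)
x0_hat = estimate_x0(xt, noise, alpha_bar)  # oracle noise stands in for the U-Net
```

With an imperfect prediction, `x0_hat` is only approximate, which is one reason sampling proceeds over many small steps.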

Question 9

Q

Which model is commonly used in super-resolution GANs to upscale low-resolution images?

Answer

A

SRGAN
SRGAN learns to generate sharper, high-resolution images from low-resolution inputs.

Question 10

Q

In DALL-E 2, the final image generation step is handled by:

Answer

A

A diffusion model
DALL-E 2 uses a diffusion decoder to render images from CLIP image embeddings.

Question 11

Q

Which problems are commonly encountered when training GANs?
A. Mode collapse
B. Overfitting discriminator
C. Perfect convergence
D. Non-convergence

Answer

A

Mode collapse; Overfitting discriminator; Non-convergence
Common GAN issues include poor diversity and unstable training.

Question 12

Q

In the diffusion process, the reverse process involves:
A. Removing noise gradually
B. Using a U-Net
C. Adding more noise each step
D. Predicting either clean images or noise

Answer

A

Removing noise gradually; Using a U-Net; Predicting either clean images or noise
The reverse process denoises images step by step.

Question 13

Q

Which techniques use cross-attention?
A. CLIP text-image matching
B. Stable Diffusion conditioning
C. GAN training without labels
D. DALL-E 2 conditioning

Answer

A

Stable Diffusion conditioning; DALL-E 2 conditioning
Cross-attention helps these models align generation with prompts.
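
In cross-attention, queries come from the image features while keys and values come from the text tokens, so each image position pulls in information from the prompt. The sketch below is a bare scaled dot-product version that omits the learned projection matrices real models use; all vectors are toy values.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Queries: image-side features. Keys/values: text-side features."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

image_queries = [[4.0, 0.0], [0.0, 4.0]]   # two image positions
text_keys = [[1.0, 0.0], [0.0, 1.0]]       # two prompt tokens
text_values = [[1.0, 0.0], [0.0, 1.0]]

attended, weights = cross_attention(image_queries, text_keys, text_values)
```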

Question 14

Q

Stable Diffusion components include:
A. Autoencoder (VAE)
B. U-Net
C. Transformer Decoder
D. Text Encoder

Answer

A

Autoencoder (VAE); U-Net; Text Encoder
These are the primary components used for latent-space generation.

Question 15

Q

In GANs, the discriminator is trained to:
A. Generate realistic images
B. Distinguish real from fake images
C. Provide gradients to the generator
D. Upsample noise

Answer

A

Distinguish real from fake images; Provide gradients to the generator
Its real-versus-fake signal is backpropagated to the generator.
It acts as a teacher helping the generator improve.

Question 16

Q

Progressive growing of GANs helps by:
A. Speeding up training
B. Stabilizing the generator early
C. Reducing resolution at the end
D. Gradually increasing output resolution

Answer

A

Stabilizing the generator early; Gradually increasing output resolution
It improves training stability and final image quality.

Question 17

Q

Which components are part of DALL-E 2’s generation pipeline?
A. Prior model
B. Diffusion model (Decoder)
C. LSTM text encoder
D. CLIP encoder

Answer

A

Prior model; Diffusion model (Decoder); CLIP encoder
DALL-E 2 relies on these to link text and image spaces.

Question 18

Q

Benefits of latent space diffusion (as in Stable Diffusion) include:
A. Higher memory usage
B. Faster generation
C. Lower computation cost
D. High-resolution outputs

Answer

A

Faster generation; Lower computation cost; High-resolution outputs
Latent space diffusion is more efficient than pixel-space.
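
The efficiency gain can be checked with simple arithmetic on tensor sizes. The numbers below assume a 512x512 RGB image, a VAE that downsamples by 8 per side, and 4 latent channels, which is the commonly cited Stable Diffusion v1 configuration and not something stated in these cards.

```python
def num_values(height, width, channels):
    return height * width * channels

IMAGE_SIZE = 512       # assumed output resolution
DOWNSAMPLE = 8         # assumed VAE downsampling factor per side
LATENT_CHANNELS = 4    # assumed latent channel count

pixel_values = num_values(IMAGE_SIZE, IMAGE_SIZE, 3)
latent_values = num_values(IMAGE_SIZE // DOWNSAMPLE, IMAGE_SIZE // DOWNSAMPLE, LATENT_CHANNELS)

# How many times fewer values the U-Net has to process per denoising step.
reduction = pixel_values / latent_values
```

Under these assumptions the denoiser works on 48 times fewer values than it would in pixel space.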

Question 19

Q

In Super-Resolution GAN (SRGAN), during training:
A. Low-resolution images are upsampled
B. GAN loss is used
C. Noise is directly added to high-res images
D. Discriminator distinguishes between HR and SR images

Answer

A

Low-resolution images are upsampled; GAN loss is used; Discriminator distinguishes between HR and SR images
The discriminator is trained to tell real HR images from generated SR images, and its adversarial loss pushes the generator toward realistic detail.
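
The generator's training signal combines a content term with a small adversarial term. This is a rough sketch: plain pixel MSE stands in for the feature-based content loss, and the 1e-3 adversarial weight is an assumption based on the commonly cited SRGAN setup. Function names are hypothetical.

```python
import math

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def sr_generator_loss(sr_pixels, hr_pixels, d_sr, adv_weight=1e-3):
    """Content loss plus weighted adversarial loss.

    d_sr is the discriminator's probability that the SR image is a real HR image.
    """
    content = mse(sr_pixels, hr_pixels)           # stand-in for a feature-space loss
    adversarial = -math.log(max(d_sr, 1e-12))     # lower when the discriminator is fooled
    return content + adv_weight * adversarial

hr = [0.1, 0.5, 0.9, 0.3]        # toy high-resolution target
sr = [0.2, 0.4, 0.8, 0.3]        # toy super-resolved output

loss_fooled = sr_generator_loss(sr, hr, d_sr=0.9)
loss_caught = sr_generator_loss(sr, hr, d_sr=0.1)
```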

Question 20

Q

In the diffusion model reverse process, each denoising step depends on:
A. Previous denoising result
B. Original input image
C. Text embeddings (if conditioned)
D. Random noise injection

Answer

A

Previous denoising result; Text embeddings (if conditioned)
Each step refines the last output, guided by optional text.

(20 cards)