Image Generation Explanations Flashcards
(20 cards)
What is the primary goal of a generator in a GAN?
To generate images that can deceive the discriminator
The generator tries to fool the discriminator into thinking fake images are real.
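The adversarial objective can be sketched with toy numbers; the discriminator scores below are made up for illustration, not from a trained model:

```python
import numpy as np

def bce(p, label):
    # Binary cross-entropy of a probability p against a 0/1 label.
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# Hypothetical discriminator outputs: probability that the input is real.
d_real = 0.9   # score on a real image
d_fake = 0.2   # score on a generated image

# The discriminator wants d_real -> 1 and d_fake -> 0.
d_loss = bce(d_real, 1) + bce(d_fake, 0)

# The generator wants the discriminator fooled: d_fake -> 1
# (the non-saturating form of the generator loss).
g_loss = bce(d_fake, 1)

# As the fake becomes more convincing (d_fake rises), g_loss falls.
assert bce(0.8, 1) < bce(0.2, 1)
```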
Which architecture is commonly used as the generator in Pix2Pix GAN?
U-Net
U-Net preserves spatial structure through skip connections.
In Conditional GAN (CGAN), how is the label information used?
Provided as an additional input to both the generator and discriminator
Labels condition both networks, so the class of the generated output can be controlled.
What is the purpose of progressive growing in GANs?
To start with low-resolution images and gradually grow to high resolution
It stabilizes training and allows for high-res output.
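The fade-in used when a new resolution block is added can be sketched in numpy; the shapes and the alpha value are illustrative assumptions:

```python
import numpy as np

# When a new (higher-resolution) block is introduced, its output is
# blended with an upsampled copy of the old low-res output, with alpha
# ramping from 0 to 1 over training.
rng = np.random.default_rng(0)
low_res = rng.random((4, 4, 3))        # existing 4x4 output
high_branch = rng.random((8, 8, 3))    # new 8x8 block's output

# Nearest-neighbour upsample of the low-res image to 8x8.
upsampled = low_res.repeat(2, axis=0).repeat(2, axis=1)

alpha = 0.3  # early in the fade-in: mostly the old, stable pathway
out = (1 - alpha) * upsampled + alpha * high_branch

assert out.shape == (8, 8, 3)
```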
In the diffusion model, what happens during the forward process?
Noise is added gradually to the image
Images are incrementally corrupted by noise.
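The forward process has a closed form that can be sketched in numpy; the linear beta schedule and the toy 8x8 "image" are common illustrative assumptions, not tied to any specific model:

```python
import numpy as np

# Closed-form noising: x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps,
# where abar_t is the cumulative product of (1 - beta) up to step t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # common linear schedule
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))       # toy "image"
eps = rng.standard_normal((8, 8))

def noisy(t):
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# The signal coefficient shrinks as t grows: late steps are mostly noise.
assert alpha_bar[999] < alpha_bar[10] < alpha_bar[0]
```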
What is the role of CLIP in DALL-E 2?
To encode images and texts into a shared embedding space
CLIP aligns image and text features in the same space.
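The shared embedding space can be illustrated with made-up 4-d vectors; real CLIP embeddings are much higher-dimensional, but the cosine-similarity comparison works the same way:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: dot product of the normalized vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings standing in for CLIP's image and text encoders.
img_dog = np.array([0.9, 0.1, 0.0, 0.1])
txt_dog = np.array([0.8, 0.2, 0.1, 0.0])   # "a photo of a dog"
txt_car = np.array([0.0, 0.1, 0.9, 0.2])   # "a photo of a car"

# In a shared space, the matching caption scores higher than a mismatch.
assert cosine(img_dog, txt_dog) > cosine(img_dog, txt_car)
```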
Which component of Stable Diffusion is responsible for decoding latent embeddings back into images?
Variational Autoencoder (Decoder part)
The decoder reconstructs high-resolution images from the latent representation.
What is the purpose of the U-Net in diffusion models?
To predict noise at each timestep
U-Net estimates the noise to help reconstruct clean images.
Which model is commonly used in super-resolution GANs to upscale low-resolution images?
SRGAN
SRGAN learns to generate sharp, high-resolution images from low-resolution inputs.
In DALL·E 2, the final image generation step is handled by:
A diffusion model
DALL·E 2 uses a diffusion decoder to render images from CLIP embeddings.
Which problems are commonly encountered when training GANs?
A. Mode collapse
B. Overfitting discriminator
C. Perfect convergence
D. Non-convergence
Mode collapse; Overfitting discriminator; Non-convergence
Common GAN issues include poor diversity and unstable training.
In the diffusion process, the reverse process involves:
A. Removing noise gradually
B. Using a U-Net
C. Adding more noise each step
D. Predicting either clean images or noise
Removing noise gradually; Using a U-Net; Predicting either clean images or noise
The reverse process denoises images step by step.
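Why predicting the noise is enough to denoise can be sketched in numpy; here we cheat by feeding in the true noise where a U-Net's prediction would go (the schedule and shapes are illustrative assumptions):

```python
import numpy as np

# Forward: x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))

t = 500
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# In a real model, eps_hat comes from the U-Net; an exact prediction
# lets us invert the forward process and recover the clean image.
eps_hat = eps
x0_hat = (x_t - np.sqrt(1 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

assert np.allclose(x0_hat, x0)
```

In practice the prediction is imperfect, so sampling removes only a little noise per step and repeats the estimate many times.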
Which techniques use cross-attention?
A. CLIP text-image matching
B. Stable Diffusion conditioning
C. GAN training without labels
D. DALL-E 2 conditioning
Stable Diffusion conditioning; DALL-E 2 conditioning
Cross-attention helps these models align generation with prompts.
Stable Diffusion components include:
A. Autoencoder (VAE)
B. U-Net
C. Transformer Decoder
D. Text Encoder
Autoencoder (VAE); U-Net; Text Encoder
These are the primary components used for latent-space generation.
In GANs, the discriminator is trained to:
A. Generate realistic images
B. Distinguish real from fake images
C. Provide gradients to the generator
D. Upsample noise
Distinguish real from fake images; Provide gradients to the generator
It acts as a teacher helping the generator improve.
Progressive growing of GANs helps by:
A. Speeding up training
B. Stabilizing the generator early
C. Reducing resolution at the end
D. Gradually increasing output resolution
Stabilizing the generator early; Gradually increasing output resolution
It improves training stability and final image quality.
Which components are part of DALL-E 2’s generation pipeline?
A. Prior model
B. Diffusion model (Decoder)
C. LSTM text encoder
D. CLIP encoder
Prior model; Diffusion model (Decoder); CLIP encoder
DALL-E 2 relies on these to link text and image spaces.
Benefits of latent space diffusion (as in Stable Diffusion) include:
A. Higher memory usage
B. Faster generation
C. Lower computation cost
D. High-resolution outputs
Faster generation; Lower computation cost; High-resolution outputs
Latent-space diffusion is far more efficient than diffusing directly in pixel space.
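The efficiency gain shows up in back-of-envelope arithmetic, assuming SD-1.x-style shapes (512x512x3 images, 64x64x4 latents):

```python
# Diffusing in the latent space shrinks the tensor the U-Net must
# process at every denoising step.
pixel_elems = 512 * 512 * 3    # 786,432 values per image
latent_elems = 64 * 64 * 4     # 16,384 values per latent
reduction = pixel_elems / latent_elems
assert reduction == 48.0       # 48x fewer values per step
```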
In Super-Resolution GAN (SRGAN), during training:
A. Low-resolution images are upsampled
B. GAN loss is used
C. Noise is directly added to high-res images
D. Discriminator distinguishes between HR and SR images
Low-resolution images are upsampled; GAN loss is used; Discriminator distinguishes between HR and SR images
The discriminator is trained to tell real high-resolution images from super-resolved outputs.
In the diffusion model reverse process, each denoising step depends on:
A. Previous denoising result
B. Original input image
C. Text embeddings (if conditioned)
D. Random noise injection
Previous denoising result; Text embeddings (if conditioned)
Each step refines the last output, guided by optional text.