Image Generation Flashcards
(20 cards)
What is the primary goal of a generator in a GAN?
To generate images that can deceive the discriminator
Which architecture is commonly used as the generator in Pix2Pix?
U-Net
In a conditional GAN (cGAN), how is the label information used?
It is fed as an additional input to both the generator and the discriminator
Which problem in GAN training is characterized by the generator producing a limited variety of outputs?
Mode collapse
What is the purpose of progressive growing of GANs?
To start with low-resolution images and gradually grow to high resolution
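A minimal sketch of the idea on this card, assuming each growth stage doubles the output resolution and that a new layer is blended in with a fade-in weight. The function names `resolution_schedule` and `fade_in` and the 4-to-1024 range are illustrative assumptions, not part of the deck.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited when each growth stage doubles the output size."""
    if start <= 0 or final < start:
        raise ValueError("need 0 < start <= final")
    resolutions = []
    size = start
    while size <= final:
        resolutions.append(size)
        size *= 2
    return resolutions


def fade_in(old_pixel, new_pixel, alpha):
    """Blend the upsampled old output with the new layer's output.

    alpha ramps from 0 (only the old, lower-resolution path) to 1
    (only the newly added higher-resolution layer).
    """
    return (1.0 - alpha) * old_pixel + alpha * new_pixel


schedule = resolution_schedule()
print(schedule)
```

Training spends time at each resolution in `schedule` before the next layer is added, which is why the early stages are cheap and comparatively stable.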
In a diffusion model, what happens during the forward process?
Gaussian noise is gradually added to the image over many steps
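A minimal sketch of the forward process, assuming a DDPM-style linear variance schedule and the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The names `make_alpha_bars` and `noise_image`, the schedule endpoints, and the four-pixel toy "image" are illustrative assumptions.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    Assumes num_steps >= 2.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_image(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [signal * pixel + sigma * rng.gauss(0.0, 1.0) for pixel in x0]


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
x0 = [1.0, -1.0, 0.5, 0.0]                       # toy "image" with four pixels
x_early = noise_image(x0, alpha_bars[9], rng)    # still mostly signal
x_late = noise_image(x0, alpha_bars[999], rng)   # close to pure noise
```

Because `alpha_bars` shrinks toward zero, the signal term fades and the noise term dominates as the step index grows.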
What is the role of CLIP in DALL-E 2’s architecture?
To encode images and texts into a shared embedding space
In Stable Diffusion, which component turns compressed latent codes back into images?
The decoder of the variational autoencoder (VAE)
When the prior in DALL-E 2 is trained, what does it learn to map?
CLIP text embeddings to CLIP image embeddings
What does cross-attention enable in diffusion models like Stable Diffusion?
Controlling generation based on text prompts
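A minimal sketch of scaled dot-product cross-attention, where the queries stand in for image features and the keys and values stand in for text-token embeddings. Real models apply learned linear projections to produce queries, keys, and values; those are omitted here, and the function names and toy vectors are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Attend from each query (image position) over all text tokens."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the text value vectors for this image position.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


image_queries = [[1.0, 0.0], [0.0, 1.0]]              # two image positions
text_keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]      # three text tokens
text_values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attended = cross_attention(image_queries, text_keys, text_values)
```

Each image position ends up as a mixture of text-token values, weighted by how well its query matches each token's key; this is the path through which the prompt steers the denoiser.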
Which problems are commonly encountered when training GANs?
A) Mode collapse
B) Discriminator overfitting
C) Perfect convergence
D) Non-convergence
Mode collapse; Discriminator overfitting; Non-convergence
In a diffusion model, the reverse process involves:
A) Removing noise gradually
B) Using a U-Net
C) Adding more noise each step
D) Predicting either clean images or noise
Removing noise gradually; Using a U-Net; Predicting either clean images or noise
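Predicting the clean image and predicting the noise are interchangeable targets when the noisy sample follows x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps. The sketch below assumes that DDPM-style parameterization, with `alpha_bar` strictly between 0 and 1; the function names and toy values are illustrative assumptions.

```python
import math


def x0_from_noise(x_t, eps_pred, alpha_bar):
    """Clean-image estimate implied by a noise prediction."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [(x - sigma * e) / signal for x, e in zip(x_t, eps_pred)]


def noise_from_x0(x_t, x0_pred, alpha_bar):
    """Noise estimate implied by a clean-image prediction."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [(x - signal * c) / sigma for x, c in zip(x_t, x0_pred)]


alpha_bar = 0.6
x0 = [0.5, -1.0]
eps = [0.3, -0.7]
# Build the noisy sample from known x0 and eps, then invert both ways.
x_t = [
    math.sqrt(alpha_bar) * c + math.sqrt(1.0 - alpha_bar) * e
    for c, e in zip(x0, eps)
]
recovered_x0 = x0_from_noise(x_t, eps, alpha_bar)
recovered_eps = noise_from_x0(x_t, x0, alpha_bar)
```

With exact inputs, both conversions recover the original values up to floating-point error, which is why a denoising network can be trained on either target.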
Which of the following techniques use cross-attention?
A) CLIP text-image matching
B) Stable Diffusion conditioning
C) GAN training without labels
D) DALL-E 2 conditioning
Stable Diffusion conditioning; DALL-E 2 conditioning
Stable Diffusion components include:
A) Autoencoder (VAE)
B) U-Net
C) Transformer Decoder
D) Text Encoder
Autoencoder (VAE); U-Net; Text Encoder
In GANs, the discriminator network is trained to:
A) Generate realistic images
B) Distinguish real from fake images
C) Provide gradients to the generator
D) Upsample noise
Distinguish real from fake images; Provide gradients to the generator
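A minimal sketch of the two objectives behind this card, assuming the discriminator outputs a probability that its input is real and that the generator uses the common non-saturating loss. The function names are illustrative assumptions; a real training loop would average these over batches and backpropagate through the networks.

```python
import math


def bce(prob, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12
    prob = min(max(prob, eps), 1.0 - eps)   # avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Discriminator wants D(real) near 1 and D(fake) near 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Non-saturating loss: generator wants D(fake) near 1."""
    return bce(d_fake, 1.0)


confident_d = discriminator_loss(d_real=0.9, d_fake=0.1)
confused_d = discriminator_loss(d_real=0.5, d_fake=0.5)
fooled_g = generator_loss(d_fake=0.9)
caught_g = generator_loss(d_fake=0.1)
```

The generator's loss depends only on the discriminator's output for fake samples, so the discriminator is the generator's sole source of gradient signal.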
Progressive growing of GANs helps by:
A) Speeding up training
B) Stabilizing the generator early
C) Reducing resolution at the end
D) Gradually increasing output resolution
Speeding up training; Stabilizing the generator early; Gradually increasing output resolution
Which components are part of DALL-E 2’s generation pipeline?
A) Prior model
B) Diffusion model (Decoder)
C) LSTM text encoder
D) CLIP encoder
Prior model; Diffusion model (Decoder); CLIP encoder
Benefits of latent space diffusion (as in Stable Diffusion) include:
A) Higher memory usage
B) Faster generation
C) Lower computation cost
D) High-resolution outputs
Faster generation; Lower computation cost; High-resolution outputs
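A rough worked example of why latent diffusion is cheaper, assuming the commonly cited Stable Diffusion v1 setup: a 512x512 RGB image compressed by a factor of 8 per side into a 4-channel latent. The helper `num_elements` is an illustrative assumption, and element count is only a proxy for compute.

```python
def num_elements(height, width, channels):
    """Number of values the denoiser has to process per sample."""
    return height * width * channels


pixel_elems = num_elements(512, 512, 3)             # image space
latent_elems = num_elements(512 // 8, 512 // 8, 4)  # latent space
reduction = pixel_elems / latent_elems

print(pixel_elems, latent_elems, reduction)
```

Under these assumptions the denoiser works on 48 times fewer values per step than it would in pixel space, and the VAE decoder restores full resolution only once at the end.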
In Super-Resolution GAN (SRGAN), during training:
A) Low-resolution images are upsampled
B) GAN loss is used
C) Noise is directly added to high-res images
D) Discriminator distinguishes between HR and SR images
Low-resolution images are upsampled; GAN loss is used; Discriminator distinguishes between HR and SR images
In the reverse process of a diffusion model, each denoising step depends on:
A) Previous denoising result
B) Original input image
C) Text embeddings (if conditioned)
D) Random noise injection
Previous denoising result; Text embeddings (if conditioned)