Image Generation Flashcards

Question 1

Q

What is the primary goal of a generator in a GAN?

Answer

A

To generate images that can deceive the discriminator

Question 2

Q

Which architecture is commonly used as the generator in Pix2Pix GAN?

Answer

A

U-Net (an encoder-decoder with skip connections)

Question 3

Q

In Conditional GAN (CGAN), how is the label information used?

Answer

A

It is fed as an additional input to both the generator and the discriminator
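
One common way to supply the label is to concatenate a one-hot encoding of it to each network's input. The sketch below is a minimal illustration of that idea only. The helper names (`one_hot`, `generator_input`, `discriminator_input`) and the vector sizes are assumptions made for this example; real CGANs often use learned label embeddings instead.

```python
import random

def one_hot(label, num_classes):
    """Return a one-hot list such as [0, 0, 1, 0] for label 2 of 4."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(noise, label, num_classes):
    """Concatenate the noise vector with the one-hot label (hypothetical helper)."""
    return noise + one_hot(label, num_classes)

def discriminator_input(image_pixels, label, num_classes):
    """Concatenate flattened image pixels with the same one-hot label."""
    return image_pixels + one_hot(label, num_classes)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]   # 8-d noise, size chosen arbitrarily
g_in = generator_input(noise, label=3, num_classes=10)
d_in = discriminator_input([0.0] * 16, label=3, num_classes=10)  # toy 16-pixel image
```

Because both networks see the label, the discriminator can penalize images that look real but do not match the requested class.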

Question 4

Q

Which problem in GAN training is characterized by the generator producing a limited variety of outputs?

Answer

A

Mode collapse

Question 5

Q

What is the purpose of progressive growing of GANs?

Answer

A

To stabilize and speed up training by starting with low-resolution images and gradually adding layers that increase the resolution
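
The growth schedule can be sketched as a list of doubling resolutions plus a fade-in blend for each newly added layer. The 4-to-1024 range and the function names here are assumptions for illustration, not details taken from these cards.

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited during training, doubling at each stage."""
    res = start
    schedule = [res]
    while res < final:
        res *= 2
        schedule.append(res)
    return schedule

def fade_in(old_pixel, new_pixel, alpha):
    """Blend the upsampled old output with the new layer's output.

    alpha ramps from 0 to 1 while a new layer is being introduced,
    so the new layer is phased in smoothly rather than switched on.
    """
    return (1.0 - alpha) * old_pixel + alpha * new_pixel

schedule = resolution_schedule()
halfway = fade_in(0.2, 0.8, 0.5)
```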

Question 6

Q

In a diffusion model, what happens during the forward process?

Answer

A

Gaussian noise is added to the image gradually, over many steps
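
As a rough illustration, the noised sample at step t can be drawn in closed form as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta). The linear beta schedule, the step count, and the toy four-value "image" below are assumptions for this sketch.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """abar_t = product of (1 - beta_s) for s <= t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

alpha_bars = cumulative_alphas(linear_betas(1000))
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]                      # toy "image" with four pixels
x_early = add_noise(x0, 0, alpha_bars, rng)      # almost unchanged
x_late = add_noise(x0, 999, alpha_bars, rng)     # almost pure noise
```

Early steps keep most of the signal (abar close to 1), while the final step leaves almost none (abar close to 0).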

Question 7

Q

What is the role of CLIP in DALL-E 2’s architecture?

Answer

A

To encode images and texts into a shared embedding space
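
A shared embedding space means a caption and an image can be compared directly, typically with cosine similarity. The three-dimensional vectors below are made-up toy values, not real CLIP outputs, and `cosine_similarity` is a helper written for this sketch.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up text embeddings (real CLIP embeddings have hundreds of dimensions).
text_emb = {"a dog": [0.9, 0.1, 0.0], "a car": [0.0, 0.2, 0.9]}
image_emb = [0.8, 0.2, 0.1]   # pretend embedding of a dog photo

scores = {caption: cosine_similarity(vec, image_emb)
          for caption, vec in text_emb.items()}
best_caption = max(scores, key=scores.get)
```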

Question 8

Q

In Stable Diffusion, which model is responsible for turning compressed latent codes back into images?

Answer

A

Variational Autoencoder (Decoder part)

Question 9

Q

In the training of the prior in DALL-E 2, the model learns to map:

Answer

A

Text embeddings to image embeddings

Question 10

Q

What does cross-attention enable in diffusion models like Stable Diffusion?

Answer

A

Conditioning the image features on the text prompt, so the prompt can control generation
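
The mechanism can be sketched as scaled dot-product attention in which queries come from image features and keys and values come from text tokens. This toy version omits the learned projection matrices that real models apply, and all vectors are invented for illustration.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each image query attends over all text tokens.

    queries: one vector per image location
    keys, values: one vector per text token
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[d] for wj, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights

image_queries = [[1.0, 0.0], [0.0, 1.0]]          # two image locations
text_keys = [[4.0, 0.0], [0.0, 4.0]]              # two text tokens
text_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
outputs, weights = cross_attention(image_queries, text_keys, text_values)
```

Each image location ends up as a weighted mix of text-token values, which is how the prompt can influence every spatial position.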

Question 11

Q

Which problems are commonly encountered when training GANs?
A) Mode collapse
B) Overfitting discriminator
C) Perfect convergence
D) Non-convergence

Answer

A

Mode collapse; Overfitting discriminator; Non-convergence

Question 12

Q

In a diffusion model, the reverse process involves:
A) Removing noise gradually
B) Using a U-Net
C) Adding more noise each step
D) Predicting either clean images or noise

Answer

A

Removing noise gradually; Using a U-Net; Predicting either clean images or noise

Question 13

Q

Which of the following techniques use cross-attention?
A) CLIP text-image matching
B) Stable Diffusion conditioning
C) GAN training without labels
D) DALL-E 2 conditioning

Answer

A

Stable Diffusion conditioning; DALL-E 2 conditioning

Question 14

Q

Stable Diffusion components include:
A) Autoencoder (VAE)
B) U-Net
C) Transformer Decoder
D) Text Encoder

Answer

A

Autoencoder (VAE); U-Net; Text Encoder

Question 15

Q

In GANs, the discriminator network is trained to:
A) Generate realistic images
B) Distinguish real from fake images
C) Provide gradients to the generator
D) Upsample noise

Answer

A

Distinguish real from fake images; Provide gradients to the generator
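
The two objectives can be sketched with binary cross-entropy on the discriminator's output probabilities. This is a minimal sketch that assumes the common non-saturating generator loss; the function names are chosen for this example.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    eps = 1e-12                      # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """D wants to output 1 on real images and 0 on generated ones."""
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    """Non-saturating loss: G wants D to output 1 on generated images."""
    return bce(d_fake, 1)

confident_d = discriminator_loss(d_real=0.9, d_fake=0.1)   # D separates well
fooled_d = discriminator_loss(d_real=0.5, d_fake=0.5)      # D is guessing
```

The generator never sees real images directly; its only learning signal is the gradient of `generator_loss` flowing back through the discriminator.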

Question 16

Q

Progressive growing of GANs helps by:
A) Speeding up training
B) Stabilizing the generator early
C) Reducing resolution at the end
D) Gradually increasing output resolution

Answer

A

Speeding up training; Stabilizing the generator early; Gradually increasing output resolution

Question 17

Q

Which components are part of DALL-E 2’s generation pipeline?
A) Prior model
B) Diffusion model (Decoder)
C) LSTM text encoder
D) CLIP encoder

Answer

A

Prior model; Diffusion model (Decoder); CLIP encoder

Question 18

Q

Benefits of latent space diffusion (as in Stable Diffusion) include:
A) Higher memory usage
B) Faster generation
C) Lower computation cost
D) High-resolution outputs

Answer

A

Faster generation; Lower computation cost; High-resolution outputs
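
The cost saving follows from simple arithmetic on tensor sizes. The sketch below assumes a downsampling factor of 8 and 4 latent channels, values typical of Stable Diffusion v1-style autoencoders; other variants may differ.

```python
def element_count(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels

def latent_shape(height, width, downsample=8, latent_channels=4):
    """Latent size for an assumed 8x downsampling, 4-channel autoencoder."""
    return height // downsample, width // downsample, latent_channels

pixel_elems = element_count(512, 512, 3)            # RGB image
lat_h, lat_w, lat_c = latent_shape(512, 512)
latent_elems = element_count(lat_h, lat_w, lat_c)
ratio = pixel_elems / latent_elems                  # how much smaller the latent is
```

Under these assumptions the U-Net denoises a 64 x 64 x 4 tensor instead of a 512 x 512 x 3 image, which is 48 times fewer values per step.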

Question 19

Q

In Super-Resolution GAN (SRGAN), during training:
A) Low-resolution images are upsampled
B) GAN loss is used
C) Noise is directly added to high-res images
D) Discriminator distinguishes between HR and SR images

Answer

A

Low-resolution images are upsampled; GAN loss is used; Discriminator distinguishes between HR and SR images

Question 20

Q

In the diffusion model reverse process, each denoising step depends on:
A) Previous denoising result
B) Original input image
C) Text embeddings (if conditioned)
D) Random noise injection

Answer

A

Previous denoising result; Text embeddings (if conditioned)
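
A sampling loop makes these dependencies explicit: each step takes the previous result and, if present, the text embedding. The sketch below uses the deterministic DDIM update (eta = 0), chosen because it adds no fresh noise; stochastic DDPM-style samplers do add some noise at each step. The `predict_noise` callable stands in for a trained U-Net and is a hypothetical placeholder, as are the toy schedule and values.

```python
import math

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0)."""
    # Estimate the clean sample implied by the predicted noise.
    x0_pred = [(x - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
               for x, e in zip(x_t, eps_pred)]
    # Re-noise that estimate to the previous (less noisy) level.
    return [math.sqrt(abar_prev) * x0 + math.sqrt(1.0 - abar_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]

def sample(x_T, alpha_bars, predict_noise, text_embedding=None):
    """Run the reverse process from the noisiest step down to step 0."""
    x = x_T
    for t in range(len(alpha_bars) - 1, -1, -1):
        abar_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps_pred = predict_noise(x, t, text_embedding)   # hypothetical model call
        x = ddim_step(x, eps_pred, alpha_bars[t], abar_prev)
    return x

# Toy check: an "oracle" that always returns the true noise recovers x0.
alpha_bars = [0.9, 0.6, 0.3, 0.1]
x0_true = [0.5, -1.0, 0.25]
eps_true = [0.3, -0.7, 1.2]
x_T = [math.sqrt(alpha_bars[-1]) * a + math.sqrt(1.0 - alpha_bars[-1]) * e
       for a, e in zip(x0_true, eps_true)]

def oracle(x, t, text_embedding):
    return eps_true

recovered = sample(x_T, alpha_bars, oracle)
```

The original image never appears inside `sample`; only the current noisy state, the step index, and the optional conditioning are passed to the model.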

(20 cards)