Image Generation Explanations Flashcards

(20 cards)

1
Q

What is the primary goal of a generator in a GAN?

A

To generate images that can deceive the discriminator
The generator tries to fool the discriminator into thinking fake images are real.
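
As a rough illustration of "fooling the discriminator", the sketch below computes the standard binary cross-entropy GAN losses for a single sample. The discriminator outputs (0.9, 0.1) are made-up numbers, and the non-saturating generator loss is one common choice, not the only one.

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -math.log(d_fake)

# Hypothetical discriminator outputs (probability that the input is real).
fooled = generator_loss(d_fake=0.9)  # discriminator mostly fooled
caught = generator_loss(d_fake=0.1)  # discriminator spots the fake

print(f"generator loss when D is fooled: {fooled:.3f}")
print(f"generator loss when D catches it: {caught:.3f}")
```

The generator's loss is small when the discriminator rates its fake as real and large when the fake is caught.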

2
Q

Which architecture is commonly used as the generator in Pix2Pix GAN?

A

U-Net
U-Net preserves spatial structure through skip connections.

3
Q

In Conditional GAN (CGAN), how is the label information used?

A

Added to both the generator and discriminator
Labels condition both models to control the output.
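
One simple way to condition both networks is to concatenate a one-hot label to each network's input. The sketch below shows only that input construction; the sizes (`NUM_CLASSES`, `NOISE_DIM`, a 16-pixel image) are hypothetical, and real CGANs often use learned label embeddings instead.

```python
import random

NUM_CLASSES = 10  # hypothetical number of labels
NOISE_DIM = 4     # hypothetical, kept tiny for readability

def one_hot(label, num_classes=NUM_CLASSES):
    """Return a one-hot list with a 1.0 at index `label`."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def generator_input(noise, label):
    """The generator sees the noise vector concatenated with the label."""
    return noise + one_hot(label)

def discriminator_input(image_pixels, label):
    """The discriminator sees the flattened image concatenated with the same label."""
    return image_pixels + one_hot(label)

noise = [random.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
g_in = generator_input(noise, label=3)
d_in = discriminator_input([0.0] * 16, label=3)
print(len(g_in), len(d_in))
```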

4
Q

What is the purpose of progressive growing in GANs?

A

To start with low-resolution images and gradually grow to high resolution
It stabilizes training and allows for high-res output.
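
A minimal sketch of the idea, assuming a doubling schedule from 4x4 to 1024x1024 and a linear fade-in when a new layer is added (both taken as illustrative defaults, not stated on the card):

```python
def resolution_schedule(start=4, final=1024):
    """Resolutions visited during training, doubling at each stage."""
    res = start
    schedule = [res]
    while res < final:
        res *= 2
        schedule.append(res)
    return schedule

def fade_in(old_pixel, new_pixel, alpha):
    """Blend the upsampled output of the old layers with the new layer's output.

    alpha ramps from 0.0 (old layers only) to 1.0 (new layer only).
    """
    return (1.0 - alpha) * old_pixel + alpha * new_pixel

print(resolution_schedule())
print(fade_in(0.2, 0.8, alpha=0.5))
```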

5
Q

In the diffusion model, what happens during the forward process?

A

Noise is added gradually to the image
Images are incrementally corrupted by noise.
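
The forward process can be sketched with the closed-form DDPM noising rule x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The linear beta schedule and its endpoints below are common defaults used here as assumptions, and the "image" is just a short list of numbers.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per timestep."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): the fraction of signal that survives."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Jump straight to timestep t: scale the image down and mix in Gaussian noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    x_t = [signal * x + noise * e for x, e in zip(x0, eps)]
    return x_t, eps

abar = alpha_bars(linear_betas(1000))
x_t, eps = q_sample([0.5, -0.25, 1.0], t=999, abar=abar, rng=random.Random(0))
print(f"signal kept at t=0: {abar[0]:.4f}, at t=999: {abar[-1]:.6f}")
```

By the last timestep almost none of the original signal remains, so x_t is close to pure Gaussian noise.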

6
Q

What is the role of CLIP in DALL-E 2?

A

To encode images and texts into a shared embedding space
CLIP aligns image and text features in the same space.
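
Once images and texts live in the same space, matching reduces to comparing vectors. The sketch below uses hand-made 3-dimensional embeddings as stand-ins (real CLIP embeddings come from trained encoders and are much larger) and picks the caption with the highest cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for illustration only.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.9, 0.3],
}

best = max(text_embeddings, key=lambda c: cosine(image_embedding, text_embeddings[c]))
print(best)
```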

7
Q

Which component of Stable Diffusion is responsible for decoding latent embeddings back into images?

A

Variational Autoencoder (Decoder part)
The VAE decoder maps the denoised latents back to a full-resolution image.

8
Q

What is the purpose of the U-Net in diffusion models?

A

To predict noise at each timestep
U-Net estimates the noise to help reconstruct clean images.
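
To see why a noise estimate is enough to reconstruct the image, the sketch below inverts the DDPM relation x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The true noise stands in for a perfect U-Net prediction, and `abar_t = 0.6` is an arbitrary illustrative value.

```python
import math

def predict_x0(x_t, eps_pred, abar_t):
    """Solve x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps for x0, given a noise estimate."""
    signal, noise = math.sqrt(abar_t), math.sqrt(1.0 - abar_t)
    return [(x - noise * e) / signal for x, e in zip(x_t, eps_pred)]

# Toy "image" and noise; in practice eps_pred would come from the U-Net.
x0 = [0.5, -0.25, 1.0]
eps = [0.3, -1.2, 0.7]
abar_t = 0.6

x_t = [math.sqrt(abar_t) * x + math.sqrt(1.0 - abar_t) * e for x, e in zip(x0, eps)]
recovered = predict_x0(x_t, eps, abar_t)
print(recovered)
```

With an imperfect noise estimate the recovered image is only approximate, which is one reason sampling proceeds over many small steps.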

9
Q

Which model is commonly used in super-resolution GANs to upscale low-resolution images?

A

SRGAN
SRGAN learns to generate sharper, high-resolution images from low-resolution inputs.

10
Q

In DALL-E 2, the final image generation step is handled by:

A

A diffusion model
DALL-E 2 uses a diffusion decoder to render images from CLIP image embeddings. (The original DALL-E instead used an autoregressive transformer.)

11
Q

Which problems are commonly encountered when training GANs?
A. Mode collapse
B. Overfitting discriminator
C. Perfect convergence
D. Non-convergence

A

Mode collapse; Overfitting discriminator; Non-convergence
Common GAN issues include poor diversity and unstable training.

12
Q

In the diffusion process, the reverse process involves:
A. Removing noise gradually
B. Using a U-Net
C. Adding more noise each step
D. Predicting either clean images or noise

A

Removing noise gradually; Using a U-Net; Predicting either clean images or noise
The reverse process denoises images step by step.

13
Q

Which techniques use cross-attention?
A. CLIP text-image matching
B. Stable Diffusion conditioning
C. GAN training without labels
D. DALL-E 2 conditioning

A

Stable Diffusion conditioning; DALL-E 2 conditioning
Cross-attention helps these models align generation with prompts.
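
A minimal sketch of scaled dot-product cross-attention, where queries come from image features and keys/values come from text tokens. The learned projection matrices that real models apply are omitted, and all vectors are toy values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Each image-side query takes a weighted average of the text-side values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out

image_queries = [[10.0, 0.0]]          # one image location
text_keys = [[1.0, 0.0], [0.0, 1.0]]   # two text tokens
text_values = [[1.0, 0.0], [0.0, 1.0]]
attended = cross_attention(image_queries, text_keys, text_values)
print(attended)
```

The query aligns with the first text token, so the output is dominated by that token's value vector.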

14
Q

Stable Diffusion components include:
A. Autoencoder (VAE)
B. U-Net
C. Transformer Decoder
D. Text Encoder

A

Autoencoder (VAE); U-Net; Text Encoder
These are the primary components used for latent-space generation.

15
Q

In GANs, the discriminator is trained to:
A. Generate realistic images
B. Distinguish real from fake images
C. Provide gradients to the generator
D. Upsample noise

A

Distinguish real from fake images; Provide gradients to the generator
Its classification loss is backpropagated into the generator, so it also acts as a teacher that supplies the generator's learning signal.

16
Q

Progressive growing of GANs helps by:
A. Speeding up training
B. Stabilizing the generator early
C. Reducing resolution at the end
D. Gradually increasing output resolution

A

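Speeding up training; Stabilizing the generator early; Gradually increasing output resolution
It improves training stability and final image quality, and the original progressive-growing paper also reports shorter training time because most iterations run at low resolution.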

17
Q

Which components are part of DALL-E 2’s generation pipeline?
A. Prior model
B. Diffusion model (Decoder)
C. LSTM text encoder
D. CLIP encoder

A

Prior model; Diffusion model (Decoder); CLIP encoder
The CLIP text encoder embeds the prompt, the prior maps that embedding to a CLIP image embedding, and the diffusion decoder renders the image from it.

18
Q

Benefits of latent space diffusion (as in Stable Diffusion) include:
A. Higher memory usage
B. Faster generation
C. Lower computation cost
D. High-resolution outputs

A

Faster generation; Lower computation cost; High-resolution outputs
Latent space diffusion is more efficient than pixel-space.
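
A back-of-the-envelope comparison shows where the savings come from. The shapes below assume a 512x512 RGB image and a 64x64 latent with 4 channels (8x spatial downsampling), as in the first Stable Diffusion releases; other configurations give different ratios.

```python
def num_values(height, width, channels):
    """Number of scalar values the denoiser must process per step."""
    return height * width * channels

pixel_space = num_values(512, 512, 3)  # assumed RGB image size
latent_space = num_values(64, 64, 4)   # assumed latent shape
ratio = pixel_space / latent_space

print(pixel_space, latent_space, ratio)
```

Under these assumptions the U-Net works on 48 times fewer values per denoising step than a pixel-space model would.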

19
Q

In Super-Resolution GAN (SRGAN), during training:
A. Low-resolution images are upsampled
B. GAN loss is used
C. Noise is directly added to high-res images
D. Discriminator distinguishes between HR and SR images

A

Low-resolution images are upsampled; GAN loss is used; Discriminator distinguishes between HR and SR images
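The generator upsamples low-resolution inputs, and the discriminator is trained to tell real high-resolution (HR) images from generated super-resolved (SR) ones.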

20
Q

In the diffusion model reverse process, each denoising step depends on:
A. Previous denoising result
B. Original input image
C. Text embeddings (if conditioned)
D. Random noise injection

A

Previous denoising result; Text embeddings (if conditioned)
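Each step refines the previous output, guided by text embeddings when the model is conditioned. (Stochastic samplers such as DDPM also inject fresh noise at each step; deterministic ones such as DDIM do not.)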