Self-supervised Learning Flashcards
(12 cards)
What distinguishes supervised, unsupervised, and self-supervised learning?
Supervised learns p(y|x) with labels; unsupervised learns p(x) structure; self-supervised creates pretext tasks on unlabeled data to learn representations for downstream tasks.
What are pretext and downstream tasks?
Pretext tasks use surrogate labels generated from data for representation learning; downstream tasks are real tasks (e.g., classification) using those learned features.
Give an example of a self-prediction pretext task.
Image rotation prediction: train a classifier to predict which of four rotations (0°, 90°, 180°, 270°) has been applied to an input image.
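A minimal PyTorch sketch of the rotation pretext task, assuming a generic image classifier with a 4-way output head; the names and training-step details are illustrative, not a specific published implementation.

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images):
    # images: float tensor (N, C, H, W); rotate each image by a random multiple
    # of 90 degrees and use the rotation index as the surrogate label
    labels = torch.randint(0, 4, (images.size(0),), device=images.device)
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

def rotation_step(model, images, optimizer):
    # model: any image classifier with 4 output logits (hypothetical)
    rotated, labels = rotation_pretext_batch(images)
    loss = F.cross_entropy(model(rotated), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```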
What categories of self-prediction tasks are mentioned?
Autoregressive generation (e.g., PixelRNN), masked generation (e.g., denoising autoencoders), and innate relationship prediction (e.g., jigsaw puzzles).
What is contrastive learning?
Learning embeddings where similar sample pairs are pulled close and dissimilar pairs pushed apart in feature space.
Name two loss functions used in contrastive learning.
InfoNCE and Triplet loss.
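Minimal PyTorch sketches of both losses, assuming batches of embedding vectors; the temperature and margin values are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def info_nce(z_i, z_j, temperature=0.1):
    # Row k of z_i and row k of z_j form a positive pair; every other row of z_j
    # serves as a negative for that anchor.
    z_i, z_j = F.normalize(z_i, dim=1), F.normalize(z_j, dim=1)
    logits = z_i @ z_j.t() / temperature                     # (N, N) cosine similarities
    targets = torch.arange(z_i.size(0), device=z_i.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor-positive distance below the anchor-negative distance by a margin.
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```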
What is SimCLR?
A contrastive framework that treats two random augmentations of the same image as a positive pair and other images in the batch as negatives, training an encoder plus projection head with an InfoNCE-style (NT-Xent) loss to learn visual embeddings.
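A sketch of one SimCLR-style training step, assuming a torchvision ResNet-18 backbone; the augmentation pipeline, projection-head sizes, and temperature are simplified stand-ins, and the loss below uses only cross-view negatives rather than SimCLR's full set of 2N-2 in-batch negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import resnet18

# Simplified augmentations (SimCLR also uses grayscale and blur). Applied here to
# the whole batch at once, so parameters are shared across the batch; SimCLR
# samples augmentations independently per image.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
])

encoder = resnet18()
encoder.fc = nn.Identity()                 # keep the 512-d backbone features
projector = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 128))

def simclr_step(images, optimizer, temperature=0.5):
    # images: float tensor (N, 3, H, W) in [0, 1]; optimizer covers encoder + projector
    z1 = F.normalize(projector(encoder(augment(images))), dim=1)
    z2 = F.normalize(projector(encoder(augment(images))), dim=1)
    logits = z1 @ z2.t() / temperature     # matching views sit on the diagonal
    targets = torch.arange(images.size(0), device=logits.device)
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```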
Describe the CLIP method.
Contrastive Language-Image Pre-training: an image encoder and a text encoder are trained jointly with a contrastive loss so that each image embedding is pulled toward the embedding of its matching caption and pushed away from non-matching ones.
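A sketch of the symmetric contrastive objective CLIP uses, assuming precomputed image and text embeddings; the encoders are omitted, and the temperature (a learned parameter in CLIP) is fixed here for brevity.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # Matched image-text pairs sit on the diagonal of the similarity matrix;
    # the loss averages the image-to-text and text-to-image directions.
    image_emb = F.normalize(image_emb, dim=1)
    text_emb = F.normalize(text_emb, dim=1)
    logits = image_emb @ text_emb.t() / temperature            # (N, N)
    targets = torch.arange(image_emb.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```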
What does the Masked Autoencoder (MAE) do as a pretext task?
Randomly masks a large fraction (~75%) of image patches, encodes only the visible patches with a Vision Transformer, then reconstructs the masked patches with a lightweight decoder.
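A sketch of MAE-style random masking, assuming the image has already been split into a sequence of patch embeddings; the encoder and decoder ViT blocks are omitted.

```python
import torch

def random_mask(patches, mask_ratio=0.75):
    # patches: (N, L, D) = batch, number of patches, embedding dim.
    # Keep a random subset of patches; only these go through the encoder.
    N, L, D = patches.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(N, L, device=patches.device)    # per-patch random scores
    ids_shuffle = noise.argsort(dim=1)                 # random permutation per sample
    ids_keep = ids_shuffle[:, :len_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_keep

# The decoder later receives the encoded visible patches plus mask tokens at the
# masked positions and is trained to reconstruct the original pixel values there.
```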
Why evaluate self-supervised models on downstream tasks?
Because representation quality is measured by how well it transfers, e.g., by training a linear classifier (a linear probe) on frozen features for the downstream task.
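A sketch of linear-probe evaluation, assuming a pretrained encoder that returns 512-d features; the layer sizes and names are illustrative, and the optimizer updates only the linear classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(512, 10)   # hypothetical: 512-d features, 10 downstream classes

def linear_probe_step(encoder, images, labels, optimizer):
    # Freeze the pretrained encoder; only the linear classifier receives gradients.
    encoder.eval()
    with torch.no_grad():
        features = encoder(images)
    loss = F.cross_entropy(classifier(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```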
What is MaskCLIP?
A method combining masked autoencoding with CLIP-style multimodal contrastive learning.
Why is self-supervised learning beneficial?
It leverages large unlabeled datasets to learn robust representations, improving generalization with few labeled samples.