Self-supervised Learning Flashcards

(12 cards)

1
Q

What distinguishes supervised, unsupervised, and self-supervised learning?

A

Supervised learning models p(y|x) from labeled data; unsupervised learning models the structure of p(x) without labels; self-supervised learning constructs pretext tasks from unlabeled data to learn representations that transfer to downstream tasks.

2
Q

What are pretext and downstream tasks?

A

Pretext tasks use surrogate labels generated from the data itself to drive representation learning; downstream tasks are the real tasks of interest (e.g., classification) that reuse the learned features.

3
Q

Give an example of a self-prediction pretext task.

A

Image rotation prediction: train a model to classify which of four rotations (0°, 90°, 180°, 270°) has been applied to an input image.
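
Below is a minimal sketch of this pretext task, assuming PyTorch; the tiny encoder and 4-way head are illustrative placeholders, not the architecture from any specific paper.

```python
import torch
import torch.nn as nn

def make_rotation_batch(images):
    """Rotate each image by 0/90/180/270 degrees and return surrogate rotation labels."""
    rotated, labels = [], []
    for k in range(4):                                   # k quarter-turns
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# Illustrative encoder + 4-way rotation classifier (placeholder architecture).
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(16, 4)

images = torch.randn(8, 3, 32, 32)                       # dummy unlabeled batch
x, y = make_rotation_batch(images)
loss = nn.CrossEntropyLoss()(head(encoder(x)), y)
loss.backward()
```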

4
Q

What categories of self-prediction tasks are mentioned?

A

Autoregressive generation (e.g., PixelRNN), masked generation (e.g., denoising autoencoders), and innate relationship prediction (e.g., jigsaw puzzles).

5
Q

What is contrastive learning?

A

Learning an embedding space in which similar sample pairs are pulled close together and dissimilar pairs are pushed apart.

6
Q

Name two loss functions used in contrastive learning.

A

InfoNCE loss and triplet loss.
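
For reference, their standard forms (s denotes a similarity such as cosine, τ a temperature, d a distance, and α a margin):

```latex
% InfoNCE: anchor z_i, positive z_j, over N candidate embeddings z_k
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\log \frac{\exp\bigl(s(z_i, z_j)/\tau\bigr)}
               {\sum_{k=1}^{N} \exp\bigl(s(z_i, z_k)/\tau\bigr)}

% Triplet loss: anchor a, positive p, negative n, margin \alpha
\mathcal{L}_{\mathrm{triplet}} = \max\bigl(0,\; d(a, p) - d(a, n) + \alpha\bigr)
```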

7
Q

What is SimCLR?

A

A contrastive framework that treats two augmented views of the same image as a positive pair and trains with an InfoNCE-style (NT-Xent) loss to learn visual representations.
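
A rough sketch of the SimCLR objective (NT-Xent, the batch-wise form of InfoNCE), assuming PyTorch; the augmentations, encoder, and projection head are omitted, and z1/z2 stand in for their outputs.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over embeddings of two augmented views of the same image batch."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)                # 2N x D, unit-norm embeddings
    sim = z @ z.t() / tau                                      # pairwise cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float('-inf'))  # drop self-pairs
    # The positive for sample i is its other view: index i+n (first half) or i-n (second half).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# z1, z2 would be projection-head outputs for two augmentations of the same images.
z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
print(nt_xent(z1, z2))
```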

8
Q

Describe the CLIP method.

A

Contrastive Language-Image Pre-training: an image encoder and a text encoder are trained jointly with a contrastive loss so that each image embedding is aligned with the embedding of its corresponding text caption.
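
A simplified sketch of the symmetric CLIP-style contrastive objective, assuming PyTorch; the image and text encoders are abstracted away, and the fixed temperature here stands in for CLIP's learned logit scale.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: matched image/text pairs share the same row index."""
    img = F.normalize(image_emb, dim=1)
    txt = F.normalize(text_emb, dim=1)
    logits = img @ txt.t() / temperature               # N x N similarity matrix
    targets = torch.arange(img.size(0))                # the i-th caption matches the i-th image
    return (F.cross_entropy(logits, targets) +         # image -> text direction
            F.cross_entropy(logits.t(), targets)) / 2  # text -> image direction

image_emb, text_emb = torch.randn(8, 512), torch.randn(8, 512)
print(clip_loss(image_emb, text_emb))
```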

9
Q

What do Masked Autoencoders (MAE) do as a pretext task?

A

Randomly masks a large fraction of image patches, encodes only the visible patches with a Vision Transformer encoder, and reconstructs the masked patches with a decoder.
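
A schematic sketch of MAE-style masking and reconstruction, assuming PyTorch; the linear encoder/decoder and the unordered token layout are placeholders for the actual ViT blocks and positional handling.

```python
import torch
import torch.nn as nn

dim, enc_dim = 16 * 16 * 3, 128                        # flattened 16x16 RGB patch, latent size

# Placeholder encoder/decoder standing in for MAE's ViT encoder and decoder blocks.
encoder = nn.Linear(dim, enc_dim)
decoder = nn.Linear(enc_dim, dim)
mask_token = nn.Parameter(torch.zeros(enc_dim))

def mae_step(patches, mask_ratio=0.75):
    """Encode only visible patches, append mask tokens, reconstruct and score the masked patches."""
    n, num_patches, _ = patches.shape
    keep = int(num_patches * (1 - mask_ratio))
    idx = torch.rand(n, num_patches).argsort(dim=1)    # random shuffle of patch indices per image
    keep_idx, mask_idx = idx[:, :keep], idx[:, keep:]

    visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))
    encoded = encoder(visible)                         # the encoder only ever sees visible patches

    # Decoder input: encoded visible tokens plus a shared mask token at each masked position.
    tokens = torch.cat([encoded, mask_token.expand(n, mask_idx.size(1), enc_dim)], dim=1)
    pred = decoder(tokens)[:, keep:]                   # predictions at the masked positions
    target = torch.gather(patches, 1, mask_idx.unsqueeze(-1).expand(-1, -1, dim))
    return nn.functional.mse_loss(pred, target)        # reconstruction loss on masked patches only

patches = torch.randn(4, 196, dim)                     # 4 dummy images as 14x14 grids of patches
mae_step(patches).backward()
```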

10
Q

Why evaluate self-supervised models on downstream tasks?

A

Because the quality of the learned representations is measured by how well they transfer, e.g., by training a linear classifier (a linear probe) on frozen features.
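
A minimal linear-probe sketch, assuming PyTorch; `pretrained_encoder` and `loader` are hypothetical placeholders for a frozen self-supervised backbone and a labeled downstream dataset.

```python
import torch
import torch.nn as nn

def linear_probe(pretrained_encoder, loader, feat_dim, num_classes, epochs=10):
    """Train only a linear classifier on frozen features from a pretrained encoder."""
    pretrained_encoder.eval()                          # freeze: no dropout/BN updates
    for p in pretrained_encoder.parameters():
        p.requires_grad = False

    probe = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=0.1)
    for _ in range(epochs):
        for images, labels in loader:                  # small labeled downstream dataset
            with torch.no_grad():
                feats = pretrained_encoder(images)     # frozen features
            loss = nn.functional.cross_entropy(probe(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```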

11
Q

What is MaskCLIP?

A

A method combining masked autoencoding with CLIP-style multimodal contrastive learning.

12
Q

Why is self-supervised learning beneficial?

A

It leverages large unlabeled datasets to learn robust representations, improving generalization with few labeled samples.
