Self-supervised Learning Flashcards
(12 cards)
What distinguishes supervised, unsupervised, and self-supervised learning?
Supervised learns p(y|x) with labels; unsupervised learns p(x) structure; self-supervised creates pretext tasks on unlabeled data to learn representations for downstream tasks.
What are pretext and downstream tasks?
Pretext tasks use surrogate labels generated from data for representation learning; downstream tasks are real tasks (e.g., classification) using those learned features.
Give an example of a self-prediction pretext task.
Image rotation prediction: train a classifier to predict which of four rotations (0°, 90°, 180°, 270°) has been applied to an input image.
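A minimal PyTorch sketch of the rotation pretext task, assuming a generic image classifier with a 4-way output head; the names and training-step details are illustrative, not a specific published implementation.

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images):
    # images: float tensor (N, C, H, W); rotate each image by a random multiple
    # of 90 degrees and use the rotation index as the surrogate label
    labels = torch.randint(0, 4, (images.size(0),), device=images.device)
    rotated = torch.stack([torch.rot90(img, k=int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

def rotation_step(model, images, optimizer):
    # model: any image classifier with 4 output logits (hypothetical)
    rotated, labels = rotation_pretext_batch(images)
    loss = F.cross_entropy(model(rotated), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```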
What categories of self-prediction tasks are mentioned?
Autoregressive generation (e.g., PixelRNN), masked generation (e.g., denoising autoencoders), and innate relationship prediction (e.g., jigsaw puzzles).
What is contrastive learning?
Learning embeddings where similar sample pairs are pulled close and dissimilar pairs pushed apart in feature space.
Name two loss functions used in contrastive learning.
InfoNCE and Triplet loss.
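Minimal PyTorch sketches of both losses, assuming batches of embedding vectors; the temperature and margin values are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def info_nce(z_i, z_j, temperature=0.1):
    # Row k of z_i and row k of z_j form a positive pair; every other row of z_j
    # serves as a negative for that anchor.
    z_i, z_j = F.normalize(z_i, dim=1), F.normalize(z_j, dim=1)
    logits = z_i @ z_j.t() / temperature                     # (N, N) cosine similarities
    targets = torch.arange(z_i.size(0), device=z_i.device)   # positives on the diagonal
    return F.cross_entropy(logits, targets)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor-positive distance below the anchor-negative distance by a margin.
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```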
What is SimCLR?
A contrastive framework that treats two random augmentations of the same image as a positive pair and other images in the batch as negatives, training an encoder plus projection head with an InfoNCE-style (NT-Xent) loss to learn visual embeddings.
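A sketch of one SimCLR-style training step, assuming a torchvision ResNet-18 backbone; the augmentation pipeline, projection-head sizes, and temperature are simplified stand-ins, and the loss below uses only cross-view negatives rather than SimCLR's full set of 2N-2 in-batch negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T
from torchvision.models import resnet18

# Simplified augmentations (SimCLR also uses grayscale and blur). Applied here to
# the whole batch at once, so parameters are shared across the batch; SimCLR
# samples augmentations independently per image.
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
])

encoder = resnet18()
encoder.fc = nn.Identity()                 # keep the 512-d backbone features
projector = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 128))

def simclr_step(images, optimizer, temperature=0.5):
    # images: float tensor (N, 3, H, W) in [0, 1]; optimizer covers encoder + projector
    z1 = F.normalize(projector(encoder(augment(images))), dim=1)
    z2 = F.normalize(projector(encoder(augment(images))), dim=1)
    logits = z1 @ z2.t() / temperature     # matching views sit on the diagonal
    targets = torch.arange(images.size(0), device=logits.device)
    loss = F.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```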
Describe the CLIP method.
Contrastive Language-Image Pre-training: an image encoder and a text encoder are trained jointly with a contrastive loss so that each image embedding is pulled toward the embedding of its matching caption and pushed away from non-matching ones.
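A sketch of the symmetric contrastive objective CLIP uses, assuming precomputed image and text embeddings; the encoders are omitted, and the temperature (a learned parameter in CLIP) is fixed here for brevity.

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # Matched image-text pairs sit on the diagonal of the similarity matrix;
    # the loss averages the image-to-text and text-to-image directions.
    image_emb = F.normalize(image_emb, dim=1)
    text_emb = F.normalize(text_emb, dim=1)
    logits = image_emb @ text_emb.t() / temperature            # (N, N)
    targets = torch.arange(image_emb.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```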
What does the Masked Autoencoder (MAE) do as a pretext task?
Randomly masks a large fraction (~75%) of image patches, encodes only the visible patches with a Vision Transformer, then reconstructs the masked patches with a lightweight decoder.
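A sketch of MAE-style random masking, assuming the image has already been split into a sequence of patch embeddings; the encoder and decoder ViT blocks are omitted.

```python
import torch

def random_mask(patches, mask_ratio=0.75):
    # patches: (N, L, D) = batch, number of patches, embedding dim.
    # Keep a random subset of patches; only these go through the encoder.
    N, L, D = patches.shape
    len_keep = int(L * (1 - mask_ratio))
    noise = torch.rand(N, L, device=patches.device)    # per-patch random scores
    ids_shuffle = noise.argsort(dim=1)                 # random permutation per sample
    ids_keep = ids_shuffle[:, :len_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, ids_keep

# The decoder later receives the encoded visible patches plus mask tokens at the
# masked positions and is trained to reconstruct the original pixel values there.
```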
Why evaluate self-supervised models on downstream tasks?
Because representation quality is measured by how well it transfers, e.g., by training a linear classifier (a linear probe) on frozen features for the downstream task.
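A sketch of linear-probe evaluation, assuming a pretrained encoder that returns 512-d features; the layer sizes and names are illustrative, and the optimizer updates only the linear classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(512, 10)   # hypothetical: 512-d features, 10 downstream classes

def linear_probe_step(encoder, images, labels, optimizer):
    # Freeze the pretrained encoder; only the linear classifier receives gradients.
    encoder.eval()
    with torch.no_grad():
        features = encoder(images)
    loss = F.cross_entropy(classifier(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```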
What is MaskCLIP?
A method combining masked autoencoding with CLIP-style multimodal contrastive learning.
Why is self-supervised learning beneficial?
It leverages large unlabeled datasets to learn robust representations, improving generalization with few labeled samples.