Week 2 Flashcards

(44 cards)

1
Q

What is U-Net?

A

A deep learning model for biomedical image segmentation that labels every pixel.

2
Q

What does U-Net specialize in?

A

Precise pixel-wise classification in medical images.

3
Q

What are the two paths in U-Net’s architecture?

A

Contracting path (encoder) and expanding path (decoder).

4
Q

What does the contracting path in U-Net do?

A

The “shrinking” side of U-Net that finds patterns (edges, shapes) by repeatedly downsampling the image into smaller but richer feature maps.

5
Q

What does the expanding path in U-Net do?

A

The “growing” side that upsamples (“zooms in”) the image back to normal size and labels each pixel using the info it learned.

6
Q

What is the purpose of skip connections in U-Net?

A

Shortcuts that help the decoder remember small details from the original image, keeping the output sharp.
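
To tie the last few cards together, here is a minimal sketch of the idea in PyTorch — a toy two-level version with made-up channel sizes, not the paper's exact architecture (which is deeper and uses unpadded convolutions):

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(1, 16)             # contracting path
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)                # downsample by 2
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # upsample by 2
        self.dec1 = double_conv(32, 16)            # expanding path
        self.head = nn.Conv2d(16, n_classes, 1)    # per-pixel class scores

    def forward(self, x):
        s1 = self.enc1(x)                # fine features at full resolution
        x = self.enc2(self.pool(s1))     # smaller, richer features
        x = self.up(x)                   # back to full resolution
        x = torch.cat([x, s1], dim=1)    # skip connection: reuse fine detail
        return self.head(self.dec1(x))   # one score per pixel per class

logits = TinyUNet()(torch.randn(1, 1, 64, 64))
print(logits.shape)  # torch.Size([1, 2, 64, 64]) — a label map, not one label
```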

7
Q

Pixel-wise classification

A

Instead of saying “this is a dog,” U-Net says “this pixel is part of a dog, this pixel is background,” etc.

8
Q

What is U-Net good for?

A

Small datasets — thanks to heavy data augmentation, it segments well even with few labeled training images.

9
Q

Data augmentation

A

Making more training images by flipping, rotating, or stretching existing ones — helpful when we don’t have a lot of real data.
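
A minimal sketch of the idea with NumPy (a toy 4×4 array standing in for an image; in segmentation you would flip the label mask the same way):

```python
import numpy as np

image = np.arange(16).reshape(4, 4)   # stand-in for a training image

augmented = [
    np.fliplr(image),        # horizontal flip
    np.flipud(image),        # vertical flip
    np.rot90(image),         # 90-degree rotation
    np.rot90(image, k=2),    # 180-degree rotation
]
# One real image has become four extra "new" training examples.
print(len(augmented))
```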

10
Q

Max pooling

A

A way of making the image smaller by keeping only the most important parts — like zooming out to see big patterns.
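
A tiny NumPy sketch of 2×2 max pooling on a made-up 4×4 image:

```python
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 2],
              [5, 2, 9, 7],
              [1, 0, 3, 4]])

# 2x2 max pooling: split into 2x2 blocks, keep each block's maximum
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 2]
#  [5 9]]
```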

11
Q

Transposed Convolutions (Upsampling)

A

Makes the image bigger again after shrinking — used in the decoder to go back to original size.
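
A short PyTorch shape check (sizes are made up) showing a transposed convolution undoing a pooling step:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 16, 16)    # small, deep feature map

down = nn.MaxPool2d(2)                                    # shrink: 16x16 -> 8x8
up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)  # grow: 8x8 -> 16x16

print(down(x).shape)      # torch.Size([1, 32, 8, 8])
print(up(down(x)).shape)  # torch.Size([1, 16, 16, 16])
```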

12
Q

Cross-Entropy Loss

A

A way to measure how wrong the AI’s pixel guesses are — helps it learn and improve.
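
A minimal PyTorch sketch (toy sizes) of pixel-wise cross-entropy — note that the loss takes a score per class per pixel and a true label per pixel:

```python
import torch
import torch.nn as nn

# Toy example: 2 classes (background/foreground) on a 4x4 image
logits = torch.randn(1, 2, 4, 4)              # model's per-pixel class scores
target = torch.randint(0, 2, (1, 4, 4))       # true label for every pixel

loss = nn.CrossEntropyLoss()(logits, target)  # averages over all 16 pixels
print(loss.item())  # lower = the pixel guesses are less wrong
```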

13
Q

SGD (Stochastic Gradient Descent)

A

The optimization method U-Net is trained with — it adjusts the weights step by step, checking a mistake and nudging the weights to reduce it.
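
A toy sketch of the step-by-step idea in NumPy (a one-weight model, not U-Net itself):

```python
import numpy as np

# Toy task: learn w so that w * x matches y (the true w is 3)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x
w, lr = 0.0, 0.01

for step in range(100):
    i = np.random.randint(len(x))   # "stochastic": pick one random example
    error = w * x[i] - y[i]         # check the mistake
    w -= lr * error * x[i]          # adjust a little in the right direction

print(round(w, 2))  # close to 3.0
```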

14
Q

Overlap-Tile Strategy

A

U-Net cuts big images into overlapping tiles, predicts each tile, then stitches the results back together. Great for large medical scans.
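
A rough sketch of the tiling half of the idea in NumPy (tile and margin sizes are made up, and the stitching step is omitted):

```python
import numpy as np

def tiles_with_overlap(image, tile=64, margin=8):
    """Yield overlapping tiles; at stitch time only each tile's centre is kept."""
    h, w = image.shape
    for top in range(0, h, tile - 2 * margin):
        for left in range(0, w, tile - 2 * margin):
            # Edge tiles are simply clipped in this sketch;
            # the paper mirrors the image border instead.
            yield image[top:top + tile, left:left + tile]

scan = np.zeros((256, 256))   # stand-in for a large medical scan
print(sum(1 for _ in tiles_with_overlap(scan)))  # number of tiles to predict
```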

15
Q

IoU (Intersection over Union)

A

A score that shows how well U-Net’s predicted region overlaps the real one (overlap divided by combined area). Higher = better.
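
A tiny NumPy example on made-up 3×3 masks:

```python
import numpy as np

pred = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [0, 0, 0]], dtype=bool)   # U-Net's guess
true = np.array([[1, 1, 0],
                 [0, 0, 0],
                 [0, 1, 0]], dtype=bool)   # the real mask

intersection = np.logical_and(pred, true).sum()  # pixels both agree on: 2
union = np.logical_or(pred, true).sum()          # pixels in either mask: 4
print(intersection / union)                      # IoU = 0.5
```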

16
Q

What U-Net Is Best At

A

It’s really good at medical images, especially when you don’t have a lot of labeled data.

17
Q

CLIP

A

An AI that connects images and text — it can guess what’s in a picture just from a sentence like “a photo of a cat.”

18
Q

Contrastive learning

A

A way of teaching AI by pulling matching image–text pairs together and pushing mismatched pairs apart.
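
A minimal sketch of the CLIP-style version in PyTorch, using random vectors as stand-ins for the encoder outputs (the 0.07 temperature is just a typical choice):

```python
import torch
import torch.nn.functional as F

# Stand-ins for a batch of 4 image and 4 caption embeddings;
# in real CLIP these would come from the image and text encoders.
img = F.normalize(torch.randn(4, 512), dim=-1)
txt = F.normalize(torch.randn(4, 512), dim=-1)

logits = img @ txt.T / 0.07   # similarity of every image to every caption
labels = torch.arange(4)      # each matching pair sits on the diagonal

# Push each image toward its own caption (and vice versa), away from the rest
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
print(loss.item())
```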

19
Q

Zero-shot learning

A

CLIP can understand new tasks without extra training — just by reading your description.
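
A minimal sketch in PyTorch with stand-in embeddings — real CLIP would produce these from its image and text encoders, with the class names wrapped in prompts like “a photo of a cat”:

```python
import torch
import torch.nn.functional as F

# Random stand-ins for one image embedding and three class-name embeddings
image_emb = F.normalize(torch.randn(1, 512), dim=-1)
class_embs = F.normalize(torch.randn(3, 512), dim=-1)   # "dog", "cat", "chair"

# Pick the class whose text embedding is most similar to the image embedding
similarity = (image_emb @ class_embs.T).softmax(dim=-1)
print(["dog", "cat", "chair"][similarity.argmax().item()])
```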

20
Q

Natural Language Supervision

A

CLIP learns from real language (captions from the internet), not from labeled datasets like “cat = 1”.

21
Q

Image Encoder

A

A part of CLIP that looks at pictures and turns them into numbers (embeddings) for the model to understand.

22
Q

Text Encoder

A

A part of CLIP that reads text and also turns it into numbers so it can be matched with the picture.

23
Q

Shared Embedding Space

A

Both image and text are turned into numbers in the same space, so CLIP can match them up easily.

24
Q

Pretraining

A

CLIP is trained on 400 million image + text pairs from the internet to learn how pictures and language go together.

25
Q

Vision Transformer (ViT)

A

A powerful type of model that CLIP can use to look at pictures — better than older models for many tasks.

26
Q

Zero-Shot Classification

A

Example: You show CLIP a picture and say “Is this a dog, cat, or chair?” — it figures it out without being trained on that list.

27
Q

Why CLIP Is Useful

A

It works well on lots of tasks like reading signs, identifying objects, and even labeling images — without retraining.

28
Q

Where CLIP Struggles

A

It doesn’t do well on very specialized tasks like X-rays or medical images, since it wasn’t trained deeply on those.

29
Q

Why CLIP Matters

A

It shows that AI can learn about the world using just language + images, like we do.

30
Q

What is the primary purpose of the U-Net architecture?
A) Object detection in natural images
B) Pixel-wise classification for biomedical image segmentation
C) Text-to-image generation
D) Speech recognition in noisy environments

A

B

31
Q

How does U-Net handle the challenge of limited labeled biomedical data?
A) Using transfer learning from ImageNet
B) Incorporating a generative adversarial network (GAN)
C) Applying extensive data augmentation techniques
D) Utilizing pre-trained transformers

A

C

32
Q

Which of the following is a key innovation introduced by U-Net?
A) Fully connected layers in segmentation
B) Attention mechanism for pixel grouping
C) Skip connections between encoder and decoder
D) Fixed input size with no padding

A

C

33
Q

What is the role of the contracting path (encoder) in U-Net?
A) To classify each pixel into a specific object class
B) To synthesize images from noise
C) To extract and downsample features from the input image
D) To compute the loss function

A

C

34
Q

How does the expanding path (decoder) work in U-Net?
A) Reduces feature dimensions through convolution
B) Uses transposed convolutions to upsample and combines them with encoder features
C) Flattens the feature maps for classification
D) Ignores previous feature maps to avoid overfitting

A

B

35
Q

Why are skip connections important in U-Net?
A) They increase training time
B) They improve numerical stability
C) They help preserve spatial information lost during downsampling
D) They reduce memory usage

A

C

36
Q

What kind of loss function is used in U-Net for pixel classification?
A) Mean Squared Error
B) Hinge Loss
C) Pixel-wise softmax followed by cross-entropy loss
D) Kullback–Leibler Divergence

A

C

37
Q

What strategy does U-Net use to segment large images efficiently?

A

The overlap-tile strategy — predicting overlapping tiles and stitching the results together.

38
Q

In the experiments, what metric did U-Net achieve the lowest value for in the EM segmentation challenge?

A

The warping error.

39
Q

Which statement best summarizes the contribution of U-Net?
A) It provides a new loss function for generative models.
B) It introduces reinforcement learning for segmentation tasks.
C) It enables accurate and fast biomedical image segmentation using a novel architecture and augmentation.
D) It is a supervised language model for image captioning.

A

C

40
Q

What is the core idea behind CLIP’s training strategy?
A) Classify pixels using convolutional layers
B) Predict object bounding boxes from text
C) Learn visual representations by matching images with their correct text captions
D) Generate images from natural language prompts

A

C

41
Q

What type of architecture does CLIP use to process text data?
A) LSTM-based decoder
B) Convolutional network
C) Transformer-based text encoder
D) Recurrent neural network (RNN)

A

C

42
Q

How does CLIP perform zero-shot classification?
A) By fine-tuning on the new dataset
B) By predicting pixel-level labels for each class
C) By comparing image embeddings with text embeddings of class descriptions
D) By retrieving nearest neighbors from a training set

A

C

43
Q

What is the main benefit of CLIP's shared embedding space?
A) Enables model compression for deployment
B) Allows direct comparison of images and texts for similarity
C) Supports training without gradient descent
D) Reduces the number of parameters in the model

A

B

44
Q

Which of the following is a limitation of CLIP mentioned in the paper?
A) Inability to learn from text
B) Poor performance on OCR and video classification tasks
C) Inferior performance on specialized domains like medical imaging
D) Requires millions of human-annotated labels

A

C