Week 2 Flashcards

(44 cards)

1
Q

What is U-Net?

A

A deep learning model for biomedical image segmentation that labels every pixel.

2
Q

What does U-Net specialize in?

A

Precise pixel-wise classification in medical images.

3
Q

What are the two paths in U-Net’s architecture?

A

Contracting path (encoder) and expanding path (decoder).

4
Q

What does the contracting path in U-Net do?

A

The “shrinking” side of U-Net that finds patterns (edges, shapes) by repeatedly downsampling the image into smaller but richer feature maps.

5
Q

What does the expanding path in U-Net do?

A

The “growing” side that upsamples (“zooms in”) the image back to normal size and labels each pixel using the info it learned.

6
Q

What is the purpose of skip connections in U-Net?

A

Shortcuts that help the decoder remember small details from the original image, keeping the output sharp.
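
To tie the last few cards together, here is a minimal sketch of the idea in PyTorch — a toy two-level version with made-up channel sizes, not the paper's exact architecture (which is deeper and uses unpadded convolutions):

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(1, 16)             # contracting path
        self.enc2 = double_conv(16, 32)
        self.pool = nn.MaxPool2d(2)                # downsample by 2
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # upsample by 2
        self.dec1 = double_conv(32, 16)            # expanding path
        self.head = nn.Conv2d(16, n_classes, 1)    # per-pixel class scores

    def forward(self, x):
        s1 = self.enc1(x)                # fine features at full resolution
        x = self.enc2(self.pool(s1))     # smaller, richer features
        x = self.up(x)                   # back to full resolution
        x = torch.cat([x, s1], dim=1)    # skip connection: reuse fine detail
        return self.head(self.dec1(x))   # one score per pixel per class

logits = TinyUNet()(torch.randn(1, 1, 64, 64))
print(logits.shape)  # torch.Size([1, 2, 64, 64]) — a label map, not one label
```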

7
Q

Pixel-wise classification

A

Instead of saying “this is a dog,” U-Net says “this pixel is part of a dog, this pixel is background,” etc.

8
Q

What is U-Net good for?

A

Small datasets — thanks to heavy data augmentation, it segments well even with few labeled training images.

9
Q

Data augmentation

A

Making more training images by flipping, rotating, or stretching existing ones — helpful when we don’t have a lot of real data.
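
A minimal sketch of the idea with NumPy (a toy 4×4 array standing in for an image; in segmentation you would flip the label mask the same way):

```python
import numpy as np

image = np.arange(16).reshape(4, 4)   # stand-in for a training image

augmented = [
    np.fliplr(image),        # horizontal flip
    np.flipud(image),        # vertical flip
    np.rot90(image),         # 90-degree rotation
    np.rot90(image, k=2),    # 180-degree rotation
]
# One real image has become four extra "new" training examples.
print(len(augmented))
```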

10
Q

Max pooling

A

A way of making the image smaller by keeping only the most important parts — like zooming out to see big patterns.
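
A tiny NumPy sketch of 2×2 max pooling on a made-up 4×4 image:

```python
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 2],
              [5, 2, 9, 7],
              [1, 0, 3, 4]])

# 2x2 max pooling: split into 2x2 blocks, keep each block's maximum
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 2]
#  [5 9]]
```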

11
Q

Transposed Convolutions (Upsampling)

A

Makes the image bigger again after shrinking — used in the decoder to go back to original size.
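
A short PyTorch shape check (sizes are made up) showing a transposed convolution undoing a pooling step:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 16, 16)    # small, deep feature map

down = nn.MaxPool2d(2)                                    # shrink: 16x16 -> 8x8
up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)  # grow: 8x8 -> 16x16

print(down(x).shape)      # torch.Size([1, 32, 8, 8])
print(up(down(x)).shape)  # torch.Size([1, 16, 16, 16])
```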

12
Q

Cross-Entropy Loss

A

A way to measure how wrong the AI’s pixel guesses are — helps it learn and improve.
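
A minimal PyTorch sketch (toy sizes) of pixel-wise cross-entropy — note that the loss takes a score per class per pixel and a true label per pixel:

```python
import torch
import torch.nn as nn

# Toy example: 2 classes (background/foreground) on a 4x4 image
logits = torch.randn(1, 2, 4, 4)              # model's per-pixel class scores
target = torch.randint(0, 2, (1, 4, 4))       # true label for every pixel

loss = nn.CrossEntropyLoss()(logits, target)  # averages over all 16 pixels
print(loss.item())  # lower = the pixel guesses are less wrong
```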

13
Q

SGD (Stochastic Gradient Descent)

A

The optimization method U-Net is trained with — it adjusts the weights step by step, checking a mistake and nudging the weights to reduce it.
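
A toy sketch of the step-by-step idea in NumPy (a one-weight model, not U-Net itself):

```python
import numpy as np

# Toy task: learn w so that w * x matches y (the true w is 3)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x
w, lr = 0.0, 0.01

for step in range(100):
    i = np.random.randint(len(x))   # "stochastic": pick one random example
    error = w * x[i] - y[i]         # check the mistake
    w -= lr * error * x[i]          # adjust a little in the right direction

print(round(w, 2))  # close to 3.0
```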

14
Q

Overlap-Tile Strategy

A

U-Net cuts big images into overlapping tiles, predicts each tile, then stitches the results back together. Great for large medical scans.
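
A rough sketch of the tiling half of the idea in NumPy (tile and margin sizes are made up, and the stitching step is omitted):

```python
import numpy as np

def tiles_with_overlap(image, tile=64, margin=8):
    """Yield overlapping tiles; at stitch time only each tile's centre is kept."""
    h, w = image.shape
    for top in range(0, h, tile - 2 * margin):
        for left in range(0, w, tile - 2 * margin):
            # Edge tiles are simply clipped in this sketch;
            # the paper mirrors the image border instead.
            yield image[top:top + tile, left:left + tile]

scan = np.zeros((256, 256))   # stand-in for a large medical scan
print(sum(1 for _ in tiles_with_overlap(scan)))  # number of tiles to predict
```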

15
Q

IoU (Intersection over Union)

A

A score that shows how well U-Net’s predicted region overlaps the real one (overlap divided by combined area). Higher = better.
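
A tiny NumPy example on made-up 3×3 masks:

```python
import numpy as np

pred = np.array([[1, 1, 0],
                 [1, 0, 0],
                 [0, 0, 0]], dtype=bool)   # U-Net's guess
true = np.array([[1, 1, 0],
                 [0, 0, 0],
                 [0, 1, 0]], dtype=bool)   # the real mask

intersection = np.logical_and(pred, true).sum()  # pixels both agree on: 2
union = np.logical_or(pred, true).sum()          # pixels in either mask: 4
print(intersection / union)                      # IoU = 0.5
```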

16
Q

What U-Net Is Best At

A

It’s really good at medical images, especially when you don’t have a lot of labeled data.

17
Q

CLIP

A

An AI that connects images and text — it can guess what’s in a picture just from a sentence like “a photo of a cat.”

18
Q

Contrastive learning

A

A way of teaching AI by pulling matching image–text pairs together and pushing mismatched pairs apart.
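
A minimal sketch of the CLIP-style version in PyTorch, using random vectors as stand-ins for the encoder outputs (the 0.07 temperature is just a typical choice):

```python
import torch
import torch.nn.functional as F

# Stand-ins for a batch of 4 image and 4 caption embeddings;
# in real CLIP these would come from the image and text encoders.
img = F.normalize(torch.randn(4, 512), dim=-1)
txt = F.normalize(torch.randn(4, 512), dim=-1)

logits = img @ txt.T / 0.07   # similarity of every image to every caption
labels = torch.arange(4)      # each matching pair sits on the diagonal

# Push each image toward its own caption (and vice versa), away from the rest
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
print(loss.item())
```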

19
Q

Zero-shot learning

A

CLIP can understand new tasks without extra training — just by reading your description.
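
A minimal sketch in PyTorch with stand-in embeddings — real CLIP would produce these from its image and text encoders, with the class names wrapped in prompts like “a photo of a cat”:

```python
import torch
import torch.nn.functional as F

# Random stand-ins for one image embedding and three class-name embeddings
image_emb = F.normalize(torch.randn(1, 512), dim=-1)
class_embs = F.normalize(torch.randn(3, 512), dim=-1)   # "dog", "cat", "chair"

# Pick the class whose text embedding is most similar to the image embedding
similarity = (image_emb @ class_embs.T).softmax(dim=-1)
print(["dog", "cat", "chair"][similarity.argmax().item()])
```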

20
Q

Natural Language Supervision

A

CLIP learns from real language (captions from the internet), not from labeled datasets like “cat = 1”.

21
Q

Image Encoder

A

A part of CLIP that looks at pictures and turns them into numbers (embeddings) for the model to understand.

22
Q

Text Encoder

A

A part of CLIP that reads text and also turns it into numbers so it can be matched with the picture.

23
Q

Shared Embedding Space

A

Both image and text are turned into numbers in the same space, so CLIP can match them up easily.

24
Q

Pretraining

A

CLIP is trained on 400 million image + text pairs from the internet to learn how pictures and language go together.

25
Q

Vision Transformer (ViT)

A

A powerful type of model that CLIP can use to look at pictures — better than older models for many tasks.

26
Q

Zero-Shot Classification

A

Example: You show CLIP a picture and say “Is this a dog, cat, or chair?” — it figures it out without being trained on that list.

27
Q

Why CLIP Is Useful

A

It works well on lots of tasks like reading signs, identifying objects, and even labeling images — without retraining.

28
Q

Where CLIP Struggles

A

It doesn’t do well on very specialized tasks like X-rays or medical images, since it wasn’t trained deeply on those.

29
Q

Why CLIP Matters

A

It shows that AI can learn about the world using just language + images, like we do.

30
Q

What is the primary purpose of the U-Net architecture?
A) Object detection in natural images
B) Pixel-wise classification for biomedical image segmentation
C) Text-to-image generation
D) Speech recognition in noisy environments

A

B

31
Q

How does U-Net handle the challenge of limited labeled biomedical data?
A) Using transfer learning from ImageNet
B) Incorporating a generative adversarial network (GAN)
C) Applying extensive data augmentation techniques
D) Utilizing pre-trained transformers

A

C

32
Q

Which of the following is a key innovation introduced by U-Net?
A) Fully connected layers in segmentation
B) Attention mechanism for pixel grouping
C) Skip connections between encoder and decoder
D) Fixed input size with no padding

A

C

33
Q

What is the role of the contracting path (encoder) in U-Net?
A) To classify each pixel into a specific object class
B) To synthesize images from noise
C) To extract and downsample features from the input image
D) To compute the loss function

A

C

34
Q

How does the expanding path (decoder) work in U-Net?
A) Reduces feature dimensions through convolution
B) Uses transposed convolutions to upsample and combines them with encoder features
C) Flattens the feature maps for classification
D) Ignores previous feature maps to avoid overfitting

A

B

35
Q

Why are skip connections important in U-Net?
A) They increase training time
B) They improve numerical stability
C) They help preserve spatial information lost during downsampling
D) They reduce memory usage

A

C

36
Q

What kind of loss function is used in U-Net for pixel classification?
A) Mean Squared Error
B) Hinge Loss
C) Pixel-wise softmax followed by cross-entropy loss
D) Kullback–Leibler Divergence

A

C

37
Q

What strategy does U-Net use to segment large images efficiently?

A

The overlap-tile strategy — predicting overlapping tiles and stitching the results together.

38
Q

In the experiments, what metric did U-Net achieve the lowest value for in the EM segmentation challenge?

A

The warping error.

39
Q

Which statement best summarizes the contribution of U-Net?
A) It provides a new loss function for generative models.
B) It introduces reinforcement learning for segmentation tasks.
C) It enables accurate and fast biomedical image segmentation using a novel architecture and augmentation.
D) It is a supervised language model for image captioning.

A

C

40
Q

What is the core idea behind CLIP’s training strategy?
A) Classify pixels using convolutional layers
B) Predict object bounding boxes from text
C) Learn visual representations by matching images with their correct text captions
D) Generate images from natural language prompts

A

C

41
Q

What type of architecture does CLIP use to process text data?
A) LSTM-based decoder
B) Convolutional network
C) Transformer-based text encoder
D) Recurrent neural network (RNN)

A

C

42
Q

How does CLIP perform zero-shot classification?
A) By fine-tuning on the new dataset
B) By predicting pixel-level labels for each class
C) By comparing image embeddings with text embeddings of class descriptions
D) By retrieving nearest neighbors from a training set

A

C

43
Q

What is the main benefit of CLIP's shared embedding space?
A) Enables model compression for deployment
B) Allows direct comparison of images and texts for similarity
C) Supports training without gradient descent
D) Reduces the number of parameters in the model

A

B

44
Q

Which of the following is a limitation of CLIP mentioned in the paper?
A) Inability to learn from text
B) Poor performance on OCR and video classification tasks
C) Inferior performance on specialized domains like medical imaging
D) Requires millions of human-annotated labels

A

C