Week 2 Flashcards
(44 cards)
What is U-Net?
A deep learning model for biomedical image segmentation that labels every pixel of an image.
What does U-Net specialize in?
Precise pixel-wise classification in medical images.
What are the two paths in U-Net’s architecture?
Contracting path (encoder) and expanding path (decoder).
What does the contracting path in U-Net do?
The “shrinking” side of U-Net that finds patterns (edges, shapes) by making the image smaller and more focused.
What does the expanding path in U-Net do?
The “growing” side that brings the image back to full size and labels each pixel using the info it learned.
What is the purpose of skip connections in U-Net?
Shortcuts that carry small details from the encoder straight to the decoder, so fine detail lost while shrinking comes back. Keeps things sharp.
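A tiny PyTorch sketch of the idea (layer sizes here are made up, not the paper's exact numbers): the decoder upsamples, then concatenates the saved encoder feature map before convolving.

```python
import torch
import torch.nn as nn

# Hypothetical feature maps; shapes are illustrative only.
enc_feat = torch.randn(1, 64, 128, 128)  # saved from the contracting path
dec_feat = torch.randn(1, 128, 64, 64)   # coming up the expanding path

up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
fuse = nn.Conv2d(64 + 64, 64, kernel_size=3, padding=1)

x = up(dec_feat)                     # upsample to (1, 64, 128, 128)
x = torch.cat([enc_feat, x], dim=1)  # the skip connection: stack channels
x = fuse(x)                          # mix coarse context with fine detail
print(x.shape)                       # torch.Size([1, 64, 128, 128])
```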
Pixel-wise classification
Instead of saying “this is a dog,” U-Net says “this pixel is part of a dog, this pixel is background,” etc.
What is U-Net good for?
Small datasets; it still trains well when labeled images are scarce.
Data augmentation
Making more training images by flipping, rotating, or stretching existing ones — helpful when we don’t have a lot of real data.
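A quick torchvision sketch (the transform choices and parameters are arbitrary examples; for segmentation you must apply the same random transform to the mask too):

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),
    T.RandomRotation(degrees=15),
    T.RandomAffine(degrees=0, scale=(0.9, 1.1)),  # mild stretching
    T.ToTensor(),
])
# augmented = augment(pil_image)  # apply to a PIL image
```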
Max pooling
A way of making the image smaller by keeping only the most important parts — like zooming out to see big patterns.
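In code (a minimal PyTorch example; the tensor shape is arbitrary):

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)  # keep the max of each 2x2 window
x = torch.randn(1, 64, 128, 128)
print(pool(x).shape)  # torch.Size([1, 64, 64, 64]): height and width halved
```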
Transposed Convolutions (Upsampling)
Makes the image bigger again after shrinking — used in the decoder to go back to original size.
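The mirror image of max pooling, again as a minimal PyTorch sketch with arbitrary sizes:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=128, out_channels=64, kernel_size=2, stride=2)
x = torch.randn(1, 128, 64, 64)
print(up(x).shape)  # torch.Size([1, 64, 128, 128]): height and width doubled
```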
Cross-Entropy Loss
A way to measure how wrong the AI’s pixel guesses are — helps it learn and improve.
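For segmentation the loss is applied per pixel. A sketch (shapes and class count are made up):

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(1, 2, 128, 128)         # a score per pixel for each of 2 classes
target = torch.randint(0, 2, (1, 128, 128))  # the true class of each pixel
print(loss_fn(logits, target).item())        # averaged over all pixels
```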
SGD (Stochastic Gradient Descent)
The learning method U-Net uses — it learns step by step by checking mistakes and adjusting.
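One training step, sketched with a toy stand-in model (the paper reports using a high momentum of 0.99, which is kept here; everything else is illustrative):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(1, 2, kernel_size=3, padding=1)  # stand-in for U-Net
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.99)

x = torch.randn(4, 1, 64, 64)
target = torch.randint(0, 2, (4, 64, 64))

loss = F.cross_entropy(model(x), target)
optimizer.zero_grad()
loss.backward()   # "checking mistakes": compute gradients
optimizer.step()  # "adjusting": nudge weights against the gradient
```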
Overlap-Tile Strategy
U-Net cuts up big images into tiles and looks at each one — then stitches them back together. Great for large medical scans.
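A rough sketch of the tiling loop (function and variable names are invented; it assumes the model returns one value per pixel and preserves the input size, and it skips the paper's border mirroring):

```python
import torch

def predict_tiled(model, image, tile=256, margin=32):
    # Tiles overlap their neighbours by `margin` pixels on every side;
    # only each tile's centre region is written into the output.
    _, H, W = image.shape
    out = torch.zeros(H, W)
    step = tile - 2 * margin
    for top in range(0, H, step):
        for left in range(0, W, step):
            t0, l0 = max(top - margin, 0), max(left - margin, 0)
            patch = image[:, t0:t0 + tile, l0:l0 + tile]
            pred = model(patch.unsqueeze(0)).squeeze()  # per-pixel prediction
            dt, dl = top - t0, left - l0                # offset of the centre region
            h, w = min(step, H - top), min(step, W - left)
            out[top:top + h, left:left + w] = pred[dt:dt + h, dl:dl + w]
    return out
```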
IoU (Intersection over Union)
A score that shows how well U-Net’s predictions match the real image. Higher = better.
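Easy to compute for binary masks; a small self-contained example:

```python
import torch

def iou(pred, target):
    # Both inputs are 0/1 masks of the same shape.
    pred, target = pred.bool(), target.bool()
    union = (pred | target).sum()
    return ((pred & target).sum() / union).item() if union > 0 else 1.0

pred = torch.tensor([[1, 1, 0], [0, 1, 0]])
true = torch.tensor([[1, 0, 0], [0, 1, 1]])
print(iou(pred, true))  # overlap 2 / union 4 = 0.5
```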
What U-Net Is Best At
It’s really good at medical images, especially when you don’t have a lot of labeled data.
CLIP
An AI that connects images and text — it can guess what’s in a picture just from a sentence like “a photo of a cat.”
Contrastive learning
A way of teaching AI by showing it the right image–text pairs and telling it which ones don’t match.
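The core of CLIP's training objective, sketched with random stand-ins for the encoder outputs (0.07 is a typical temperature value, not something to take as exact):

```python
import torch
import torch.nn.functional as F

N, d = 8, 512  # batch of N matching image-text pairs, embedding size d
img_emb = F.normalize(torch.randn(N, d), dim=-1)  # stand-in for image encoder output
txt_emb = F.normalize(torch.randn(N, d), dim=-1)  # stand-in for text encoder output

logits = img_emb @ txt_emb.T / 0.07  # N x N similarity matrix
labels = torch.arange(N)             # each row's true match sits on the diagonal
loss = (F.cross_entropy(logits, labels)           # image -> which text?
        + F.cross_entropy(logits.T, labels)) / 2  # text -> which image?
```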
Zero-shot learning
CLIP can understand new tasks without extra training — just by reading your description.
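What that looks like at inference time, as a sketch (the random tensors stand in for real CLIP encoder outputs):

```python
import torch
import torch.nn.functional as F

prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
img_emb = F.normalize(torch.randn(1, 512), dim=-1)             # image_encoder(image)
txt_emb = F.normalize(torch.randn(len(prompts), 512), dim=-1)  # text_encoder(prompts)

scores = (img_emb @ txt_emb.T).squeeze(0)  # cosine similarity with each prompt
print(prompts[scores.argmax()])            # the best-matching description wins
```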
Natural Language Supervision
CLIP learns from real language (captions from the internet), not from labeled datasets like “cat = 1”.
Image Encoder
A part of CLIP that looks at pictures and turns them into numbers (embeddings) for the model to understand.
Text Encoder
A part of CLIP that reads text and also turns it into numbers so it can be matched with the picture.
Shared Embedding Space
Both image and text are turned into numbers in the same space, so CLIP can match them up easily.
Pretraining
CLIP is trained on 400 million image + text pairs from the internet to learn how pictures and language go together.