Medical imaging Flashcards
Which imaging modalities use ionizing radiation and which do not, and why is this distinction clinically important?
Ionizing Radiation Modalities: X-rays (plain radiographs)
CT (Computed Tomography)
Nuclear medicine scans (e.g., PET, SPECT), which use radioactive tracers that emit gamma rays
Non-Ionizing Radiation Modalities:
MRI (Magnetic Resonance Imaging): uses magnetic fields and radiofrequency waves.
Ultrasound: uses high-frequency sound waves.
Clinical Importance:
Ionizing radiation can damage biological tissues and DNA if exposure is high or frequent, potentially increasing cancer risk. Therefore, dose minimization is crucial.
Non-ionizing imaging (MRI/Ultrasound) is generally safer regarding radiation concerns but might have other limitations (e.g., cost, availability, scanning time).
Explain how CT scanning reconstructs cross-sectional slices from multiple X-ray projections, and why it provides better soft-tissue contrast than a conventional X-ray.
Reconstruction Principle:
- In CT, the X-ray source and detector rotate 360° around the patient. Multiple 1D projections are acquired at many angles.
- A reconstruction algorithm (often Filtered Back-Projection or modern iterative methods) mathematically reconstructs a 2D slice from these projections.
Better Soft-Tissue Contrast:
- Conventional X-rays compress all tissues into a single 2D projection, so overlapping structures can hide subtle differences.
- CT provides cross-sectional slices, reducing overlap. Tiny differences in attenuation (density) become more apparent in the slice, yielding improved soft-tissue discrimination.
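A minimal sketch of this reconstruction idea using scikit-image's Radon transform utilities (the phantom image, angle set, and filter choice are illustrative; `filter_name` is the keyword in recent scikit-image versions):
```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

phantom = shepp_logan_phantom()                        # synthetic CT test slice
angles = np.linspace(0.0, 180.0, 180, endpoint=False)

# Forward model: simulate the 1D projections acquired at each angle (the sinogram)
sinogram = radon(phantom, theta=angles)

# Inverse model: filtered back-projection reconstructs the 2D slice
slice_fbp = iradon(sinogram, theta=angles, filter_name="ramp")
```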
What is the role of radioisotopes in PET (Positron Emission Tomography), and how does it fundamentally differ from MRI in terms of the information it provides?
Role of Radioisotopes in PET:
- A tracer (e.g., FDG, a glucose analog labeled with ^18F) is injected into the patient. Cancer cells or metabolically active tissues take up more tracer.
- When the isotope decays, it emits positrons which annihilate with electrons, producing gamma photons detected by the PET scanner.
Difference from MRI:
- PET: Provides functional or metabolic information (e.g., how actively a region uses glucose).
- MRI: Provides anatomical and some functional details (e.g., T1/T2-weighted contrasts or fMRI for blood oxygen level) but not direct metabolic uptake images. MRI relies on magnetic properties of hydrogen nuclei in tissues.
Define bit depth in digital images and discuss how differences in bit depth (e.g., 8-bit vs. 16-bit) impact possible intensity values.
Bit Depth: The number of bits used to represent each pixel’s intensity.
- 8-bit: 2^8 = 256 discrete intensity levels.
- 16-bit: 2^16 = 65,536 levels.
Impact:
A higher bit depth allows a wider range of intensities and finer gradations (useful in modalities like CT or some microscopy).
8-bit images may risk losing subtle intensity differences, whereas 16-bit retains small gradations but often requires special viewing/processing for display.
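A small NumPy illustration of the available levels and a naive 16-bit to 8-bit conversion (the array contents are arbitrary):
```python
import numpy as np

print(np.iinfo(np.uint8).max)    # 255   -> 2**8  = 256 levels
print(np.iinfo(np.uint16).max)   # 65535 -> 2**16 = 65536 levels

img16 = np.random.randint(0, 65536, size=(4, 4), dtype=np.uint16)
# Naive conversion keeps only the 8 most significant bits,
# discarding the fine gradations a 16-bit image can represent.
img8 = (img16 >> 8).astype(np.uint8)
```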
What is DICOM, and why is it crucial for the interoperability of medical imaging devices?
DICOM (Digital Imaging and Communications in Medicine):
- A standardized format and network protocol for storing/transmitting medical images.
- Encodes not only pixel data but also metadata (patient ID, study date, modality parameters, etc.).
Importance:
Ensures interoperability across different scanners, PACS (Picture Archiving and Communication Systems), and software.
Allows hospitals and clinics to exchange imaging studies reliably, enabling consistent workflows worldwide.
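A short pydicom sketch of reading both metadata and pixel data from one object (the file name is hypothetical, and the attributes shown are only present if the study recorded them):
```python
import pydicom

ds = pydicom.dcmread("example_ct_slice.dcm")   # hypothetical file name

# Metadata travels with the image in the same DICOM object
print(ds.PatientID, ds.Modality, ds.StudyDate)

# The image itself, as a NumPy array
pixels = ds.pixel_array
print(pixels.shape, pixels.dtype)
```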
Explain the Nyquist–Shannon sampling theorem in simple terms, and how violating it causes aliasing artifacts in images.
Sampling Theorem: A signal must be sampled at least at twice its highest frequency component (the Nyquist rate) to capture all information without losing detail.
Violation → Aliasing:
If sampling is too sparse, high-frequency details appear as misleading lower-frequency patterns (e.g., moiré).
In images, small repetitive structures or sharp edges can be incorrectly rendered, creating artifacts or patterns that weren’t actually there.
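A tiny NumPy illustration (the stripe period and sampling step are arbitrary choices):
```python
import numpy as np

x = np.arange(512)
xx, _ = np.meshgrid(x, x)
stripes = np.sin(2 * np.pi * xx / 4.0)   # vertical stripes with a 4-pixel period

# Keeping only every 3rd pixel samples below the Nyquist rate for this pattern.
# The retained values (0, -1, 0, 1, ...) repeat only every 12 original pixels,
# so the downsampled image shows a coarser stripe pattern that is not in the signal.
aliased = stripes[::3, ::3]
```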
What is a histogram in image processing, and how do you use it for simple thresholding of an image (e.g., Otsu’s method)?
Histogram: A distribution of the pixel intensities. For grayscale images, the x-axis is intensity and y-axis is the count (or probability) of pixels at each intensity.
Thresholding:
E.g., Otsu’s Method: Automatically finds an intensity threshold that separates foreground and background by maximizing between-class variance (or minimizing within-class variance).
Implementation typically calculates for each candidate threshold how well it separates the histogram into two clusters, and picks the best threshold.
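A brief sketch with NumPy and scikit-image, assuming a grayscale image `img` is already loaded:
```python
import numpy as np
from skimage.filters import threshold_otsu

# Histogram: how many pixels fall into each intensity bin
hist, bin_edges = np.histogram(img, bins=256)

# Otsu picks the threshold that best splits this histogram into two classes
t = threshold_otsu(img)
foreground = img > t
```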
How does linear contrast stretching work, and why do we sometimes apply it to 16-bit images before visualization?
Linear Contrast Stretching:
Maps an input range [I_min, I_max] onto a new dynamic range, e.g., [0, 255] for 8-bit output.
Formula: I_out = α (I_in − I_min), where α = 255 / (I_max − I_min).
Reason for 16-bit → 8-bit:
16-bit images hold many intensity levels, but typical monitors display only 8-bit. A linear stretch ensures the visible range (0–255) best represents the relevant intensities.
This reveals subtle differences otherwise hidden in the extended range of the 16-bit image.
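A minimal sketch of the stretch, assuming `img` is a 16-bit NumPy array and the intensities of interest lie in [i_min, i_max]:
```python
import numpy as np

def stretch_to_uint8(img, i_min=None, i_max=None):
    """Linearly map [i_min, i_max] onto [0, 255] for display."""
    i_min = float(img.min()) if i_min is None else float(i_min)
    i_max = float(img.max()) if i_max is None else float(i_max)
    alpha = 255.0 / (i_max - i_min)
    out = alpha * (img.astype(np.float64) - i_min)
    return np.clip(out, 0, 255).astype(np.uint8)
```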
Outline the basic formula for 2D convolution and explain how kernel flipping differs between convolution and cross-correlation.
2D Convolution:
g(x,y) = (f ∗ h)(x,y) = Σ_{m=−k..k} Σ_{n=−k..k} f(x−m, y−n) h(m,n)
Kernel Flipping:
True mathematical convolution flips the kernel in both x and y directions (i.e., h(−m,−n)).
Cross-correlation omits the flip, effectively using h(m,n) as is.
Many image processing libraries do cross-correlation by default because it’s simpler for template matching.
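The relationship can be checked directly with SciPy, where `convolve` flips the kernel and `correlate` does not (the kernel values are arbitrary):
```python
import numpy as np
from scipy.ndimage import convolve, correlate

f = np.random.rand(32, 32)
h = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# Convolution with h equals cross-correlation with the kernel flipped in x and y
conv = convolve(f, h)
corr_flipped = correlate(f, h[::-1, ::-1])
print(np.allclose(conv, corr_flipped))   # True
```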
Compare the box (mean) filter with a Gaussian filter in terms of smoothing performance and side effects on edges.
Box (Mean) Filter:
Simple average of the neighborhood.
Strongly blurs edges (noisy corners are significantly smoothed).
Can introduce block-like artifacts because each pixel is equally weighted in the local window.
Gaussian Filter:
Weights pixels according to a Gaussian distribution (center > edges of the window).
More natural smoothing, less likely to create abrupt artifacts.
Preserves edges slightly better than box filter because it places more emphasis on central pixels.
Also used in many scale-space techniques (LoG, DoG, etc.).
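A quick comparison sketch with SciPy (window size and sigma are arbitrary):
```python
from scipy.ndimage import uniform_filter, gaussian_filter

# Box (mean) filter: every pixel in the 5x5 window contributes equally
box_smoothed = uniform_filter(img.astype(float), size=5)

# Gaussian filter: nearby pixels are weighted more heavily than distant ones
gauss_smoothed = gaussian_filter(img.astype(float), sigma=1.5)
```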
What is the Sobel operator, and how do its horizontal and vertical kernels detect edge directions?
Sobel Operator:
A pair of 3×3 filters used to approximate the intensity gradient in horizontal (x) and vertical (y) directions.
Typically:
G_x = [−1 0 1; −2 0 2; −1 0 1]
G_y = [−1 −2 −1; 0 0 0; 1 2 1]
Edge Direction Detection:
Convolving with G_x captures intensity changes along x (responding to vertical edges).
Convolving with G_y captures changes along y (responding to horizontal edges).
The gradient magnitude sqrt(G_x² + G_y²) indicates edge strength, and arctan(G_y/G_x) indicates edge direction.
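A direct implementation of these kernels with SciPy, assuming a grayscale float image `img`:
```python
import numpy as np
from scipy.ndimage import convolve

gx_kernel = np.array([[-1, 0, 1],
                      [-2, 0, 2],
                      [-1, 0, 1]], dtype=float)
gy_kernel = np.array([[-1, -2, -1],
                      [ 0,  0,  0],
                      [ 1,  2,  1]], dtype=float)

gx = convolve(img, gx_kernel)      # responds to vertical edges
gy = convolve(img, gy_kernel)      # responds to horizontal edges

magnitude = np.hypot(gx, gy)       # edge strength
direction = np.arctan2(gy, gx)     # edge direction in radians
```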
Describe how the Laplacian-of-Gaussian filter (LoG) is used for blob detection and why scale selection is crucial.
LoG Filter:
∇²(G_σ ∗ I): a second-derivative operator applied to a Gaussian-smoothed image.
Detects regions where intensity changes from bright to dark or vice versa in a “blob”-like manner.
Blob Detection:
A circular (or elliptical) region can be identified if LoG response is high and changes sign near the center.
By examining multiple σ values, one can detect blobs of different sizes (multiscale analysis).
Scale Selection:
Real-world objects come in different sizes. A single σ might only detect certain sized blobs.
Using a range of scales ensures capturing small, medium, or large structures.
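scikit-image wraps this multiscale search in `blob_log`; a short sketch (parameter values are arbitrary, and `img` is assumed to be a grayscale float image):
```python
from skimage.feature import blob_log

# Each row of `blobs` is (row, col, sigma); sigma encodes the detected scale
blobs = blob_log(img, min_sigma=2, max_sigma=20, num_sigma=10, threshold=0.1)

# Approximate blob radius for a 2D LoG detection
radii = blobs[:, 2] * (2 ** 0.5)
```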
Briefly describe the four main steps of SIFT (Scale-Space Extrema, Keypoint Localization, Orientation Assignment, Descriptor Formation).
Scale-Space Extrema:
Build a Gaussian pyramid and compute Difference of Gaussians (DoG) at multiple scales.
Identify local maxima/minima in a 3D neighborhood (x, y, scale).
Keypoint Localization:
Refine each candidate’s location and scale.
Reject low-contrast points or points on edges (via a Hessian-based check).
Orientation Assignment:
Compute local gradient orientations in a region around the keypoint.
Assign a dominant orientation (plus additional orientations for any other histogram peaks within 80% of the highest peak).
Descriptor Formation:
For a 16×16 window around the keypoint (rotated to the assigned orientation), form 4×4 subregions.
Each subregion accumulates an 8-bin gradient histogram → 4×4×8 = 128D descriptor vector.
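With OpenCV (SIFT is in the main package from version 4.4 onward), the whole pipeline fits in a few lines; the file name is hypothetical:
```python
import cv2

gray = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Each keypoint carries (x, y), scale and orientation; each descriptor is 128-D
print(len(keypoints), descriptors.shape)   # e.g., (N, 128)
```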
Why is Difference of Gaussians (DoG) used as an approximation to the Laplacian in SIFT, and how does it help identify scale-invariant keypoints?
DoG Approximation:
The Laplacian-of-Gaussian can be closely approximated by subtracting two Gaussian-blurred images at nearby scales (G(kσ) − G(σ)).
This is computationally more efficient and stable than directly convolving with LoG.
Scale Invariance:
By building a pyramid of images blurred at different scales, local maxima in DoG indicate potential blobs (keypoints).
The scale at which the response is maximal corresponds to the intrinsic scale of the feature, making detection invariant to image size changes.
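A minimal sketch of one DoG level via Gaussian blurring (the sigma and k values are purely illustrative; `img` is assumed to be a float grayscale array):
```python
from scipy.ndimage import gaussian_filter

sigma, k = 1.6, 1.3
# The difference of two blurred copies approximates the Laplacian-of-Gaussian response
dog = gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)
```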
In the context of SIFT, what do we mean by rotation invariance, and how is it achieved in practice?
Rotation Invariance:
Keypoints should be matched even if the object appears rotated in another image.
Implementation:
Compute local gradients in the neighborhood of the keypoint.
Determine the dominant orientation by finding the peak in the gradient orientation histogram.
“Rotate” the coordinate frame of the descriptor to align with this orientation.
The final descriptor is effectively anchored to that orientation, making it rotation invariant.
List the main types of geometric transformations (rigid, affine, non-rigid) and describe a scenario where each is most appropriate.
Rigid (translation + rotation; in 3D this means rotation about three axes plus translation along three axes):
Appropriate when the object or anatomy doesn’t deform (e.g., registration of brain images over short times, or registering an object with no shape change).
Affine (includes scaling, shear in addition to rigid):
Useful for images with slight scaling or shear differences (e.g., comparing scans from devices with slightly different pixel spacing or magnification).
Non-rigid / Deformable:
Accounts for local tissue warping or motion (e.g., matching a follow-up MRI with organ shape changes over time, or motion in dynamic imaging of organs that physically move).
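A sketch of applying rigid and affine transforms with scikit-image (parameter values are arbitrary); deformable registration requires dedicated tools (e.g., B-spline or demons methods) and is not shown:
```python
import numpy as np
from skimage.transform import EuclideanTransform, AffineTransform, warp

# Rigid: rotation + translation only
rigid = EuclideanTransform(rotation=np.deg2rad(5), translation=(10, -4))

# Affine: adds scaling and shear on top of the rigid part
affine = AffineTransform(scale=(1.05, 0.98), shear=0.02, translation=(10, -4))

img_rigid = warp(img, rigid.inverse)
img_affine = warp(img, affine.inverse)
```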
What is mutual information (MI) in image registration, and why is it commonly used for multimodal registration (e.g., MRI to CT)?
Mutual Information:
A measure of how much knowing one variable (intensity in MRI) reduces uncertainty about the other (intensity in CT).
Mathematically derived from entropy:
MI(A,B)=H(A)+H(B)−H(A,B).
Usage in Multimodal:
Different modalities yield different intensity distributions for the same tissue. So simpler metrics (like sum of squared differences) don’t always work.
MI is higher when the images are well aligned, because the intensity relationship becomes more predictable.
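A from-scratch sketch of MI estimated from the joint intensity histogram (the bin count is an arbitrary choice); a registration loop would maximize this value over candidate transform parameters:
```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Estimate MI between two spatially aligned images from their joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()             # joint probability
    px = pxy.sum(axis=1, keepdims=True)   # marginal of image a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of image b
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))
```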
How do fiducial markers facilitate image registration, and what is the difference between intrinsic and extrinsic fiducials?
Fiducial Markers:
Points or objects visible in imaging that serve as known correspondences between images. They reduce the search space for the transform parameters.
Once you locate them in both images, you can solve for the best transformation aligning the markers.
Intrinsic vs. Extrinsic:
Intrinsic: Naturally occurring landmarks in the anatomy (e.g., corners of the ventricles in the brain).
Extrinsic: Artificial markers placed on or in the patient (e.g., skin markers, bone-implanted markers).
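One common point-based solution is a least-squares rigid fit via SVD (the Kabsch/Procrustes method); a sketch assuming the marker coordinates are already paired between the two images:
```python
import numpy as np

def rigid_from_fiducials(src, dst):
    """Least-squares rotation R and translation t such that R @ src_i + t ≈ dst_i."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))            # guard against reflections
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```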
Explain Otsu’s thresholding method: how does it partition the image histogram, and what criterion does it optimize?
Otsu’s Method:
Evaluates every possible threshold t.
Splits the histogram into two classes: {0,…,t} and {t+1,…,L−1}.
For each t, compute within-class variance (or equivalently between-class variance).
Picks the threshold t* that maximizes the separation (i.e., largest between-class variance or smallest within-class variance).
Essentially, it finds a threshold that best separates background and foreground in a bimodal histogram.
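A from-scratch sketch of the exhaustive search over thresholds, assuming an 8-bit grayscale image:
```python
import numpy as np

def otsu_threshold(img):
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_between = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()         # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:t] * levels[:t]).sum() / w0     # class means
        mu1 = (p[t:] * levels[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```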
Compare k-means segmentation with Gaussian Mixture Model segmentation in terms of assumptions and flexibility.
K-means:
Assumes each cluster is roughly spherical in feature space and uses Euclidean distance.
Hard assignments: each pixel belongs entirely to one cluster.
Typically simpler and faster but less flexible if data is not well-separated in spherical clusters.
Gaussian Mixture Model (GMM):
Each cluster is modeled as a Gaussian distribution with mean and covariance (can have elliptical shapes).
Soft assignments: a pixel has a probability of belonging to each cluster.
More computationally expensive (EM algorithm) but can better represent complex data distributions.
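A side-by-side sketch with scikit-learn on pixel intensities (cluster counts are arbitrary):
```python
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = img.reshape(-1, 1).astype(float)    # each pixel's intensity as a feature

# Hard assignments: every pixel belongs to exactly one cluster
km_labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# Soft assignments: per-pixel probability of belonging to each Gaussian component
gmm = GaussianMixture(n_components=3, covariance_type="full").fit(X)
gmm_probs = gmm.predict_proba(X)
gmm_labels = gmm_probs.argmax(axis=1)
```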
Describe a typical watershed algorithm for segmentation: how is the topographic analogy used, and what are potential pitfalls like over-segmentation?
Watershed:
Interpret intensity as elevation.
“Water” floods from low-intensity basins upward. The boundaries (watershed lines) form the segmentation.
Each local minimum forms a catchment basin.
Topographic Analogy:
If you pour water at each local minimum, the water will fill up until it meets water rising from another basin, forming the watershed boundary.
Pitfalls:
Over-segmentation if there are many shallow minima.
Markers or pre-processing (distance transforms, morphological filters) can mitigate that by controlling which minima are relevant.
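A marker-controlled watershed sketch with scikit-image, using a distance transform to limit over-segmentation (parameter values are illustrative; `img` is assumed grayscale with bright objects on a dark background):
```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.filters import threshold_otsu
from skimage.segmentation import watershed

binary = img > threshold_otsu(img)                    # rough foreground mask
distance = ndi.distance_transform_edt(binary)         # "elevation" map

coords = peak_local_max(distance, labels=binary, min_distance=10)
markers = np.zeros(distance.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)

labels = watershed(-distance, markers, mask=binary)   # flood only from markers
```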
What is the motivation for using superpixels, and how does SLIC (Simple Linear Iterative Clustering) generate superpixels that respect image boundaries?
Motivation:
Instead of dealing with millions of individual pixels, group them into a few thousand “superpixels” that adhere to object boundaries.
Reduces complexity for subsequent steps (segmentation, classification).
SLIC:
Initializes cluster centers on a regular grid with spacing S = sqrt(N/k), where N is the number of pixels and k is the desired number of superpixels.
Performs iterative local k-means in a 5D space (L,a,b,x,y) for color + coordinates.
Enforces a limited search region (2S×2S around each center), making it computationally efficient.
Final superpixels usually align well with boundaries due to the combined color + position distance measure.
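A minimal SLIC sketch with scikit-image (segment count and compactness are arbitrary; `img` is assumed to be an RGB image, since SLIC works in the Lab color space):
```python
from skimage.segmentation import slic, mark_boundaries

# ~1000 superpixels; higher compactness -> more regular, squarer regions
segments = slic(img, n_segments=1000, compactness=10, start_label=1)

# Overlay superpixel boundaries for visual inspection
overlay = mark_boundaries(img, segments)
```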
Explain why SLIC only needs to do distance computations within a 2S×2S region around the superpixel center, rather than the entire image
Each superpixel is expected to have a roughly square shape of size S×S.
Because of how the cluster centers are laid out, a pixel far outside that region is extremely unlikely to belong to that superpixel.
This local search drastically reduces computations from a naive approach where each pixel is compared to all cluster centers.
After computing superpixels, how can we merge or classify them into larger object segments?
Merging/Classification Approaches:
Region Merging: Evaluate adjacency of superpixels and merge if they are sufficiently similar in color or texture.
Graph-Based: Treat superpixels as nodes in a graph whose edges carry a similarity weight, then run a higher-level method such as region-adjacency merging or graph cuts.
Classifier: Extract features (e.g., mean intensity, texture) from each superpixel and train a supervised classifier to label them (foreground vs. background, or multiple classes).
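A region-merging sketch using scikit-image's region adjacency graph (the module lives at `skimage.graph` in recent versions, `skimage.future.graph` in older ones; the merge threshold is arbitrary and `img` is assumed RGB):
```python
from skimage import graph
from skimage.segmentation import slic

segments = slic(img, n_segments=1000, compactness=10, start_label=1)

# Build a region adjacency graph weighted by mean-color difference,
# then merge neighboring superpixels whose difference falls below a threshold
rag = graph.rag_mean_color(img, segments)
merged = graph.cut_threshold(segments, rag, 29)
```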