Low Level Vision Flashcards
(68 cards)
How do we get a computer to recognise images?
What is the range of a 3 bit image?
What is the range of an 8 bit image?
Each pixel is represented by a value. This is the colour of the pixel.
For 3 bits, this is 8 colours with a range [0-7]
For 8 bits, this is 256 colours with a range [0-255]
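The ranges above follow from the fact that n bits can encode 2^n distinct values. A minimal sketch, assuming unsigned pixel values that start at 0 (the function name is made up for illustration):

```python
def pixel_value_range(bits):
    """Return (number_of_levels, min_value, max_value) for an unsigned image of the given bit depth."""
    levels = 2 ** bits          # n bits can encode 2^n distinct values
    return levels, 0, levels - 1


print(pixel_value_range(3))  # (8, 0, 7)
print(pixel_value_range(8))  # (256, 0, 255)
```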
What is noise with regard to images?
Why does it make computer vision more difficult?
How can we get a clear image?
Noise refers to random variations in pixel values that don’t correspond to the true scene being captured.
This arises due to limitations in the imaging sensor, environmental conditions, or errors in data transmission.
Noise makes computer vision tasks more challenging because it obscures the true features of an image.
- It can distort features or alter their appearance
There are methods for de-noising and algorithms that are built to deal with noise.
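As one possible illustration of de-noising, here is a sketch of a 3x3 median filter. The cards do not name a specific method, so the choice of a median filter, the function name and the tiny test image are all assumptions made for this example.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged to keep the sketch short.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out


# A flat grey image with one corrupted pixel (value 255).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(noisy)
print(clean[1][1])  # 10
```

The median ignores a single outlier in the window, which is why the corrupted pixel is restored to the value of its neighbours.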
What are some factors that can make computer vision difficult?
- Different viewpoints greatly vary an object's appearance (orientation, rotation, retinal location, scale)
- Noise
- Illumination
- Deformations
- Small size / Far away objects
- Occlusion: if one object is obstructing another, it is hard to label
- Truncated object: if part of the object is cut off in the image, it is hard to label
What is the difference between image recognition and object detection?
- Image recognition is when an image is matched with a label of what object is featured in the image
- Object detection is when a bounding box with a label is placed around the object within the image
What is edge detection?
Given an image, detect the edge of an object or several objects.
What helps us to identify the edge of an object?
The edge of an object usually has different features to the other regions in the image, for example, different colours or thicknesses which help to differentiate edges.
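The idea that edges show up where neighbouring pixels differ can be sketched with a simple intensity difference. This is only an illustrative approach, not a method named in the cards; the function name, threshold and sample image are assumptions.

```python
def horizontal_edges(image, threshold):
    """Mark pixel (r, c) as an edge when it differs from its right-hand neighbour by more than threshold."""
    edges = []
    for row in image:
        edge_row = [abs(row[c + 1] - row[c]) > threshold for c in range(len(row) - 1)]
        edge_row.append(False)  # the last column has no right-hand neighbour
        edges.append(edge_row)
    return edges


# Dark region on the left, bright region on the right.
image = [
    [20, 20, 200, 200],
    [20, 20, 200, 200],
]
edges = horizontal_edges(image, threshold=50)
print(edges[0])  # [False, True, False, False]
```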
What is semantic segmentation?
Given an image, it outputs segments with labels at a pixel level. So for each pixel we want to say which category it belongs to.
What is instance segmentation?
Given an image, it outputs segments with labels at a pixel level, but for every instance, so if there are multiple cows in an image, there is a category for each separate cow.
What is the difference between semantic and instance segmentation?
Semantic segmentation groups objects of the same type into the same category, whereas instance segmentation separates each object instance into its own category/segment.
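The difference can be seen in the label maps the two tasks might produce for the same image. The tiny maps and the label numbering below are invented for illustration.

```python
# Semantic map: 0 = background, 1 = "cow". Both cows share the same label.
semantic = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Instance map: 0 = background, 1 = first cow, 2 = second cow.
instance = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
]


def count_labels(label_map):
    """Count the distinct non-background labels in a per-pixel label map."""
    return len({value for row in label_map for value in row if value != 0})


print(count_labels(semantic))  # 1
print(count_labels(instance))  # 2
```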
What is image retrieval?
Retrieving relevant images from a large dataset based on a query input
What is image generation?
generating images using image retrieval, image recognition and object segmentation techniques
Describe how a pinhole camera works.
- It simulates the function of our eyes: light comes through our pupils from different angles and hits the back of our eye (image plane). The position where it hits the image plane tells our brain where the object is (distance away from us, position left/right).
- A pinhole camera allows the light to enter through a pinhole and the light hits the plane at the back. The coordinates on this plane translate to pixels in a 2D image.
- The 2D image is inverted, size is reduced and there’s no depth information
What are the 4 stages of coordinates?
1) Real world coordinates
2) Camera coordinates
3) Image coordinates
4) Pixel coordinates
Why do we want to use homogeneous coordinates?
Using homogeneous coordinates, we can easily convert many transformations into the form of matrix multiplication.
How do we translate cartesian coordinates to homogeneous coordinates?
We add another dimension and give it the value 1:
(x, y) becomes (x, y, 1)
(x, y, z) becomes (x, y, z, 1)
(4, 3) becomes (4, 3, 1)
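A minimal sketch of this conversion, assuming points are stored as tuples (the function name is hypothetical):

```python
def to_homogeneous(point):
    """Append a 1 to a Cartesian point to get its homogeneous form."""
    return tuple(point) + (1,)


print(to_homogeneous((4, 3)))         # (4, 3, 1)
print(to_homogeneous((80, 25, 510)))  # (80, 25, 510, 1)
```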
How do we translate homogeneous coordinates to cartesian coordinates?
We divide by the last dimension:
(x, y, z, 1) becomes (x/1, y/1, z/1) = (x, y, z)
(x, y, z, w) becomes (x/w, y/w, z/w)
(3, 4, 6, 2) becomes (3/2, 2, 3)
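The reverse conversion can be sketched the same way. The function name is hypothetical, and the check for w = 0 is an addition of this sketch: such a point is at infinity and has no Cartesian equivalent.

```python
def to_cartesian(point):
    """Divide every coordinate by the last one (w) and drop w."""
    *coords, w = point
    if w == 0:
        raise ValueError("w = 0 is a point at infinity and has no Cartesian equivalent")
    return tuple(c / w for c in coords)


print(to_cartesian((3, 4, 6, 2)))  # (1.5, 2.0, 3.0)
```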
What are the 3 steps for translating real world coordinates into pixel coordinates?
1) translate real world coordinates into camera coordinates
2) translate camera coordinates into image coordinates
3) translate image coordinates into pixel coordinates
What steps are performed to translate real world coordinates into camera coordinates?
1) turn real world coordinates into homogeneous coordinates:
(80, 25, 510) becomes (80, 25, 510, 1)
2) Perform matrix multiplication with camera extrinsic matrix (usually identity matrix)
What is a Camera Extrinsic matrix?
It describes the camera's position and orientation in the real world, i.e. how to map real world coordinates into camera coordinates
- In exams it’s usually specified as an identity matrix, but if it isn’t, it will be specified.
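A sketch of the world-to-camera step, using the point (80, 25, 510) from the card above. The `mat_vec` helper is written for this example, and the translated extrinsic matrix (camera shifted by 10 along x) is a made-up case to show what a non-identity matrix does.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


world_point = [80, 25, 510, 1]  # homogeneous real world coordinates

# Identity extrinsic matrix: camera frame and world frame coincide.
identity_extrinsic = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(mat_vec(identity_extrinsic, world_point))  # [80, 25, 510, 1]

# Hypothetical extrinsic matrix with a translation of 10 along x.
shifted_extrinsic = [
    [1, 0, 0, 10],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
print(mat_vec(shifted_extrinsic, world_point))  # [90, 25, 510, 1]
```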
What do we usually need to calculate to turn camera coordinates into image coordinates, and how is this simplified in an exam question?
The focal distance f is the distance between the pinhole and the image plane. Using similar triangles, f lets us calculate where a real world point has been projected (inverted and reduced) on the image plane: a camera point (X, Y, Z) lands at x = fX/Z, y = fY/Z. The exam question gives us f, and we use matrix multiplication (the similar triangles equations written as a matrix) with f to translate the camera coordinates into image coordinates.
What steps are performed to translate camera coordinates into image coordinates?
After step 1 we have camera coordinates. We turn these into image coordinates by multiplying the homogeneous camera coordinates (X, Y, Z, 1) by a matrix formed from the focal distance:
[ f, 0, 0, 0]
[ 0, f, 0, 0]
[ 0, 0, 1, 0]
The result (fX, fY, Z) is homogeneous; dividing by the last value Z gives the image coordinates (fX/Z, fY/Z).
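A sketch of the camera-to-image step with the focal matrix above. The focal distance f = 10 and the camera point (100, 50, 500) are made-up values chosen to give round numbers, and `mat_vec` is a helper written for this example.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


f = 10  # hypothetical focal distance
focal_matrix = [
    [f, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, 1, 0],
]

camera_point = [100, 50, 500, 1]  # hypothetical homogeneous camera coordinates

# Matrix multiplication gives homogeneous image coordinates (fX, fY, Z).
image_h = mat_vec(focal_matrix, camera_point)
print(image_h)  # [1000, 500, 500]

# Dividing by the last value gives the Cartesian image coordinates.
x, y = image_h[0] / image_h[2], image_h[1] / image_h[2]
print((x, y))  # (2.0, 1.0)
```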
What steps are performed to translate image coordinates into pixel coordinates?
After step 2 we have image coordinates. We turn these into pixel coordinates by performing a matrix multiplication that rescales the image plane units into pixels and moves the image plane origin to the pixel origin.
You take the image coordinates [x, y, 1] and multiply them with this matrix:
[ 1/dx, 0, u0]
[ 0, 1/dy, v0]
[ 0, 0, 1]
Here dx and dy are the physical width and height of one pixel, and (u0, v0) is the pixel position of the image plane origin.
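A sketch of the image-to-pixel step, assuming the standard form of the matrix: 1/dx and 1/dy on the diagonal and a last row of (0, 0, 1) so that the homogeneous 1 is preserved. The pixel size (0.5 mm), the origin offset (320, 240) and the image point (2.0, 1.0) are made-up values.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


dx, dy = 0.5, 0.5    # hypothetical pixel width and height in mm
u0, v0 = 320, 240    # hypothetical pixel position of the image plane origin

pixel_matrix = [
    [1 / dx, 0, u0],
    [0, 1 / dy, v0],
    [0, 0, 1],
]

image_point = [2.0, 1.0, 1]  # homogeneous image coordinates in mm
u, v, w = mat_vec(pixel_matrix, image_point)

# Pixel coordinates must be integers, so round after the multiplication.
pixel = (round(u / w), round(v / w))
print(pixel)  # (324, 242)
```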
What is the difference between image coordinates and pixel coordinates?
Image coordinates are in mm. Pixel coordinates have their own scale and also need to be integers.
Where is the pixel origin/origin of an image?
In the top left corner