L7: Bag of Words Flashcards

1
Q

What is Bag of Words (BoW)?

A

A technique for image classification and retrieval that represents an image as a set of features. Each feature consists of a keypoint and a descriptor; the descriptors are clustered to construct a vocabulary of visual words.

2
Q

❗️❗️❗️What are the requirements for the feature extraction in BoW?

A
  • Sample a number of descriptors in each frame using a feature extractor (e.g. SIFT or ORB); see the sketch after this list.
  • Descriptors must be invariant to illumination, rotation and scale, giving us more robust features.
  • The number of descriptors influences the runtime and the size of the vocabulary.
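
A minimal sketch of this sampling step, assuming OpenCV's ORB and a hypothetical input image frame.png:

```python
# Minimal sketch: sampling invariant descriptors from one frame with ORB.
# The nfeatures cap is an illustrative value; it trades off runtime and
# the size of the eventual vocabulary.
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
orb = cv2.ORB_create(nfeatures=500)                    # cap the descriptor count
keypoints, descriptors = orb.detectAndCompute(img, None)
# descriptors: one binary descriptor per keypoint, rotation/scale invariant
```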
3
Q

What is a visual word in BoW?

A

The descriptors are clustered in descriptor space, and each cluster (its centroid) represents one visual word.

4
Q

What is a vocabulary?

A

This is a collection of visual words.

5
Q

❗️❗️❗️Training/clustering (1):
How to gather the correct data for learning a vocabulary (local features from a training set)

A

Sample invariant 2D descriptors (e.g. SIFT, ORB) from each image in the training data.
These descriptors are then clustered using k-means to form the visual words of the vocabulary (see the sketch below).
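
A rough sketch of this vocabulary-learning step, assuming OpenCV and scikit-learn; image paths and parameter values are illustrative:

```python
# Sketch: sample ORB descriptors from every training image and cluster
# them with k-means to obtain the visual words (vocabulary).
import cv2
import numpy as np
from sklearn.cluster import KMeans

def learn_vocabulary(image_paths, k=200, max_per_image=300):
    orb = cv2.ORB_create(nfeatures=max_per_image)
    all_desc = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(img, None)
        if desc is not None:
            all_desc.append(desc.astype(np.float32))   # k-means works on float vectors
    X = np.vstack(all_desc)
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return kmeans.cluster_centers_                     # the k visual words
```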

6
Q

❗️❗️❗️Training/clustering (2):
The k-means algorithm and the interpretation of the clusters

A

K-means clustering is used to cluster the features from the training data. The resulting clusters represent the “words” of our visual vocabulary.

7
Q

How does K-means clustering work for the features?

A
  1. k initial means are chosen at random among the n feature vectors x_i.
  2. k clusters are created by associating every data point x_i with the nearest mean (squared Euclidean distance).
  3. The centroid of each of the k clusters becomes the new mean.
  4. Steps 2 and 3 are repeated until convergence.

The size of k is important:
- Too small: the descriptor space is under-represented (dissimilar features map to the same word)
- Too big: over-determined (similar features are split across different words)
Use an elbow plot to find a good k. A numpy sketch of the clustering loop follows below.
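
A numpy-only sketch of the loop above (not an optimized implementation; function and variable names are illustrative):

```python
# Plain k-means as described in steps 1-4.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. choose k initial means at random among the n feature vectors
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # 2. assign every x_i to the nearest mean (squared Euclidean distance)
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # 3. the centroid of each cluster becomes the new mean
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else means[j] for j in range(k)])
        # 4. repeat steps 2-3 until convergence (means stop moving)
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```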

8
Q

❗️❗️❗️Training/clustering (3):
How to make the training set searchable by forming global image descriptors

A
  • We have k clusters (visual words) in descriptor space.
  • Index-quantize all descriptors of each image by taking the index of the nearest cluster for each descriptor.
  • Bin the index-quantized features into a k-bin histogram, giving the global BoW image descriptor.

Each image can thus be represented by a histogram (its global BoW descriptor); see the sketch below.
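
A minimal sketch of forming the global descriptor, assuming a learned vocabulary of k centroids (names are illustrative):

```python
# Index-quantize an image's descriptors against the vocabulary and bin
# the indices into a k-bin histogram: the global BoW image descriptor.
import numpy as np

def bow_descriptor(descriptors, vocabulary):
    # nearest visual word (cluster index) for every descriptor
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    word_ids = d2.argmin(axis=1)
    # histogram over the k visual words
    return np.bincount(word_ids, minlength=len(vocabulary)).astype(np.float64)
```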

9
Q

❗️❗️❗️Retrieval(1):
How BoW descriptors can be used for retrieval based on a novel image

A

The vocabulary holds all visual words found with k-means on the training set.
For a new (query) image, repeat the whole pipeline (feature extraction, index-quantization against the vocabulary, histogram) to obtain its global BoW descriptor, then find the nearest match in the training set by comparing histograms.

Use cosine distance (alternatively Euclidean or chi-squared) for the comparison: the smaller the distance, the more similar the images, and the smallest distance gives the best match (see the sketch below).
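
A small sketch of the matching step, assuming the global BoW histograms of the training images are already computed:

```python
# Compare the query histogram against every database histogram with
# cosine distance; the smallest distance is the best match.
import numpy as np

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def retrieve(query_hist, database_hists):
    dists = [cosine_distance(query_hist, h) for h in database_hists]
    best = int(np.argmin(dists))
    return best, dists[best]       # index of best match and its distance
```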

10
Q

❗️❗️❗️Retrieval(2):
How to perform tf-idf weighting to obtain better matching results

A

Some words carry no special meaning (“and” in text, sky in an outdoor environment), so add a weight to each bin of the k-dimensional BoW descriptor using tf-idf:
- tf: instead of the raw bin count n_id, normalize by the total number of words n_d in the image (n_id / n_d). This “upvotes” words that are frequent within a single image.
- idf: log(N / N_i), where N is the total number of images and N_i is the number of images in which visual word i appears at least once. This prioritizes words that are rare over the whole training set.

Weighted bin i: (n_id / n_d) * log(N / N_i). A sketch follows below.
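
A sketch of the reweighting, applied to the raw BoW histograms of all N training images (variable names follow the formula above):

```python
# tf-idf weighting of raw BoW bin counts.
# H[d, i] = n_id (count of word i in image d); n_d = total words in image d;
# N = number of images; N_i = number of images containing word i.
import numpy as np

def tfidf_weight(histograms):
    H = np.asarray(histograms, dtype=np.float64)    # shape (N, k)
    N = H.shape[0]
    n_d = H.sum(axis=1, keepdims=True)
    N_i = (H > 0).sum(axis=0)
    tf = H / np.maximum(n_d, 1)                     # frequent within an image
    idf = np.log(N / np.maximum(N_i, 1))            # rare across the training set
    return tf * idf                                 # weighted BoW descriptors
```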

11
Q

❗️❗️❗️Retrieval(3):
How to reduce the search problem using an inverted file index

A

Store, for each visual word, the indices of the training-set images in which it appears. During retrieval, match only against the images associated with the non-zero words of the query descriptor.
- Meaning you only look at a small subset of images rather than the whole database (see the sketch below).
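
A minimal sketch of such an inverted file index over the BoW histograms (names are illustrative):

```python
# Map each visual word to the set of images containing it, then use the
# query's non-zero words to select a small candidate set of images.
from collections import defaultdict

def build_inverted_index(histograms):
    index = defaultdict(set)
    for img_id, hist in enumerate(histograms):
        for word_id, count in enumerate(hist):
            if count > 0:
                index[word_id].add(img_id)
    return index

def candidate_images(query_hist, index):
    candidates = set()
    for word_id, count in enumerate(query_hist):
        if count > 0:
            candidates |= index.get(word_id, set())
    return candidates   # usually far fewer images than the whole database
```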

12
Q

❗️❗️❗️What can BoW be used for related to our study?

A

It is used for loop closure in a VSLAM algorithm: it provides a fast and reliable way of comparing (finding similarities between) a new image and the images in our database/map.
