Quiz 4 Flashcards

1
Q
  1. Which of the following examples can be expressed in high-dimensional space and cast as a “finding similar items” problem?
    A. Pages with similar words
    B. Customers with similar purchase history
    C. Images with similar features
    D. All the above
A

D. All the above

2
Q
  1. What role does hashing play in the similarity search pipeline?
    A. To efficiently compute pairwise similarities
    B. To recommend a list of potential matches to a query document
    C. To reduce the dimensionality of feature vectors
    D. To efficiently identify/group near-duplicate documents
A

D. To efficiently identify/group near-duplicate documents

3
Q
  1. Which step in the similarity search process focuses on converting large sets to short signatures?
    A. Min-Hashing
    B. Shingling
    C. Jaccard Similarity Calculation
    D. Locality Sensitive Hashing
A

A. Min-Hashing
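
To illustrate what "converting large sets to short signatures" can look like, here is a minimal sketch. It assumes the common trick of simulating row permutations with random hash functions of the form h(x) = (a*x + b) mod p; the names `make_hash_params`, `minhash_signature`, and `PRIME`, and the use of CRC32 to turn shingles into integers, are illustrative choices and not something the card specifies.

```python
import random
import zlib

PRIME = 2_147_483_647  # 2**31 - 1; an arbitrary prime chosen for this sketch


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(shingles, params):
    """Map a non-empty set of string shingles to a list of min-hash values."""
    # CRC32 gives each shingle a stable integer id (assumption of this sketch).
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingles]
    # One signature entry per hash function: the smallest hashed id.
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


params = make_hash_params(num_hashes=100)
big_set = {f"shingle-{i}" for i in range(10_000)}
signature = minhash_signature(big_set, params)
print(len(big_set), "shingles ->", len(signature), "signature values")
```

The set has 10,000 elements, but the signature always has as many entries as there are hash functions, regardless of how large the set grows.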

4
Q
  1. Which of the following is not a step in finding similar documents?
    A. Min-Hashing
    B. Shingling
    C. Pairwise comparison
    D. Locality Sensitive Hashing
A

C. Pairwise comparison

5
Q
  1. Which of the following sequences cannot be a k-shingle for the string “3162 Introduction to Data Mining” for any k?
    A. 3162 Introduction
    B. Introduction to Data Mining
    C. Data Mining
    D. 3162 Mining
A

D. 3162 Mining
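
One way to check this answer is to enumerate the shingles directly. The sketch below assumes shingles are built from whitespace-separated words (the card does not say whether tokens are words or characters), and the helper names `word_shingles` and `is_shingle_for_some_k` are made up for illustration.

```python
def word_shingles(text, k):
    """Return the set of contiguous k-word sequences in text."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def is_shingle_for_some_k(candidate, text):
    """True if candidate is a k-shingle of text for at least one k."""
    n = len(text.split())
    return any(candidate in word_shingles(text, k) for k in range(1, n + 1))


doc = "3162 Introduction to Data Mining"
options = ["3162 Introduction", "Introduction to Data Mining",
           "Data Mining", "3162 Mining"]
for option in options:
    print(option, "->", is_shingle_for_some_k(option, doc))
```

Only "3162 Mining" fails, because its two words are not adjacent in the string. The same conclusion holds for character shingles, since "3162 Mining" is not a contiguous substring either.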

6
Q
  1. True | False Min-Hashing produces short signatures while preserving the similarity of the original document.
A

True

7
Q
  1. True | False The most efficient algorithm for computing document similarity requires at least O(N^2) space.
A

False

8
Q
  1. True | False The similarity of two signatures is the fraction of hash functions in which they agree.
A

True
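
As a small sketch of this definition, assuming each signature is a list with one min-hash value per hash function (the function name `signature_similarity` and the sample values are illustrative):

```python
def signature_similarity(sig_a, sig_b):
    """Fraction of positions (hash functions) where two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    if not sig_a:
        raise ValueError("signatures must be non-empty")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


sig1 = [2, 1, 4, 1, 7, 3]
sig2 = [2, 1, 5, 1, 0, 3]
print(signature_similarity(sig1, sig2))  # 4 of the 6 positions agree
```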

9
Q
  1. True | False The probability that the min-hash values of two columns, C1 and C2, are equal under a random permutation p of the rows is equal to their Jaccard similarity.
A

True
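
This property can be checked exactly on a tiny example by enumerating every permutation of the rows. The sketch below treats each column as the set of rows where it has a 1 and defines the min-hash as the smallest permuted position among those rows; the sets `C1`, `C2`, the six-row `universe`, and the function names are made-up illustrations.

```python
from fractions import Fraction
from itertools import permutations


def jaccard(c1, c2):
    """Exact Jaccard similarity |c1 & c2| / |c1 | c2| as a fraction."""
    return Fraction(len(c1 & c2), len(c1 | c2))


def minhash_agreement_probability(c1, c2, universe):
    """Exact fraction of row permutations where the two min-hashes agree."""
    rows = sorted(universe)
    agree = total = 0
    for perm in permutations(range(len(rows))):
        position = dict(zip(rows, perm))  # row -> position after permuting
        h1 = min(position[r] for r in c1)
        h2 = min(position[r] for r in c2)
        agree += (h1 == h2)
        total += 1
    return Fraction(agree, total)


universe = {"a", "b", "c", "d", "e", "f"}
C1 = {"a", "b", "c", "d"}
C2 = {"c", "d", "e"}
print(jaccard(C1, C2), minhash_agreement_probability(C1, C2, universe))
# prints: 2/5 2/5
```

The two min-hashes agree exactly when the first row of the union (in permuted order) lies in the intersection, which happens for 2 of the 5 union rows here.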

10
Q
  1. True | False Locality-sensitive hashing is primarily used to find exact matches in a large dataset.
A

False
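
LSH is aimed at finding candidate pairs that are likely to be similar, not only exact matches. A minimal sketch of the banding technique follows, assuming each signature is split into `bands` bands of `rows_per_band` values and two documents become candidates if any one band matches exactly. The function name, document ids, and signature values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands, rows_per_band):
    """Return pairs of doc ids whose signatures match in at least one band."""
    candidates = set()
    for band in range(bands):
        start = band * rows_per_band
        buckets = defaultdict(list)
        for doc_id, sig in signatures.items():
            # Documents with the same band contents land in the same bucket.
            buckets[tuple(sig[start:start + rows_per_band])].append(doc_id)
        for docs in buckets.values():
            candidates.update(combinations(sorted(docs), 2))
    return candidates


signatures = {
    "doc1": [1, 5, 3, 8, 2, 9],
    "doc2": [1, 5, 3, 7, 4, 9],  # agrees with doc1 on the first band only
    "doc3": [6, 0, 2, 4, 4, 1],
}
pairs = lsh_candidate_pairs(signatures, bands=2, rows_per_band=3)
print(pairs)  # {('doc1', 'doc2')}
```

Here doc1 and doc2 are not identical, yet they become a candidate pair because one band matches; doc3 is never compared with either of them.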

11
Q
  1. True | False Documents that are potentially similar will have many shingles in common.
A

True
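
A short sketch of this idea, assuming character-level shingles with k = 5 and Jaccard similarity as the overlap measure (the helper names and sample strings are illustrative):

```python
def char_shingles(text, k):
    """Set of all length-k substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """|a & b| / |a | b|, defined here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


d1 = char_shingles("introduction to data mining", k=5)
d2 = char_shingles("an introduction to data mining", k=5)
d3 = char_shingles("locality sensitive hashing", k=5)
print(round(jaccard(d1, d2), 2), round(jaccard(d1, d3), 2))
```

The near-duplicate pair shares most of its shingles, so its Jaccard similarity is much higher than that of the unrelated pair.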
