Nearest Neighbour Flashcards

1
Q

What is a Voronoi tessellation?

A

Partitions space into regions, one per training point, such that every point in a region is closer to that region's training point than to any other training point

2
Q

What is a Voronoi cell?

A

A single region in a Voronoi tessellation

3
Q

What is the decision boundary for 1 nearest neighbor?

A

The edges shared by pairs of adjacent Voronoi cells whose training examples belong to different classes

4
Q

kNN classification algorithm

A
  1. Compute distance D(x, xi) to every training example
  2. Select k closest instances
  3. Output the most frequent class
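
A minimal Python sketch of the three steps above (NumPy-based; the function name and the choice of Euclidean distance are my own, not from the card):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    # 1. Compute the distance D(x, xi) to every training example (Euclidean here)
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # 2. Select the k closest instances
    nearest = np.argsort(dists)[:k]
    # 3. Output the most frequent class among them
    return Counter(y_train[nearest]).most_common(1)[0][0]
```
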
5
Q

kNN regression algorithm

A
  1. Compute distance D(x, xi) to every training example
  2. Select k closest instances
  3. Output the mean target value of the closest instances
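
The same sketch adapted for regression, again assuming Euclidean distance and numeric targets:

```python
import numpy as np

def knn_regress(x, X_train, y_train, k=3):
    # Steps 1 and 2 are identical to classification
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # 3. Output the mean target value of the k closest instances
    return y_train[nearest].mean()
```
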
6
Q

What is interpolation?

A

Prediction within the range of the training examples

7
Q

What is extrapolation?

A

Prediction outside the range of training examples

8
Q

What happens when you pick small values for k in kNN?

A

Small changes in the training set produce large changes in the classification

9
Q

What happens when you pick very large values for k in kNN?

A

Everything is classified as the most probable class, i.e. the class with the greatest prior probability P(y)

10
Q

How can we choose the right value for k in kNN?

A

Train kNN for k = 1, 2, 3, …, then pick the model with the smallest error on the validation set

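A sketch of that search, reusing the hypothetical knn_classify from the card-4 sketch; the cut-off k_max is an arbitrary choice of mine:

```python
import numpy as np

def choose_k(X_train, y_train, X_val, y_val, k_max=30):
    def val_error(k):
        preds = [knn_classify(x, X_train, y_train, k) for x in X_val]
        return np.mean([p != y for p, y in zip(preds, y_val)])
    # Try k = 1, 2, 3, ... and keep the k with the smallest validation error
    return min(range(1, k_max + 1), key=val_error)
```
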
11
Q

What's the definition of D(x, x') if D is the Euclidean distance function and x has d dimensions?

A

D(x, x') = \sqrt{\sum_{i=1}^{d} (x_i - x'_i)^2}

12
Q

What's the definition of D(x, x') if D is the Hamming distance function and x has d dimensions?

A

D(x, x') = \sum_{i=1}^{d} \mathbb{1}[x_i \neq x'_i]

13
Q

What's the definition of D(x, x') if D is the Minkowski (p-norm) distance function and x has d dimensions?

A

D(x, x') = \left( \sum_{i=1}^{d} |x_i - x'_i|^p \right)^{1/p}

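The three definitions above as a Python sketch (NumPy array inputs assumed). Note that minkowski with p = 2 and p = 1 recovers the Euclidean and Manhattan distances of the next two cards:

```python
import numpy as np

def euclidean(x, xp):
    return np.sqrt(((x - xp) ** 2).sum())

def hamming(x, xp):
    # Number of dimensions in which x and x' disagree
    return (x != xp).sum()

def minkowski(x, xp, p):
    return (np.abs(x - xp) ** p).sum() ** (1.0 / p)
```
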
14
Q

What is the distance function obtained when p = 2 for the p-norm?

A

Euclidean

15
Q

What is the distance function obtained when p = 1 for the p-norm?

A

Manhattan

16
Q

How does the p-norm behave as p approaches 0?

A

logical and

17
Q

How does the p-norm behave as p approaches infinity?

A

logical or

18
Q

How can you resolve ties in kNN? (4)

A
  • randomly
  • prior (class with greatest probability)
  • nearest: fall back to 1-NN (sketched below)
  • use odd k (doesn't help with multiclass problems)
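
A sketch of the 'nearest' strategy, assuming the neighbour labels arrive sorted by distance, closest first (names are mine):

```python
from collections import Counter

def vote_with_tiebreak(neighbour_labels):
    # neighbour_labels: labels of the k nearest neighbours, closest first
    counts = Counter(neighbour_labels).most_common()
    tied = {label for label, n in counts if n == counts[0][1]}
    if len(tied) == 1:
        return counts[0][0]
    # 'nearest' fallback: move towards 1-NN by returning whichever
    # tied class has the closest representative
    return next(label for label in neighbour_labels if label in tied)
```
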
19
Q

What is a reasonable choice to fill in missing values (kNN)?

A

The mean of that attribute across the entire data set
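
A sketch of that fill, assuming missing entries are marked with NaN in a NumPy array:

```python
import numpy as np

def fill_missing_with_mean(X):
    # X: (n, d) array with np.nan marking missing values
    col_means = np.nanmean(X, axis=0)   # per-attribute mean, ignoring NaNs
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]     # replace each NaN with its column mean
    return X
```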

20
Q

How does Parzen Windows differ from kNN?

A

Instead of picking the k nearest neighbours, Parzen windows uses all training examples that lie within a fixed radius of the query point
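
A sketch of that difference, with Euclidean distance and majority vote assumed as in the earlier sketches:

```python
import numpy as np
from collections import Counter

def parzen_classify(x, X_train, y_train, radius=1.0):
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Use ALL training examples within the fixed radius, however many
    # there happen to be, rather than a fixed number k of neighbours
    inside = dists <= radius
    if not inside.any():
        return None  # the window is empty: no prediction
    return Counter(y_train[inside]).most_common(1)[0][0]
```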

21
Q

How can kNN be modified to use a kernel?

A

Replace the distance function with a kernel K(x', x)
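
One common reading of this card, sketched with a Gaussian (RBF) kernel; the specific kernel and the weighted-vote formulation are my assumptions, not from the card:

```python
import numpy as np

def gaussian_kernel(x, xp, gamma=1.0):
    # K(x', x) = exp(-gamma * ||x - x'||^2): large when the points are similar
    return np.exp(-gamma * ((x - xp) ** 2).sum())

def kernel_vote(x, X_train, y_train, gamma=1.0):
    # Weight every training example by its kernel similarity to x,
    # then output the class with the greatest total weight
    weights = {}
    for xi, yi in zip(X_train, y_train):
        weights[yi] = weights.get(yi, 0.0) + gaussian_kernel(x, xi, gamma)
    return max(weights, key=weights.get)
```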

22
Q

What are the cons of kNN? (4)

A
  • Need to handle missing data (fill in or special distance function)
  • Sensitive to class outliers
  • Sensitive to irrelevant attributes (affects distance)
  • Computationally expensive
23
Q

What is the runtime complexity of kNN? (testing)

A

O(nd)

n - training examples

d - dimensions

24
Q

How can we reduce d in kNN?

A

Dimensionality reduction

25
Q

On what datasets are K-D trees effective (kNN)?

A

low-dimensional, real-valued data

26
Q

On what datasets are inverted lists effective (kNN)?

A

high-dimensional, discrete (sparse) data (e.g. text)

27
Q

On what datasets is locality-sensitive hashing effective (kNN)?

A

high-dimensional, real-valued or discrete

28
Q

Which methods are inexact at finding neighbors in kNN?

  • K-D trees
  • inverted lists
  • locality-sensitive hashing
A

K-D trees and locality-sensitive hashing

29
Q

Which methods are exact at finding neighbors in kNN?

  • K-D trees
  • inverted lists
  • locality-sensitive hashing
A

inverted lists

30
Q

How are K-D trees built from training data?

A
  1. Pick a random dimension
  2. Find the median value along that dimension
  3. Split the data at the median
  4. Repeat until each node contains the desired number of points
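
A recursive sketch of those four steps (dict-based nodes; leaf_size and the degenerate-split guard are my additions):

```python
import numpy as np

def build_kdtree(points, leaf_size=5, rng=None):
    rng = rng or np.random.default_rng(0)
    if len(points) <= leaf_size:                 # 4. stop at the desired size
        return {"leaf": points}
    dim = int(rng.integers(points.shape[1]))     # 1. pick a random dimension
    median = np.median(points[:, dim])           # 2. find the median along it
    left = points[points[:, dim] <= median]      # 3. split the data there
    right = points[points[:, dim] > median]
    if len(left) == 0 or len(right) == 0:        # all values equal: stop
        return {"leaf": points}
    return {"dim": dim, "median": median,
            "left": build_kdtree(left, leaf_size, rng),
            "right": build_kdtree(right, leaf_size, rng)}
```
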
31
Q

How is a dataset split with locality-sensitive hashing?

A

Create k random hyperplanes h1, …, hk; each point lies on one side of each plane, so the planes split the dataset into 2^k regions
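
A sketch of that hashing step, with hyperplanes through the origin drawn from a standard normal (both details are my assumptions):

```python
import numpy as np

def lsh_codes(X, k=8, seed=0):
    # k random hyperplanes; each point receives a k-bit code, so the
    # dataset is split into (at most) 2**k regions
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((k, X.shape[1]))       # normals of h1, ..., hk
    bits = (X @ H.T) >= 0                          # which side of each plane?
    return bits.astype(int) @ (1 << np.arange(k))  # pack bits into a bucket id
```

Points with the same code land in the same bucket; searching only the query's bucket is what makes the method fast but inexact.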