Week 6: Instance-Based Learning Flashcards

1
Q

Instance-based Classification

A

To classify a new instance, search the stored training instances in memory for the one that most resembles the new instance. A distance metric is used to compare instances.

Main components:
- Distance Function, which calculates the distance between any two instances.
- Combination function, which combines results from the neighbours to arrive at an answer (both are sketched in the code below)

Pros:
- Not restricted by data format
- Adapts efficiently to the addition of new instances
- Requires relatively low training effort

Cons:
- Memory intensive
- Classification can be more time-consuming than with neural networks or decision trees
- Designing suitable distance and combination functions can be difficult
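
A minimal sketch of the two components in Python, assuming numeric attributes, a Euclidean distance function, and majority voting as the combination function (the function names are illustrative):

```python
from collections import Counter
import math

def euclidean(a, b):
    # Distance function: compare two instances attribute by attribute.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_knn(train, new_instance, k=3):
    # train is a list of (attributes, label) pairs kept in memory.
    neighbours = sorted(train, key=lambda inst: euclidean(inst[0], new_instance))[:k]
    # Combination function: majority vote over the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.1), "B"), ((4.8, 5.3), "B")]
print(classify_knn(train, (1.1, 0.9), k=3))  # expected: "A"
```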

2
Q

Similarity/Distance

A

The measure used to compare training instances with the new instance and determine which stored instances resemble it most closely.

3
Q

Absolute Difference

A

\left\lvert x_{i,j} - x_{i',j} \right\rvert

4
Q

Squared Difference

A

(x_{i,j} - x_{i',j})^2

5
Q

Normalised Absolute Value

A

\frac{\left\lvert x_{i,j} - x_{i',j} \right\rvert}{\max_j - \min_j}

6
Q

Absolute Difference of Standardised Values

A

\left\lvert \frac{x_{i,j} - \mu_j}{\sigma_j} - \frac{x_{i',j} - \mu_j}{\sigma_j} \right\rvert

7
Q

Distance Metric for Nominal Attributes

A

d(x_{i,j}, x_{i',j}) = \begin{cases} 0 & \text{if } x_{i,j} = v \text{ and } x_{i',j} = v \\ 1 & \text{if } x_{i,j} = v \text{ and } x_{i',j} \ne v \end{cases}
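
A small Python sketch of these per-attribute distances, assuming precomputed column statistics (the names col_max, col_min, mu and sigma are illustrative):

```python
def absolute_diff(a, b):
    return abs(a - b)

def squared_diff(a, b):
    return (a - b) ** 2

def normalised_abs_diff(a, b, col_max, col_min):
    # Scale the absolute difference by the attribute's range.
    return abs(a - b) / (col_max - col_min)

def standardised_abs_diff(a, b, mu, sigma):
    # Standardise both values before taking the absolute difference.
    return abs((a - mu) / sigma - (b - mu) / sigma)

def nominal_diff(a, b):
    # 0 if the nominal values match, 1 otherwise.
    return 0 if a == b else 1
```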

8
Q

Euclidean Distance

A

d(\boldsymbol{x}_i, \boldsymbol{x}_{i'}) = \sqrt{\sum_{j=1}^{n} (x_{i,j} - x_{i',j})^2}

9
Q

Manhattan Distance

A

d(\boldsymbol{x}_i, \boldsymbol{x}_{i'}) = \sum_{j=1}^{n} \left\lvert x_{i,j} - x_{i',j} \right\rvert
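
A minimal NumPy sketch of both distances over full attribute vectors (toy data only):

```python
import numpy as np

def euclidean(x_i, x_j):
    # Square root of the sum of squared per-attribute differences.
    return np.sqrt(np.sum((x_i - x_j) ** 2))

def manhattan(x_i, x_j):
    # Sum of absolute per-attribute differences.
    return np.sum(np.abs(x_i - x_j))

a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(euclidean(a, b))  # ~3.606
print(manhattan(a, b))  # 5.0
```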

10
Q

Weighted Voting

A

In this setup, not all neighbours are treated equally. The weight of each neighbour's vote is proportional to its similarity to the new instance. Weighted voting also helps prevent ties.
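
A small sketch of one common weighting scheme, inverse-distance weights; the lecture may use a different weight function:

```python
from collections import defaultdict

def weighted_vote(neighbours):
    # neighbours: list of (distance, label) pairs for the k nearest neighbours.
    scores = defaultdict(float)
    for dist, label in neighbours:
        # Inverse-distance weighting: closer neighbours get a larger vote.
        scores[label] += 1.0 / (dist + 1e-9)
    return max(scores, key=scores.get)

print(weighted_vote([(0.5, "A"), (2.0, "B"), (2.1, "B")]))  # "A" despite being outvoted 2-1
```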

11
Q

Spectrogram

A

Represents audio data in the frequency domain; used, for example, for identifying songs from audio samples.

In the case of music identification, the vertical axis corresponds to frequency and the horizontal axis represents time. The data can be transformed into constellation plots.
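
A short sketch using scipy.signal.spectrogram on a synthetic tone; the sampling rate and window length are illustrative choices:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000                                # sampling rate in Hz (illustrative)
t = np.arange(0, 2.0, 1 / fs)
audio = np.sin(2 * np.pi * 440 * t)      # a synthetic 440 Hz tone

# f: frequency bins (vertical axis), times: time bins (horizontal axis),
# Sxx: power in each (frequency, time) cell.
f, times, Sxx = spectrogram(audio, fs=fs, nperseg=1024)
print(Sxx.shape)                          # (frequency bins, time bins)
print(f[np.argmax(Sxx.mean(axis=1))])     # strongest frequency, close to 440 Hz
```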

12
Q

Constellation Plot

A

A graph of a song's spectrogram peaks only; the rest of the spectrogram is discarded. Each song is defined by a unique constellation of peaks in the frequency domain, with background noise minimised.

13
Q

Metric for Finding Similar Songs

A

\frac{\text{Number of peaks that the two songs have in common}}{\text{Number of all distinct peaks}}
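
A minimal sketch, assuming each song is represented as a set of (time, frequency) peak coordinates:

```python
def peak_similarity(peaks_a, peaks_b):
    # peaks_a, peaks_b: sets of (time_bin, frequency_bin) peak coordinates.
    common = peaks_a & peaks_b
    distinct = peaks_a | peaks_b
    return len(common) / len(distinct) if distinct else 0.0

song = {(1, 440), (2, 660), (3, 880)}
snippet = {(1, 440), (2, 660), (3, 990)}
print(peak_similarity(song, snippet))  # 2 common / 4 distinct = 0.5
```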

14
Q

Anchor Points

A

An anchor point is a peak in the constellation chart that is paired with other peaks that follow it along the time axis within a specified range of frequencies.

In the context of song identification, each anchor-peak pair is associated with the time difference between the peak and the anchor, and the frequency difference between the peak and the anchor.

15
Q

Song Identification Method

A
  1. Convert the new snippet into a constellation plot.
  2. Turn the constellation plot into anchor points with anchor-peak pairs.
  3. Identify matching anchor points between the snippet and the songs in the database, and determine the longest consecutive sequence of overlap between the two.
  4. Return the song with the longest overlap, i.e. the nearest neighbour (a simplified sketch follows below).
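
A simplified Python sketch of this pipeline, assuming peaks are (time, frequency) tuples; it scores candidates by counting matching anchor-peak pairs rather than finding the longest consecutive sequence of overlap:

```python
def fingerprint(peaks, fan_out=3):
    # Pair each anchor peak with the next few peaks along the time axis.
    peaks = sorted(peaks)  # (time, frequency) tuples
    pairs = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            pairs.add((f1, f2, t2 - t1))  # anchor freq, paired freq, time delta
    return pairs

def identify(snippet_peaks, database):
    # database: dict mapping song title to its set of peaks.
    snippet_fp = fingerprint(snippet_peaks)
    # Simplification: score by the number of matching anchor-peak pairs.
    scores = {title: len(snippet_fp & fingerprint(peaks))
              for title, peaks in database.items()}
    return max(scores, key=scores.get)
```
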
16
Q

kD-tree

A

An efficient data structure for storing instances in the k-dimensional space, where k is the number of attributes. It partitions the feature space into different regions, similar to Random Forests.

17
Q

Finding Nearest Neighbours in kD-trees

A

Follow the appropriate path in the tree from the root to the relevant leaf node to find the nearest neighbour for a new instance. It’s not always correct, but is a good approximation.
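
A short sketch using scipy.spatial.KDTree; note that SciPy's query backtracks, so it returns the exact nearest neighbour rather than the single-path approximation described above:

```python
import numpy as np
from scipy.spatial import KDTree

points = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.3]])
tree = KDTree(points)                      # build the kD-tree over k = 2 attributes

dist, idx = tree.query([1.1, 0.9], k=1)    # nearest neighbour of the new instance
print(idx, dist)                           # index of the closest stored instance
```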

18
Q

Model

A

An abstract representation of a real-world process or artefact.

19
Q

Predictive Modelling

A

Making a model that predicts future data. Examples: classification, regression

20
Q

Descriptive Modelling

A

Making a model of known data. Examples: clustering, density estimation

21
Q

Prescriptive Modelling

A

Making a model using data for making optimal decisions. Examples: mathematical programming

22
Q

Parametric Models

A

Assumes a functional form. Uses the training data to estimate a fixed set of parameters for a model summarising the data. The number of parameters is independent of the number of training examples.

Examples: linear regression, logistic regression, neural networks

23
Q

Non-Parametric Models

A

A data-driven approach that makes very few assumptions about the functional form. Such models do not have a bounded set of parameters.

Examples: histograms, nearest neighbours, kernel methods

24
Q

Histograms

A

Count the number of points that fall into each bin; the bin heights estimate the density. Histograms are non-parametric.

25
Q

Kernel Density Estimate

A

Estimate the density function by taking a local weighted average of measurements around the point x of interest.

\hat{f}(\boldsymbol{x}) = \frac{1}{m} \sum_{i=1}^m K \left( \frac{\boldsymbol{x} - \boldsymbol{x}_i}{h} \right)

The choice of h determines the smoothness of the density function estimate. Large h values lead to oversmoothing, while small values lead to spiky estimates without much smoothing.

In practice, kernel models are only feasible for low-dimensional data because they are memory-based. The approach is closely related to nearest-neighbour methods.

Note that the choice of the bandwidth is more important than the choice of the kernel function.
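
A one-dimensional Gaussian-kernel sketch; unlike the formula above it includes a 1/h factor so that the estimate integrates to one:

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde(x, data, h=0.5):
    # Local weighted average of the kernel around the query point x;
    # the 1/h factor normalises the estimate so it integrates to one.
    m = len(data)
    return np.sum(gaussian_kernel((x - data) / h)) / (m * h)

data = np.array([1.0, 1.2, 0.8, 3.0, 3.1])
print(kde(1.0, data, h=0.3))   # higher density near the cluster around 1
print(kde(2.0, data, h=0.3))   # lower density between the clusters
```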

26
Q

Regression

A

Given a set of input-output pairs (x, y), learn a function \hat{f} such that y = \hat{f}(x).

27
Q

Connect-the-dots Regression

A

Compute a piecewise linear function that connects each data point x to its neighbouring points on the left and the right. The function is spiky and doesn't generalise well, but it is useful in visualisations.
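
np.interp performs exactly this piecewise linear interpolation between data points; a toy sketch:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])      # training inputs (must be sorted)
y = np.array([0.0, 1.0, 0.5, 2.0])      # training targets

# np.interp connects consecutive points with straight line segments.
print(np.interp(1.5, x, y))  # 0.75, halfway between y = 1.0 and y = 0.5
```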

28
Q

k-Nearest Neighbours Regression

A

For each data point x, compute the mean of the target values of the points in N_k(x), i.e. the k nearest neighbours of x. It's somewhat smooth, but still has discontinuities.
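
A minimal sketch for one-dimensional inputs (toy data, illustrative function name):

```python
import numpy as np

def knn_regress(x_query, X, y, k=3):
    # Average the target values of the k nearest training points.
    idx = np.argsort(np.abs(X - x_query))[:k]
    return y[idx].mean()

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9])
print(knn_regress(1.2, X, y, k=3))  # mean of the targets at x = 0, 1, 2
```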

29
Q

k-Nearest Neighbours Linear Regression

A

For a new data point x, compute a linear regression line through the set N_k(x) of k-nearest neighbours.

30
Q

Locally Weighted Linear Regression

A

For a new data point x, training instances close to x are weighted heavily, while data points far away receive little weight. A kernel is used to compute the weights.
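
A small sketch using a Gaussian kernel for the weights and weighted least squares for the local line (the bandwidth h is an illustrative choice):

```python
import numpy as np

def lwlr(x_query, X, y, h=1.0):
    # Gaussian kernel weights: nearby training points count more.
    w = np.exp(-((X - x_query) ** 2) / (2 * h ** 2))
    # Weighted least squares for a local line y = a + b * x.
    A = np.column_stack([np.ones_like(X), X])
    W = np.diag(w)
    coeffs = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    return coeffs[0] + coeffs[1] * x_query

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.2, 1.8, 3.1, 4.2])
print(lwlr(2.5, X, y, h=1.0))  # roughly 2.5 for this near-linear data
```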

31
Q

Support Vector Machines

A

Classifies data points by constructing a maximum-margin linear separator, using a kernel to map the data where necessary. The linear separator acts as a binary classifier. Support vectors are the points closest to the separator.
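
A minimal sketch using scikit-learn's SVC with a linear kernel on toy data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.3]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1.0)   # maximum-margin linear separator
clf.fit(X, y)

print(clf.predict([[1.1, 0.9]]))    # -> [0]
print(clf.support_vectors_)         # the training points closest to the separator
```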