Week 6: Instance-Based Learning Flashcards
Instance-based Classification
To classify a new instance, search the memory for the training instance that most resembles the new instance. A distance metric is used to compare the new instance with the stored training instances.
Main components:
- Distance Function, which calculates the distance between any two instances.
- Combination function, which combines results from neighbours to arrive at an answer
Pros:
- Not restricted by data format
- Efficiently adaptable to the addition of new instances
- Requires relatively low training effort
Cons:
- Memory intensive
- Classification can be more time-consuming than with neural networks or decision trees
- Making suitable distance and combination functions can be tough
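The two components above can be sketched as a minimal 1-nearest-neighbour classifier. This is an illustrative toy, not a full implementation: the `train` format (list of feature-vector/label pairs) and the data are assumptions, the distance function is Euclidean, and the combination function simply returns the nearest neighbour's label.

```python
import math

def one_nn_classify(train, new_x):
    """Classify new_x by the label of its single nearest training instance.

    train: list of (feature_vector, label) pairs (hypothetical toy format).
    Distance function: Euclidean. Combination function: take the label
    of the closest stored instance.
    """
    best_label, best_dist = None, math.inf
    for x, label in train:
        d = math.dist(x, new_x)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

train = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((5.0, 5.0), "B")]
print(one_nn_classify(train, (1.1, 1.0)))  # -> "A"
```

Note there is no training step beyond storing the instances, which is why the training effort is low but memory use and query time grow with the data.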
Similarity/Distance
This is the means by which training instances are compared with the new instance to judge how closely they match.
Absolute Difference
\left\lvert x_{i,j} - x_{i',j} \right\rvert
Squared Difference
(x_{i,j} - x_{i',j})^2
Normalised Absolute Value
\frac{\left\lvert x_{i,j} - x_{i',j} \right\rvert}{\max_j - \min_j}
Absolute Difference of Standardised Values
\left\lvert \frac{x_{i,j} - \mu_j}{\sigma_j} - \frac{x_{i',j} - \mu_j}{\sigma_j} \right\rvert
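The range-scaled and standardised differences can be sketched in Python. The column `col` and the values are toy assumptions; `statistics.pstdev` is the population standard deviation.

```python
import statistics

def normalised_abs_diff(a, b, col):
    """|a - b| scaled by the column's range (max_j - min_j)."""
    return abs(a - b) / (max(col) - min(col))

def standardised_abs_diff(a, b, col):
    """|z(a) - z(b)|, where z standardises by the column mean and std dev."""
    mu = statistics.mean(col)
    sigma = statistics.pstdev(col)
    return abs((a - mu) / sigma - (b - mu) / sigma)  # equals |a - b| / sigma

col = [2.0, 4.0, 6.0, 8.0]
print(normalised_abs_diff(2.0, 8.0, col))   # 6 / 6 = 1.0
print(standardised_abs_diff(2.0, 8.0, col))
```

Both rescalings stop attributes with large numeric ranges from dominating the distance.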
Distance Metric for Nominal Attributes
d(x_{i,j}, x_{i',j}) = \begin{cases} 0, & \text{if } x_{i,j} = v \text{ and } x_{i',j} = v \\ 1, & \text{if } x_{i,j} = v \text{ and } x_{i',j} \ne v \end{cases}
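In code this nominal (overlap) distance is a one-liner; the example values are illustrative:

```python
def nominal_distance(a, b):
    """Overlap distance for nominal attributes: 0 if equal, 1 otherwise."""
    return 0 if a == b else 1

print(nominal_distance("red", "red"))   # 0
print(nominal_distance("red", "blue"))  # 1
```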
Euclidean Distance
d(\boldsymbol{x}_i, \boldsymbol{x}_{i'}) = \sqrt{\sum_{j=1}^n (x_{i,j} - x_{i',j})^2}
Manhattan Distance
d(\boldsymbol{x}_i, \boldsymbol{x}_{i'}) = \sum_{j=1}^n \left\lvert x_{i,j} - x_{i',j} \right\rvert
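The two vector distances translate directly into Python (toy input vectors):

```python
def euclidean(x, y):
    """Square root of the sum of squared per-attribute differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):
    """Sum of absolute per-attribute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean((0, 0), (3, 4)))  # 5.0 (the 3-4-5 triangle)
print(manhattan((0, 0), (3, 4)))  # 7
```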
Weighted Voting
In this setup, not all neighbours are treated equally: the weight of each vote is proportional to the neighbour's similarity with the new instance. Weighted voting also helps prevent ties.
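One common weighting scheme is inverse distance, sketched below. The training data, `k`, and the small constant added to avoid division by zero are all assumptions.

```python
import math
from collections import defaultdict

def weighted_knn(train, new_x, k=3):
    """k-NN where each vote is weighted by inverse distance (one common choice)."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], new_x))[:k]
    votes = defaultdict(float)
    for x, label in neighbours:
        d = math.dist(x, new_x)
        votes[label] += 1.0 / (d + 1e-9)  # closer neighbours count for more
    return max(votes, key=votes.get)

train = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"),
         ((1.0, 1.0), "B"), ((1.1, 1.0), "B")]
print(weighted_knn(train, (0.2, 0.1), k=3))  # -> "A"
```

With unweighted voting and an even `k`, a 2-2 split would be a tie; distance weighting breaks it in favour of the closer class.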
Spectrogram
Can represent audio data in the frequency domain. Used for identifying songs from audio samples.
In the case of music identification, the vertical axis corresponds to frequency and the horizontal axis represents time. The data can be transformed into constellation plots.
Constellation Plot
A graph of the frequency peaks for a song in the frequency domain. It only has the spectrogram peaks. Each song is defined by a unique constellation of peaks in the frequency domain, with background noise minimised.
Metric for Finding Similar Songs
\frac{\text{Number of peaks that the two songs have in common}}{\text{Number of all distinct peaks}}
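Treating each song's peaks as a set, this ratio is a Jaccard-style overlap. The (time, frequency) peak tuples below are made-up toy data:

```python
def peak_similarity(peaks_a, peaks_b):
    """Shared peaks divided by all distinct peaks (Jaccard-style overlap)."""
    a, b = set(peaks_a), set(peaks_b)
    return len(a & b) / len(a | b)

song = {(10, 0.5), (20, 1.0), (30, 1.5), (40, 2.0)}
snippet = {(10, 0.5), (20, 1.0), (99, 9.9)}
print(peak_similarity(song, snippet))  # 2 shared / 5 distinct = 0.4
```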
Anchor Points
An anchor point is a peak in the constellation chart that is paired with other peaks that follow in the time axis within a specified range of frequencies.
In the context of song identification, each anchor-peak pair is associated with the time difference between the peak and the anchor, and the frequency difference between the peak and the anchor.
Song Identification Method
- Convert the new snippet into a constellation plot.
- Turn the constellation plot into anchor points with anchor-peak pairs.
- Identify matching anchor points between the listened song and songs in the database. Determine the longest consecutive sequence of overlap between the two.
- Return the song with the longest overlap, i.e. the nearest neighbour.
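A heavily simplified sketch of this pipeline is below. It hashes each anchor-peak pair as (anchor frequency, peak frequency, time difference) and scores songs by how many hashes they share with the snippet; it skips the "longest consecutive overlap" step and instead counts matching hashes, and the fan-out, peak format, and database are all toy assumptions. Because the hashes use only frequency values and time *differences*, they are invariant to where in the song the snippet starts.

```python
def anchor_pairs(peaks, fan_out=3):
    """Pair each anchor peak (time, freq) with the next few peaks in time.

    Each pair becomes a hash (f_anchor, f_peak, dt) tagged with the
    anchor's time. fan_out limits how many following peaks are paired.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            pairs.append(((f1, f2, t2 - t1), t1))
    return pairs

def identify(snippet_peaks, database):
    """Return the database song sharing the most anchor-peak hashes."""
    snippet_hashes = {h for h, _ in anchor_pairs(snippet_peaks)}
    scores = {}
    for name, song_peaks in database.items():
        song_hashes = {h for h, _ in anchor_pairs(song_peaks)}
        scores[name] = len(snippet_hashes & song_hashes)
    return max(scores, key=scores.get)

database = {
    "song_a": [(0, 100), (1, 200), (2, 150), (3, 300)],
    "song_b": [(0, 50), (1, 60), (2, 70), (3, 80)],
}
# The snippet is song_a's peaks shifted 10 time units later.
snippet = [(10, 100), (11, 200), (12, 150), (13, 300)]
print(identify(snippet, database))  # -> "song_a"
```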
kD-tree
An efficient data structure for storing instances in k-dimensional space, where k is the number of attributes. It partitions the feature space into regions, similar to how the decision trees in a Random Forest split it.
Finding Nearest Neighbours in kD-trees
To find a nearest neighbour for a new instance, follow the appropriate path in the tree from the root to the relevant leaf node. This greedy descent is not always correct, but it gives a good approximation.
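A minimal sketch of both steps, assuming 2-D points and median splits on alternating axes (exact tree-building and descent details vary between implementations). The greedy search below never backtracks, which is exactly why it is only approximate:

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split the points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def approx_nn(node, target, depth=0, best=None):
    """Greedy root-to-leaf descent: fast, but only approximately nearest."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], target) < math.dist(best, target):
        best = node["point"]
    axis = depth % len(target)
    side = "left" if target[axis] < node["point"][axis] else "right"
    return approx_nn(node[side], target, depth + 1, best)

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(approx_nn(tree, (9, 2)))  # -> (8, 1)
```

An exact search would additionally backtrack into the other subtree whenever the splitting plane is closer than the best distance found so far.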
Model
An abstract representation of a real-world process or artefact.
Predictive Modelling
Making a model that predicts future data. Examples: classification, regression
Descriptive Modelling
Making a model of known data. Examples: clustering, density estimation
Prescriptive Modelling
Making a model using data for making optimal decisions. Examples: mathematical programming
Parametric Models
Assumes a functional form. Uses the training data to estimate a fixed set of parameters for a model summarising the data. The number of parameters is independent of the number of training examples.
Examples: linear regression, logistic regression, neural networks
Non-Parametric Models
A data-driven approach that makes very few assumptions about the functional form. Such models do not have a bounded set of parameters; their effective complexity can grow with the amount of training data.
Examples: histograms, nearest neighbours, kernel methods
Histograms
Counts the number of points that fall in each bin; the bin heights estimate the density. Histograms are non-parametric.
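A minimal 1-D histogram density estimate, assuming equal-width bins over the data range (the data and bin count are toy choices):

```python
def histogram_density(data, n_bins):
    """Estimate density by counting points per bin and normalising.

    The density in a bin is count / (n * bin_width), so the estimate
    integrates to 1 over the data range.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    return [c / (len(data) * width) for c in counts]

data = [0.1, 0.2, 0.25, 0.6, 0.7, 0.9]
print(histogram_density(data, 2))  # 3 points per bin -> density 1.25 in each
```

The bin count plays the role the bandwidth plays in kernel methods: too few bins oversmooths, too many overfits.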