Kernel Methods Flashcards
(11 cards)
What is the essential difference when using kernel methods?
Instead of computing the feature-space transformation explicitly, kernel methods allow computations in that space to be carried out implicitly via the kernel trick
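A minimal sketch of the kernel trick, using the degree-2 polynomial kernel k(x, z) = (x·z)^2 on 2-D inputs (an illustrative choice); the explicit feature map and the kernel give the same inner product:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def k(x, z):
    # Polynomial kernel k(x, z) = (x . z)^2 -- same value, no explicit mapping
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(np.dot(phi(x), phi(z)))  # inner product in feature space: 16.0
print(k(x, z))                 # identical value via the kernel trick: 16.0
```

For higher-degree kernels or high-dimensional inputs the explicit map grows combinatorially, while the kernel evaluation stays a single dot product and a power.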
What are the key characteristics of kernel methods?
It is a memory-based method (the training data is retained and used during prediction)
Fast to train, but slow to predict, since all training points must be referenced at prediction time
What is the problem kernel methods try to solve? Why? How?
Kernel methods try to avoid the explicit computation of the feature mapping, because this operation can be computationally expensive.
They achieve this by using a kernel function to evaluate the inner product in the transformed space directly
What is a kernel function? What are the properties a kernel must satisfy?
A kernel function acts as a similarity measure between two vectors
Properties:
- a kernel must be symmetric
- must also be positive semi-definite
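A small sketch of a kernel acting as a similarity measure, using the Gaussian (RBF) kernel with an illustrative bandwidth gamma; nearby points score near 1, distant points near 0, and the symmetry property holds by construction:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    # Gaussian (RBF) kernel: similarity decays with squared distance
    return np.exp(-gamma * np.sum((x - z) ** 2))

a = np.array([1.0, 1.0])
b = np.array([1.1, 0.9])   # close to a -> similarity near 1
c = np.array([5.0, -3.0])  # far from a -> similarity near 0

print(rbf_kernel(a, b))
print(rbf_kernel(a, c))
print(rbf_kernel(a, b) == rbf_kernel(b, a))  # symmetric: True
```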
How do kernel methods work, essentially?
In kernelized linear regression, instead of directly computing predictions as the weights transposed times the inputs, the prediction is expressed as a weighted sum of kernel evaluations between the new input and the training points
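A minimal sketch of kernelized (ridge) regression, assuming an RBF kernel, toy 1-D data, and a small regularization term lam (all illustrative choices): the dual weights are alpha = (K + lam*I)^{-1} y, and the prediction is a weighted sum of kernel evaluations rather than w^T x:

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Toy 1-D training data (y = x^2)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])

lam = 1e-6  # small ridge term for numerical stability (assumed value)

# Gram matrix of all pairwise kernel evaluations
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

# Dual weights: alpha = (K + lam*I)^{-1} y
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(x_new):
    # Prediction is a weighted sum of kernel evaluations, not w^T x
    return sum(a * rbf(x_new, xi) for a, xi in zip(alpha, X))

print(predict(np.array([1.0])))  # close to the training target 1.0
```

Note that prediction touches every training point, which is why kernel methods are fast to train but slow to predict.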
What is the Gram matrix?
The Gram matrix is the matrix containing the kernel evaluations between all pairs of training points
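A quick sketch of building a Gram matrix, using the linear kernel k(x, z) = x·z on a toy dataset (an illustrative choice); entry (i, j) holds the kernel evaluation between training points i and j:

```python
import numpy as np

# Toy dataset: three 2-D training points
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# For the linear kernel, the Gram matrix is simply X @ X.T:
# K[i, j] = k(x_i, x_j) = x_i . x_j
K = X @ X.T

print(K)
print(np.allclose(K, K.T))  # Gram matrices are symmetric: True
```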
How could we construct kernels?
Either by defining a feature space mapping (and computing the inner product directly), or by direct kernel construction (summing two kernels, multiplying two kernels, or applying transformations to a valid kernel)
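A sketch of direct kernel construction, combining a linear and an RBF kernel (illustrative base kernels and parameters); sums, products, and certain transformations such as exponentiation of valid kernels yield valid kernels:

```python
import numpy as np

def linear(x, z):
    return np.dot(x, z)

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

# New kernels built from existing valid kernels:
def sum_kernel(x, z):
    return linear(x, z) + rbf(x, z)       # sum of kernels is a kernel

def product_kernel(x, z):
    return linear(x, z) * rbf(x, z)       # product of kernels is a kernel

def exp_kernel(x, z, c=2.0):
    # exp of a valid kernel (scaled by c > 0) is also a valid kernel
    return np.exp(c * linear(x, z))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(sum_kernel(x, z), product_kernel(x, z), exp_kernel(x, z))
```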
What does it mean to validate a kernel function?
To be valid a kernel needs:
- symmetry
- positive semi-definiteness of the Gram matrix (for any vector c in R^N: c^T K c >= 0)
How can we practically verify whether a kernel is valid or not?
- show that it can be written as an inner product in some feature space
- for a given dataset, form the Gram matrix and verify that all its eigenvalues are non-negative
- use known closure results: non-negative combinations and products of valid kernels, and certain functions of them (e.g. exponentials, polynomials with non-negative coefficients), are also valid kernels
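A sketch of the eigenvalue check on random toy data (illustrative): the RBF Gram matrix has non-negative eigenvalues, while an invalid candidate such as the negative squared distance does not:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # random toy dataset

# All pairwise squared distances via broadcasting
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# Valid kernel: RBF -> PSD Gram matrix
K_valid = np.exp(-0.5 * sq_dists)

# Invalid candidate: -||x - z||^2 is not positive semi-definite
K_invalid = -sq_dists

print(np.min(np.linalg.eigvalsh(K_valid)))    # >= 0 up to round-off
print(np.min(np.linalg.eigvalsh(K_invalid)))  # negative eigenvalue
```

Note this check is necessary but dataset-specific: a non-negative spectrum on one dataset does not prove validity in general, while a single negative eigenvalue disproves it.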
How can we do KNN regression using kernel methods? What is the essential idea behind it? Discuss its practical implementation
This approach uses a kernel function to perform a weighted average of the training outputs, where the weights are determined by the similarity between the new input and each training point
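A minimal sketch of this kernel-weighted average (Nadaraya-Watson style regression), with a Gaussian smoothing kernel, toy 1-D data, and bandwidth h all chosen for illustration:

```python
import numpy as np

def gaussian_kernel(u, h=0.5):
    # Smoothing kernel with bandwidth h (assumed value)
    return np.exp(-0.5 * (u / h) ** 2)

# Toy 1-D training data (y = x^2)
X_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.0, 1.0, 4.0, 9.0])

def predict(x_new):
    # Weighted average of training outputs, with weights given
    # by the kernel similarity between x_new and each training point
    w = gaussian_kernel(x_new - X_train)
    return np.sum(w * y_train) / np.sum(w)

print(predict(2.0))  # dominated by nearby targets, roughly 4.2
```

The bandwidth h plays the role of k in KNN: a small h averages only over very close neighbours, while a large h smooths over many points.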
Give an overview of Gaussian Processes as kernel methods