Machine Learning Flashcards

Question

What is the output n(x) of a neuron n?

Answer 1

n(x) = A(w₁x₁ + ... + wₙxₙ + b), where A is the activation function, w₁, ..., wₙ are the weights and b is the bias.
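
The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names neuron_output and sigmoid are made up for this card, and the sigmoid is only one possible choice for the activation function A.

```python
import math


def sigmoid(z):
    """One common choice for the activation A (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x, w, b, activation):
    """Return A(w1*x1 + ... + wn*xn + b) for inputs x, weights w and bias b."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(pre_activation)


# The weighted sum is 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5.
out = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
```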

Answer 2

A perceptron is a linear classifier that is based on a single neuron with a digital threshold function. The perceptron criterion punishes only incorrectly classified samples. Invented by Frank Rosenblatt in 1958.

Answer 3

If a solution exists (i.e. the data set is linearly separable), then the perceptron learning algorithm finds a solution within a finite number of steps.
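
A minimal sketch of the perceptron learning rule, assuming labels of +1 and -1 and a fixed learning rate. The function names, the max_epochs cap and the AND data set are illustrative choices, not part of the card.

```python
def train_perceptron(samples, labels, max_epochs=100, learning_rate=1.0):
    """Update w and b only on misclassified samples; labels are +1 or -1.

    Returns (w, b, converged). The epoch cap is a safeguard for data
    that is not linearly separable, where the loop would never stop.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, t in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * activation <= 0:  # wrong side of (or exactly on) the boundary
                w = [wi + learning_rate * t * xi for wi, xi in zip(w, x)]
                b += learning_rate * t
                errors += 1
        if errors == 0:
            return w, b, True
    return w, b, False


def predict(w, b, x):
    """Classify x as +1 or -1 with the learned hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Logical AND is linearly separable, so training stops after finitely many updates.
and_samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
and_labels = [-1, -1, -1, 1]
w, b, converged = train_perceptron(and_samples, and_labels)
```

On data that is not linearly separable (XOR, for example) the same loop keeps making mistakes and only ends because of the epoch cap.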

Answer 4

A regressor h that predicts yₙ for a given xₙ with minimum loss L.

Answer 5

The interpolation function f must be consistent with the samples S: f(xᵢ) = yᵢ. In contrast, in regression the hypothesis h should minimize L and generalize well to new samples.

Answer 6

For a given dataset, we want to optimize the weights so that they maximize the likelihood of the observations (target values yₙ) for the input xₙ.

Answer 7

𝐾 classes are represented using a 1-of-𝐾 coding scheme where class 𝐶ₗ is encoded as (0, 0, …, 1, …, 0) ∈ {0,1}ᴷ, with the 1 in position l.
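
A short sketch of the 1-of-K coding scheme. The function name one_hot and the zero-based class index are assumptions made for this example.

```python
def one_hot(class_index, num_classes):
    """Return a list of length K with a single 1 at class_index (zero-based)."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index must be in [0, num_classes)")
    return [1 if k == class_index else 0 for k in range(num_classes)]


# Third of four classes (index 2).
code = one_hot(2, 4)  # [0, 0, 1, 0]
```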

Answer 8

y_cls(xₙ, w) = f(y_reg(xₙ, w)). A generalized linear model applies a nonlinear activation function f to the output of a linear regression model. The sets of points where y_cls(xₙ, w) = c for a constant c are called decision surfaces.

Answer 9

A linear discriminant function divides the feature space by a hyperplane decision surface.

Answer 10

f(n) = 0 if n < 0, 1 if n ≥ 0, where n = wᵀxₙ. The resulting decision surface is wᵀxₙ = 0.
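
A sketch of the threshold function applied to a weighted sum. It assumes that any bias term is folded into w through a constant input, since the card writes the decision surface without a separate bias. The names step and threshold_classify are illustrative.

```python
def step(n):
    """Threshold activation: 0 for negative input, 1 otherwise."""
    return 1 if n >= 0 else 0


def threshold_classify(w, x):
    """Apply the step function to the weighted sum of w and x."""
    return step(sum(wi * xi for wi, xi in zip(w, x)))


w = [1.0, -1.0]
above = threshold_classify(w, [2.0, 1.0])       # weighted sum 1.0  -> class 1
below = threshold_classify(w, [1.0, 2.0])       # weighted sum -1.0 -> class 0
on_surface = threshold_classify(w, [1.0, 1.0])  # weighted sum 0.0  -> class 1
```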

Answer 11

We separate K classes with K-1 binary discriminant functions. Every discriminant function separates one class from all others. Some regions may be classified ambiguously!

Answer 12

We separate K classes pairwise with K(K-1)/2 binary discriminant functions. The class of the data sample is assigned by a majority vote. Some regions may be classified ambiguously!
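
The voting scheme can be sketched as follows. The pairwise classifiers here are hypothetical stand-ins (nearest of two class centers on a line); any binary classifier that returns one of its two class labels would fit the same interface.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, pairwise_classifiers):
    """Collect one vote per class pair and return all classes with the most votes.

    A result with more than one class means the sample lies in an ambiguous region.
    """
    votes = Counter(clf(x) for clf in pairwise_classifiers.values())
    top_count = max(votes.values())
    return sorted(c for c, n in votes.items() if n == top_count)


# Hypothetical example: three classes with centers on a line.
centers = {0: 0.0, 1: 5.0, 2: 10.0}
classifiers = {
    (i, j): (lambda x, i=i, j=j: i if abs(x - centers[i]) <= abs(x - centers[j]) else j)
    for i, j in combinations(sorted(centers), 2)
}

num_classifiers = len(classifiers)              # K(K-1)/2 = 3 for K = 3
winner = one_vs_one_predict(4.0, classifiers)   # [1]

# If every pairwise classifier votes for a different class, the vote is tied.
cyclic = {(0, 1): lambda x: 0, (0, 2): lambda x: 2, (1, 2): lambda x: 1}
tied = one_vs_one_predict(4.0, cyclic)          # [0, 1, 2]
```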

Answer 13

The input space regions defined by the discriminant functions may overlap or fail to cover the whole input space. The solution is to determine a partition of the input space: every class is assigned a linear function, and a sample is assigned to the class whose function returns a larger value than the functions of all other classes.
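
A minimal sketch of this rule: one linear function per class, and the class with the largest value wins. The weights below are made-up numbers for illustration. Ties go to the lowest class index, which is an arbitrary choice of this sketch.

```python
def classify(x, weights, biases):
    """Return the index k of the linear function w_k . x + b_k with the largest value."""
    scores = [
        sum(wi * xi for wi, xi in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Three hypothetical classes in a two-dimensional input space.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]

label_a = classify([2.0, 1.0], weights, biases)    # scores 2, 1, -3  -> class 0
label_b = classify([0.0, 3.0], weights, biases)    # scores 0, 3, -3  -> class 1
label_c = classify([-1.0, -1.0], weights, biases)  # scores -1, -1, 2 -> class 2
```

Because every input receives exactly one label, the regions neither overlap nor leave gaps.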

Answer 14

Ordered sets implicitly impose a nonexistent order on the classes, which the learned model might pick up. Use one-hot encoding (0, 0, ..., 1, 0, ...) instead.

Answer 15

Least squares classification is sensitive to outliers. Also, if the input is linearly separable, the algorithm computes one of infinitely many solutions.

Answer 16

SVMs aim to minimize the generalization error by computing a hyperplane that maximizes the margin of the classifier. Only the support vectors define the decision boundary of the SVM, so only they must be saved. In general, the optimization function includes slack variables to allow for training samples that lie inside the margin, which improves generalization performance.
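
The role of the slack variables can be sketched with the common soft-margin formulation. It is assumed here, not stated on the card: targets are +1 or -1, the slack of a sample is max(0, 1 - t(wᵀx + b)), and the objective is ½‖w‖² plus C times the total slack. All names and numbers are illustrative, and the sketch only evaluates the objective; it does not optimize it.

```python
def slack(w, b, x, t):
    """Slack of one sample: 0 outside the margin, between 0 and 1 inside it,
    and greater than 1 if the sample is misclassified."""
    functional_margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - functional_margin)


def soft_margin_objective(w, b, samples, targets, penalty):
    """0.5 * ||w||^2 + penalty * (sum of all slack values)."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    total_slack = sum(slack(w, b, x, t) for x, t in zip(samples, targets))
    return regularizer + penalty * total_slack


w, b = [1.0, 0.0], 0.0
outside = slack(w, b, [2.0, 0.0], 1)    # 0.0: correctly classified, outside the margin
inside = slack(w, b, [0.5, 0.0], 1)     # 0.5: correctly classified, inside the margin
wrong = slack(w, b, [-1.0, 0.0], 1)     # 2.0: misclassified
```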

Answer 17

ADALINE was an early (1960) single-layer artificial neural network with multiple nodes, where each node accepts multiple inputs and generates one output.

Answer 18

The AI winter was a period of reduced funding and interest in artificial intelligence research. It began in 1969 with the proof that the perceptron only works for simple data sets that are linearly separable. The AI winter ended with the development of new neural network architectures and the backpropagation algorithm.

Answer 19

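A neural network with only a single hidden layer can approximate any continuous function on a compact domain arbitrarily well, provided the hidden layer has enough neurons and a suitable nonlinear activation function.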

Answer 20

Backpropagation optimizes the loss by varying the weights based on the chain rule.
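
The chain rule can be traced by hand for a single neuron. This sketch assumes a sigmoid activation and a squared-error loss, neither of which is fixed by the card. A full network repeats the same pattern layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_backward(x, w, b, target):
    """Return (loss, grad_w, grad_b) for one sigmoid neuron and squared error."""
    # Forward pass.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(z)
    loss = 0.5 * (y - target) ** 2
    # Backward pass: dL/dw_i = dL/dy * dy/dz * dz/dw_i.
    dloss_dy = y - target
    dy_dz = y * (1.0 - y)
    dloss_dz = dloss_dy * dy_dz
    grad_w = [dloss_dz * xi for xi in x]  # dz/dw_i = x_i
    grad_b = dloss_dz                     # dz/db = 1
    return loss, grad_w, grad_b


# One gradient descent step with a hypothetical learning rate of 0.1.
x, w, b, target = [1.0, 2.0], [0.1, -0.2], 0.0, 1.0
loss_before, grad_w, grad_b = forward_backward(x, w, b, target)
w = [wi - 0.1 * gi for wi, gi in zip(w, grad_w)]
b = b - 0.1 * grad_b
loss_after, _, _ = forward_backward(x, w, b, target)
```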

Answer 21

The scale-invariant feature transform (SIFT) is a feature detection algorithm in computer vision to detect and describe local features in images. It was published by David Lowe in 1999. The SIFT features are designed to be invariant to scale, translation and partially invariant to illumination changes and projections. Specific neurons in the inferior temporal cortex share similarities with the SIFT features.

Answer 22

A deep neural network (DNN) is an artificial neural network (ANN) with multiple (hidden) layers between the input and output layers.

Answer 23

Deep neural networks can learn suitable feature space representations automatically. Layers in deep neural networks correspond to features at different levels of abstraction.

Answer 24

Convolutions are filters on discrete (=sampled) signals (images, audio, sensor data etc). Discrete convolutions can be achieved with kernels. The synaptic connectivity pattern of convolutional neural networks implements a neural version of the discrete convolution.

Answer 25

Matrices that define a signal filter. The entries of the kernel determine the influence of neighboring data points (e.g. pixels) of a sample. The filtered data sample is computed by sliding the kernel along the input sample. Kernels encode position-invariant features.
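
A sketch of sliding a kernel over a two-dimensional input without padding. Like most CNN libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation. The image and the edge-detecting kernel are made-up examples.

```python
def slide_kernel(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the filtered result."""
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    output = []
    for r in range(image_h - kernel_h + 1):
        row = []
        for c in range(image_w - kernel_w + 1):
            # Weighted sum of the neighborhood whose top-left corner is (r, c).
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kernel_h)
                for j in range(kernel_w)
            ))
        output.append(row)
    return output


# A vertical edge between the second and third column.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds to a dark-to-bright step from left to right

response = slide_kernel(image, edge_kernel)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The same kernel responds to the edge in every row, which illustrates why kernels encode position-invariant features.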

Answer 26

Gabor filters can detect image content with a specific frequency and direction. Deep neural networks typically learn features similar to Gabor filters in the first layer.

Answer 27

A CNN for digit recognition. - 32 x 32 pixel input - 60,000 trainable parameters

Answer 28

First modern deep CNN. - accelerated training on GPUs - outperformed all competitors in 2012

Answer 29

Weight sharing is a way to reduce the number of parameters (e.g. weights of a filter) while allowing for more robust feature detection.
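
A back-of-the-envelope comparison shows the size of the saving. The layer sizes are illustrative assumptions: a 32 x 32 input, a 28 x 28 output map, and a single 5 x 5 kernel, with biases ignored.

```python
def fully_connected_weights(num_inputs, num_outputs):
    """Every output neuron has its own weight for every input."""
    return num_inputs * num_outputs


def shared_kernel_weights(kernel_height, kernel_width):
    """All output positions reuse the same kernel weights."""
    return kernel_height * kernel_width


# A 5 x 5 kernel slid over a 32 x 32 input without padding yields a 28 x 28 map.
dense = fully_connected_weights(32 * 32, 28 * 28)  # 802816 weights
shared = shared_kernel_weights(5, 5)               # 25 weights
```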

Answer 30

Max-pooling is a sample-based discretization process. The objective is to down-sample an input image, reducing its dimensionality and allowing for assumptions to be made about features contained in the sub-regions binned. - reduces number of weights - makes network more robust against translations in the input image
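
A minimal sketch of max-pooling with non-overlapping square windows. The window size of 2 and the sample images are assumptions for illustration.

```python
def max_pool(image, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    pooled = []
    for r in range(0, len(image) - size + 1, size):
        row = []
        for c in range(0, len(image[0]) - size + 1, size):
            row.append(max(
                image[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ))
        pooled.append(row)
    return pooled


image = [
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [0, 0, 5, 6],
    [0, 9, 7, 8],
]
pooled = max_pool(image)  # [[4, 1], [9, 8]]

# A feature that moves by one pixel inside a window gives the same output.
shifted_left = max_pool([[9, 0], [0, 0]])   # [[9]]
shifted_right = max_pool([[0, 9], [0, 0]])  # [[9]]
```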

Answer 31

Convolutions are computationally expensive. When taking the Fourier transform (converting inputs and kernels into Fourier space), convolutions become simple element-wise multiplications: ℱ(𝑓∗𝑠) = ℱ(𝑓) ∙ ℱ(𝑠). The challenge lies in keeping the computational cost of the Fourier transform low.
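
The identity can be checked numerically on a short signal. This sketch uses a naive discrete Fourier transform, for which the identity holds for circular convolution. The naive transform costs O(n²) operations, so a practical implementation would use a fast Fourier transform and zero-padding instead. All function names are illustrative.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolution(f, s):
    """Direct circular convolution of two equally long signals."""
    n = len(f)
    return [sum(f[m] * s[(t - m) % n] for m in range(n)) for t in range(n)]


f = [1.0, 2.0, 3.0, 4.0]
s = [0.5, 0.5, 0.0, 0.0]  # averages each sample with its predecessor

direct = circular_convolution(f, s)
product = [a * b for a, b in zip(dft(f), dft(s))]
via_fourier = [value.real for value in inverse_dft(product)]
# direct and via_fourier agree up to floating-point rounding.
```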

Answer 32

- Single biological neurons can compute complex functions that would require several artificial neurons. - The brain processes different streams of information in parallel and in different brain regions. Different data streams are fused into a coherent perception. - Information processing in the brain is more robust and versatile (data fusion, one-shot learning, memory etc.). - Deep neural networks suffer from catastrophic forgetting. Performance on a learned task decreases drastically as soon as a new task is trained. - Deep neural networks can be fooled, for example by adversarial patches (techniques to fool ML models).

Answer 33

In fully connected neural networks, all neurons process the complete input image, so it is not possible to separate the input into different aspects. A Capsule Neural Network is an ANN that can be used to better model hierarchical relationships. The approach is an attempt to more closely mimic biological neural organization. It introduces a new type of neuron, the capsule, which encodes not only a single feature but also meta information on its instantiation. The output of a capsule is a vector that indicates whether the visual feature represented by the capsule is present and how it differs from a prototype instance (pose, transformation, lighting etc.). Lower-level capsules connect to higher-level capsules with a state that is in agreement with the current output. ➔ difficult to train!

Answer 34

Energy efficiency

Answer 35

- supervised learning - unsupervised learning - reinforcement learning
