Exam prep Flashcards

1
Q

Supervised learning

A

A subcategory of machine learning that trains a model on inputs paired with desired outputs (labels).

2
Q

Semi-supervised learning

A

Using labelled as well as unlabelled data to perform certain learning tasks.

3
Q

Dimensionality reduction

A

A model h ∈ H that represents each instance x with a lower-dimensional feature vector whilst preserving key features.

4
Q

Unsupervised learning

A

A subcategory of machine learning which uses inputs alone, without desired outputs or labels, to find structure in the data.

5
Q

Clustering Analysis

A

A machine learning technique that involves grouping sets of objects in such a way that objects in the same group have similar features.

6
Q

Anomaly Detection

A

The identification of rare items, events or observations which deviate significantly from the majority of the data.

7
Q

Reinforcement learning

A

A subset of machine learning that allows an agent to learn through trial and error using feedback from its actions.

8
Q

P(A | B)

A

P(A ∩ B) / P(B)
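
For example (with made-up numbers): if P(A ∩ B) = 0.2 and P(B) = 0.5, then P(A | B) = 0.2 / 0.5 = 0.4.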

9
Q

Max

A

Maximum value of a function

10
Q

Argmax

A

The input value(s) at which a function attains its maximum, as opposed to the maximum value itself.
For example, for argmax of sin(), the values would include π/2, which maximises the y value, but would not include things like 1 or 2, which do not maximise sin().
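
A quick contrast of max and argmax in NumPy (the numbers here are arbitrary, purely illustrative):

import numpy as np

ys = np.array([0.1, 0.9, 0.4])    # function values
xs = np.array([0.0, 1.5, 3.0])    # corresponding inputs

print(np.max(ys))                 # max: the largest value, 0.9
print(xs[np.argmax(ys)])          # argmax: the input that achieves it, 1.5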

11
Q

MAP(A)

A

argmax_a P(A = a)
The value of A with the highest probability.

12
Q

MAP(A, B)

A

argmax_{a, b} P(A, B) = argmax_{a, b} P(B | A) P(A)
The joint assignment of A and B with the highest probability.

13
Q

Validation dataset

A

A portion of the data held out from the training dataset, used during training to tune hyperparameters and check for overfitting.

14
Q

Regularisation

A

Techniques that are used to calibrate machine learning models in order to minimize the adjusted loss function and prevent overfitting or underfitting.

15
Q

Test dataset

A

Data held back from training entirely and used only at the end, to estimate how well the final model performs on unseen data.
16
Q

Training dataset

A

The portion of the data used to fit the model's parameters.
17
Q

Error rate for confusion matrix

A

1 - accuracy
or
(FP + FN) / (TP + FP + FN + TN)

18
Q

Accuracy for confusion matrix

A

(TP + TN) / (TP + FP + FN + TN)
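
A quick check with made-up confusion-matrix counts, covering both this card and the error-rate card above:

TP, FP, FN, TN = 40, 5, 10, 45               # illustrative counts

accuracy = (TP + TN) / (TP + FP + FN + TN)   # 85 / 100 = 0.85
error_rate = 1 - accuracy                    # same as (FP + FN) / total = 0.15
print(accuracy, error_rate)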

19
Q

Loss function

A

(1/m) · Σ(x − x̂)²

where
x = observed values
x̂ = predicted values
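
This is the mean squared error; a direct NumPy translation with made-up values:

import numpy as np

def mse(observed, predicted):
    """Mean squared error: (1/m) * sum of squared differences."""
    return np.mean((observed - predicted) ** 2)

observed = np.array([3.0, 5.0, 2.0])
predicted = np.array([2.5, 5.0, 3.0])
print(mse(observed, predicted))   # (0.25 + 0 + 1) / 3 = 0.4166...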

20
Q

Linear regression

A

A model that predicts the output as a weighted linear combination of the input features; its weights are chosen to minimise the loss function (e.g. mean squared error).

21
Q

Naive Bayes

A

P(X1, …, Xn | Y) = Π_i P(Xi | Y)
Essentially, Xi and Xj are conditionally independent given Y.
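
A minimal sketch of how the assumption is used for classification (the probability tables are made-up, not course data):

# P(Y) and P(Xi | Y) for two binary features, illustrative numbers only
p_y = {0: 0.6, 1: 0.4}
p_x1_given_y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p_x1_given_y[y][x1]
p_x2_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def predict(x1, x2):
    # P(Y = y | X) is proportional to P(Y = y) * P(X1 | y) * P(X2 | y)
    scores = {y: p_y[y] * p_x1_given_y[y][x1] * p_x2_given_y[y][x2] for y in p_y}
    return max(scores, key=scores.get)

print(predict(1, 1))   # the class with the larger product wins (here: 1)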

22
Q

Rosenblatt’s Perceptron

A
  • initialise weights randomly
  • take one sample x and predict y
  • for erroneous predictions, update the weights
  • if the output is 0 and y’s true value is 1, increase the weights
  • if the output is 1 and y’s true value is 0, decrease the weights
  • repeat until there are no errors (see the sketch below)
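
A minimal sketch of this loop in NumPy (the learning rate, dataset, and function name are illustrative assumptions, not from the course):

import numpy as np

def train_perceptron(X, y, lr=0.1, max_epochs=100):
    """Rosenblatt-style training: adjust weights only on erroneous predictions."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])           # initialise weights randomly
    b = 0.0                                   # bias
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            if pred != target:                # erroneous prediction
                update = lr * (target - pred) # +lr if true value is 1, -lr if 0
                w += update * xi              # increase or decrease weights
                b += update
                errors += 1
        if errors == 0:                       # repeat until no errors
            break
    return w, b

# Example: learn the AND function (linearly separable, so the loop converges)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)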
23
Q

Perceptron

A

A machine for binary classification, with one weight per input.
We multiply the weights with their respective inputs, sum them, and add a bias.
If the result is larger than a threshold, return 1, else 0.
XOR is not linearly separable, so it requires multiple perceptrons.

24
Q

Convolution filter

A

Slide the filter across the input one stride at a time; at each position, multiply the overlapping cells elementwise and sum the products to give the output value for that position.
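
A minimal NumPy sketch (assuming a square input and filter and no padding; the names and example values are illustrative):

import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid convolution: slide the kernel, multiply matching cells, sum them."""
    n, f = image.shape[0], kernel.shape[0]           # square input/filter assumed
    out = (n - f) // stride + 1                      # output size formula
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            result[i, j] = np.sum(patch * kernel)    # elementwise multiply, then add
    return result

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d(image, kernel))    # 3x3 output: (4 - 2)/1 + 1 = 3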

25
Q

Output size for convolution filter

A

(N - F) / stride + 1
where
N = length or width of input
F = length or width of filter
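
For example, a 7×7 input with a 3×3 filter and stride 2 gives (7 - 3)/2 + 1 = 3, i.e. a 3×3 output.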

26
Q

Stride

A

The number of cells the filter moves between each application to the input.

27
Q

Manifolds

A

A manifold is a mathematical object that can be curved but looks flat locally.

28
Q

V

A

Logical OR, used when writing decision trees as logical expressions.

29
Q

Ʌ

A

Logical AND, used when writing decision trees as logical expressions.

30
Q

Regression for nearest neighbour

A

The average of the values of the k nearest neighbours: (sum of their values) / k.

31
Q

Pooling

A

Reduces the size of a feature matrix by summarising each region with a single value, keeping only the important features.

32
Q

Max-pooling

A

A form of pooling in which we pick the maximum of a selected group of numbers to represent the bigger matrix as a smaller one.
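
A small NumPy sketch with an illustrative 4×4 feature map (window size and stride are assumptions):

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """2x2 max-pooling: keep only the largest value in each window."""
    n = feature_map.shape[0]
    out = (n - size) // stride + 1
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            pooled[i, j] = window.max()
    return pooled

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 2],
               [0, 2, 5, 7],
               [1, 1, 3, 4]], dtype=float)
print(max_pool(fm))   # [[6., 2.], [2., 7.]]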

33
Q

Indirect Causal Effect

A

For X -> Y -> Z
If Y is observed, X and Z are independent and cannot affect each other.

34
Q

Indirect Evidential Effect

A

For Z -> Y -> X
If Y is not observed, then X can influence Z.

35
Q

Common Cause

A

X <- Y -> Z
If Y is not observed, then X and Z can influence each other.

36
Q

Common Effect

A

X -> Y <- Z
If Y is not observed, X and Z cannot influence each other.

37
Q

Causal Reasoning

A

Cause to effect, where we have a certain cause which predicts an effect.

38
Q

Evidential Reasoning

A

Effect to cause, where we observe an effect and reason back to its likely cause.

39
Q

Intercausal reasoning

A

Reasoning between causes that share a common effect, i.e. two nodes which are not directly connected.
An example is explaining away.

40
Q

Explaining away

A

The probability of a cause given its effect is lower when an alternative cause is present and higher when the alternative is absent.

41
Q

Zero-padding

A

Surrounding the matrix with cells containing 0 to prevent shrinkage of data.

42
Q

ReLU Layer

A

Replaces values below 0 with 0 and leaves other values unchanged, i.e. ReLU(x) = max(0, x).

43
Q

Residual Block

A

Adds the block's original input to its output (a skip connection).

44
Q

Dropout

A

The random removal (deactivation) of certain nodes during training, which helps prevent overfitting.

45
Q

BatchNorm Layer

A

Uses batch normalisation to make the training of neural networks faster and more stable.

46
Q

Softmax Layer

A

Assigns decimal probabilities to each class in a multi-class problem, so that the probabilities sum to 1.
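
A minimal sketch (subtracting the maximum is a standard numerical-stability trick, not part of the definition):

import numpy as np

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))          # approx [0.659, 0.242, 0.099]
print(softmax(scores).sum())    # 1.0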

47
Q

Preprocessing Data

A

Ensures that data is in a format that the network can accept, which includes re-sizing and re-scaling into an appropriate scale.

48
Q

Accuracy

A

|{x ∈ D : M(x) = l_x}| / |D|
where
M(x) = predicted label
l_x = true label

49
Q

Empirical Loss

A

The average loss measured on the available sample of data (e.g. the training set), rather than over the whole data distribution.

50
Q

Generalisation Error

A

The measure of how accurately an algorithm can predict outcomes based on unseen data.

51
Q

For neural network gradients

A

1) Substitute the data in (where we have y1, substitute what y1 actually is).
2) Simplify the expression down.
3) Take the derivative with respect to the variable in question (x1, x2, x3, etc.) and check whether it matches that variable's coefficient in the simplified equation (see the sketch below).
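
A tiny sketch of these steps using sympy (the expression is hypothetical, not from the course):

import sympy as sp

x1, x2 = sp.symbols("x1 x2")
y1 = 3 * x1 + 2 * x2          # hypothetical hidden-unit expression
y2 = 2 * y1                   # 1) substitute what y1 actually is
y2 = sp.expand(y2)            # 2) simplify: 6*x1 + 4*x2
print(y2)
print(sp.diff(y2, x1))        # 3) derivative w.r.t. x1 is 6, the coefficient of x1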

52
Q

For InfoGains

A

1) Work out the original entropy of Y, H(Y).
2) Work out the conditional entropy H(Y | X), covering every value that X can take.
3) Subtract the conditional entropy from the original entropy: IG(Y, X) = H(Y) − H(Y | X) (see the sketch below).
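
A small NumPy sketch of the three steps, with a made-up dataset:

import numpy as np

def entropy(labels):
    """H(Y) = -sum p * log2(p) over the label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def info_gain(x, y):
    """IG(Y, X) = H(Y) - H(Y | X)."""
    h_y = entropy(y)                                   # 1) original entropy of Y
    h_y_given_x = 0.0
    for value in np.unique(x):                         # 2) conditional entropy over X
        mask = x == value
        h_y_given_x += mask.mean() * entropy(y[mask])
    return h_y - h_y_given_x                           # 3) subtract

x = np.array(["sunny", "sunny", "rain", "rain"])
y = np.array([1, 1, 0, 1])
print(info_gain(x, y))   # approx 0.311 bits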

53
Q

For k-trees (regression)

A

1) Work out the distance metric that is wanted and compute the distances under that metric.
2) Find the k closest neighbours to the point you want to predict.
3) Take the values of those neighbours and perform the following equation (see the sketch below):
(sum of the k nearest neighbours' values) / k
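
A minimal sketch assuming Euclidean distance and a tiny made-up dataset:

import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    dists = np.linalg.norm(X_train - x_query, axis=1)    # 1) distances
    nearest = np.argsort(dists)[:k]                       # 2) k closest neighbours
    return y_train[nearest].mean()                        # 3) mean of their values

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
print(knn_regress(X_train, y_train, np.array([1.2]), k=3))  # mean of 1, 2, 0 -> 1.0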

54
Q

Forward Computations

A
  • collect annotated data
  • define a model and initialise it randomly
  • predict based on the current model (forward propagation)
  • evaluate the predictions
55
Q

Backward Computations

A
  • compute the loss
  • compute the gradient of the loss over the weights
  • update the weights
  • this completes a round of forward-backward computation (see the sketch below)
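
A minimal sketch of repeated forward-backward rounds on a hand-rolled linear model with MSE loss (the data is synthetic and illustrative):

import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(100, 3))               # collected annotated data (inputs)
y = X @ true_w                              # collected annotated data (targets)

w = rng.normal(size=3)                      # define a model, initialise randomly
lr = 0.1

for _ in range(200):
    y_pred = X @ w                          # forward: predict with current model
    loss = np.mean((y_pred - y) ** 2)       # compute loss (MSE)
    grad = 2 * X.T @ (y_pred - y) / len(y)  # gradient of loss over weights
    w -= lr * grad                          # update weights

print(w)   # approaches true_w = [1.0, -2.0, 0.5]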
56
Q

Cross validation v random resampling

A

Random resampling randomises the training set and test set after every iteration, so successive test sets can overlap and some data may never be tested. Cross validation prevents this by selecting the test data deterministically every iteration, by index, so every sample appears in exactly one test fold.
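
A small sketch contrasting the two, using scikit-learn's KFold (the dataset and split sizes are illustrative assumptions):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)

# Cross validation: test indices are chosen by position, each point tested once.
for train_idx, test_idx in KFold(n_splits=5).split(X):
    print("CV test fold:", test_idx)

# Random resampling: a fresh random split each iteration, so test sets can
# repeat or overlap between iterations.
rng = np.random.default_rng(0)
for _ in range(5):
    perm = rng.permutation(len(X))
    print("Random test sample:", perm[:2])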

57
Q

I-map

A

Represents independencies with graphs.
A graph is an I-map of a table (distribution) if every independence the graph asserts also holds in the table.
So if the graph showed independence between two nodes and the table shows this too, then we can class the graph as an I-map of the table.

58
Q

I-map exceptions

A

Following on, this implies that if the graph asserts an independence between two nodes but the table shows them to be dependent, then the graph is not an I-map of the table.
However, a graph that asserts no independencies at all (the empty set, which is a subset of every mathematical set) can still be classed as an I-map.