Week 1: Introductions – Organisation & ML Basics Flashcards

(11 cards)

1
Q

Describe unsupervised, semi-supervised and supervised learning

A

Unsupervised:
- Unlabelled data
- Understand relationships between the features
- Find correlations

Semi-supervised:
- A combination of labelled and unlabelled data

Supervised:
- Labelled data
- Classification or regression
- Optimise a cost (loss) function

2
Q

What is the goal of classification?

A

We want to learn a decision boundary:
a function f(x; θ) that predicts class A if some condition is met, and class B if not.

The loss L(θ) is the average of the individual losses over all N training points. Each individual loss ℓ compares the true target y_n with the class our model predicted for that data point, f(x_n; θ).

L(θ) = (1/N) · Σ_{n=1}^{N} ℓ(y_n, f(x_n; θ))
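
A minimal Python sketch of this average loss, assuming a 0-1 loss as the individual loss and a one-dimensional threshold as the decision boundary. The names `predict`, `zero_one_loss` and `average_loss`, and the sample data, are hypothetical and chosen only for illustration.

```python
from typing import Sequence

def predict(x: float, theta: float) -> str:
    """Hypothetical 1-D decision boundary: class A if x exceeds theta, else B."""
    return "A" if x > theta else "B"

def zero_one_loss(y_true: str, y_pred: str) -> float:
    """Individual loss (assumed 0-1 loss): 0 if correct, 1 if wrong."""
    return 0.0 if y_true == y_pred else 1.0

def average_loss(xs: Sequence[float], ys: Sequence[str], theta: float) -> float:
    """L(theta) = (1/N) * sum over n of loss(y_n, f(x_n; theta))."""
    losses = [zero_one_loss(y, predict(x, theta)) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)

# Made-up training points and labels.
xs = [0.5, 1.5, 2.5, 3.5]
ys = ["B", "B", "A", "A"]

print(average_loss(xs, ys, theta=2.0))  # 0.0  (boundary separates the classes)
print(average_loss(xs, ys, theta=1.0))  # 0.25 (one of four points misclassified)
```

Other individual losses (for example cross-entropy) fit the same averaging pattern; only `zero_one_loss` would change.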

3
Q

What is regression?

A

Predict a continuous (real-valued) output for a given input, e.g. a house price from the features of the house

4
Q

Translation vs Transcription - what’s the difference?

A

Transcription: Convert unstructured input to text (audio → English text)

Translation: Convert one language to another (English text → Spanish text)

5
Q

What is anomaly detection?

A

Detecting whether something in the data is unusual, i.e. deviates from the normal pattern. One use is traffic checking.
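
One simple way to flag unusual values is a z-score threshold. This is only one of many possible techniques and is not taken from the card itself; the function name `find_anomalies`, the threshold of 2.0 and the traffic counts are assumptions for illustration.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up hourly traffic counts with one obvious spike.
counts = [100, 102, 98, 101, 99, 100, 180]
print(find_anomalies(counts))  # [180]
```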

5
Q

What is “Structuring/Compression” in ML?

A

Reorganise data according to the relationships between its elements. PCA is an example.

6
Q

Name some reasons why we do data exploration

A
  • Central tendencies
  • Basic measures of shape & dispersion
  • Structure/patterns in the input data
  • Achieve human understanding!
  • Capture all your data well
  • Complete labelling?
  • Check for missing values
  • Clean data if sensible
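
A small sketch of a first exploration pass, assuming a single numeric feature held in a Python list with `None` marking missing values. The data is made up; only the standard-library `statistics` module is used.

```python
import statistics

# Made-up feature column with one missing value (None).
values = [2.0, 4.0, 4.0, None, 5.0, 10.0]

# Check for missing values, then clean by dropping them.
n_missing = sum(v is None for v in values)
clean = [v for v in values if v is not None]

summary = {
    "n_missing": n_missing,
    "mean": statistics.mean(clean),      # central tendency
    "median": statistics.median(clean),  # central tendency, robust to outliers
    "stdev": statistics.stdev(clean),    # dispersion (sample standard deviation)
    "min": min(clean),
    "max": max(clean),
}
print(summary)
```

Whether dropping missing values is sensible depends on the dataset; imputing them is a common alternative.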
7
Q

What is “identical representation” in ML data preprocessing?

A

All data samples must have the exact same structure/dimensions so they can be:
- Stacked into tensors (mathematical arrays)
- Processed in parallel as batches
- Fed through the same model architecture

Example: All images must be 32×32×3, all text sequences must be 500 tokens, etc.
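
To illustrate why identical shapes matter, here is a rough sketch using plain nested lists in place of real tensors. The helpers `shape` and `batch_shape` are hypothetical and assume each sample is a rectangular nested list.

```python
def shape(sample):
    """Shape of a rectangular nested list, e.g. [[1, 2, 3], [4, 5, 6]] -> (2, 3)."""
    dims = []
    while isinstance(sample, list):
        dims.append(len(sample))
        sample = sample[0] if sample else None
    return tuple(dims)

def batch_shape(samples):
    """Shape of the stacked batch, or ValueError if the samples differ in shape."""
    shapes = {shape(s) for s in samples}
    if len(shapes) != 1:
        raise ValueError(f"cannot stack samples with shapes {sorted(shapes)}")
    return (len(samples),) + shapes.pop()

a = [[1, 2, 3], [4, 5, 6]]   # shape (2, 3)
b = [[7, 8, 9], [0, 1, 2]]   # shape (2, 3)
c = [[1, 2], [3, 4]]         # shape (2, 2)

print(batch_shape([a, b]))   # (2, 2, 3)
# batch_shape([a, c]) raises ValueError because the shapes differ
```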

8
Q

What does “Cut or pad your data” mean in preprocessing?

A

Cut: Resize/crop larger data to the standard size

Pad: Add zeros/empty space to smaller data

Goal: All inputs have identical dimensions for tensor operations
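
A minimal sketch of cutting and padding for token sequences, assuming truncation at the end and zero as the padding value. The function name `cut_or_pad` and the example sequences are hypothetical.

```python
def cut_or_pad(tokens, length, pad_value=0):
    """Return a new list with exactly `length` items."""
    if len(tokens) >= length:
        return tokens[:length]                             # cut: drop the excess
    return tokens + [pad_value] * (length - len(tokens))   # pad: fill with zeros

sequences = [[7, 8], [1, 2, 3, 4, 5, 6], [9, 9, 9, 9]]
batch = [cut_or_pad(seq, 4) for seq in sequences]
print(batch)  # [[7, 8, 0, 0], [1, 2, 3, 4], [9, 9, 9, 9]]
```

Every row of `batch` now has length 4, so the rows can be stacked into a single array.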

9
Q

What are the key supervised learning evaluation metrics and their formulas?

A

Confusion Matrix: TP, TN, FP, FN (actual positives P = TP + FN, actual negatives N = TN + FP)
Accuracy = (TP + TN)/(P + N) - Overall correctness
Precision (PPV) = TP/(TP + FP) - “Of predicted positives, how many were right?”
Recall (TPR) = TP/(TP + FN) - “Of actual positives, how many did we catch?”
F1-Score = 2·(Precision·Recall)/(Precision + Recall) - Harmonic mean of precision and recall
FP-rate (FPR) = FP/N = FP/(FP + TN) - False alarm rate
ROC/AUC - ROC plots TPR against FPR as the decision threshold varies; AUC is the area under that curve
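
The formulas above can be sketched in Python as follows. The function name `classification_metrics` and the confusion-matrix counts are made up for illustration, and the sketch assumes no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the card's metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no guarding against division by zero).
    """
    p = tp + fn  # actual positives
    n = tn + fp  # actual negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / n,
    }

# Made-up counts: 100 samples, 50 actual positives and 50 actual negatives.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.85, recall is 0.8 and the false-positive rate is 0.1.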

10
Q

Please identify key aspects of ML

A

Define task, represent your data, select metrics, develop your model
