Week 1: Introductions – Organisation & ML Basics Flashcards

Question 1

Q

Describe unsupervised, semi-supervised and supervised learning

Answer

A

Unsupervised:
- Unlabelled data
- Understand relationships between the features
- Find correlations

Semi-supervised:
- A combination of labelled and unlabelled data

Supervised:
- Labelled data
- Classification or regression
- Optimise a cost (loss) function

Question 2

Q

What is the goal of classification?

Answer

A

We want to learn a decision boundary:
a function f(x; θ) that predicts class A if some condition is met, and class B otherwise.

The loss L(θ) is the average of the individual losses over all N training points. Each individual loss l compares the true target y_n with the class the model predicted for that data point, f(x_n; θ):

L(θ) = (1/N) · Σ_{n=1}^{N} l(y_n, f(x_n; θ))
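
As a rough illustration of the averaged loss, here is a minimal sketch. It assumes a one-dimensional input, a simple threshold as the decision boundary, and a 0-1 individual loss; the names predict, zero_one_loss and average_loss are hypothetical and not from the course material.

```python
def predict(x, theta):
    """Hypothetical 1-D decision boundary: class "A" if x exceeds theta, else "B"."""
    return "A" if x > theta else "B"


def zero_one_loss(y_true, y_pred):
    """Individual loss (an assumption): 0 for a correct prediction, 1 for a wrong one."""
    return 0 if y_true == y_pred else 1


def average_loss(xs, ys, theta):
    """L(theta) = (1/N) * sum of the individual losses over all N points."""
    losses = [zero_one_loss(y, predict(x, theta)) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)


# Made-up toy data: small inputs belong to class B, large ones to class A.
xs = [0.5, 1.5, 2.5, 3.5]
ys = ["B", "B", "A", "A"]

print(average_loss(xs, ys, theta=2.0))  # boundary separates both classes
print(average_loss(xs, ys, theta=3.0))  # the point 2.5 is now misclassified
```

Moving theta changes which points land on the wrong side of the boundary, which is what the average loss measures.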

Question 3

Q

What is regression?

Answer

A

Predict a continuous (real-valued) output for a given input
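
One possible example, assuming a single input feature and a straight-line model fitted by ordinary least squares (the card does not name a specific model; fit_line and the data are hypothetical):

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predicted output for a new input x = 5
print(slope, intercept, prediction)
```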

Question 4

Q

Translation vs Transcription - what’s the difference?

Answer

A

Transcription: Convert unstructured input to text (audio → English text)

Translation: Convert one language to another (English text → Spanish text)

Question 5

Q

What is anomaly detection?

Answer

A

Detecting whether something in the data is unusual. This could be used, for example, to monitor traffic.
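
A very simple way to flag unusual values is a z-score rule. This is only a sketch under that assumption (the card does not prescribe a method); find_anomalies, the threshold of 2.0 and the traffic counts are all made up for illustration.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical traffic counts per time interval, with one obvious spike.
traffic_counts = [100, 102, 98, 101, 99, 103, 97, 500]

anomalies = find_anomalies(traffic_counts)
print(anomalies)
```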

Question 6

Q

What is “Structuring/Compression” in ML?

Answer

A

Re-organise data with respect to the relationships between its elements. PCA is an example.
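
To make the PCA example concrete, here is a minimal sketch restricted to two-dimensional data, where the first principal direction has a closed form. The function name and the sample points are hypothetical; real use would normally rely on a numerical library.

```python
import math


def first_principal_component_2d(points):
    """Return the first principal direction of 2-D points and the 1-D scores."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centred) / n
    syy = sum(y * y for _, y in centred) / n
    sxy = sum(x * y for x, y in centred) / n

    # Angle of the eigenvector that belongs to the largest eigenvalue.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))

    # Compression: each 2-D point is replaced by one coordinate along that direction.
    scores = [x * direction[0] + y * direction[1] for x, y in centred]
    return direction, scores


# Made-up points lying on the line y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, scores = first_principal_component_2d(points)
print(direction, scores)
```

Because these points vary along a single line, one coordinate per point is enough to describe them, which is the compression idea.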

Question 7

Q

Name some reasons why we do data exploration

Answer

A

Central tendencies
Basic measures of shape & dispersion
Structure/patterns in the input data
Achieve human understanding!
Capture all your data well
Complete labelling?
Check for missing values
Clean data if sensible
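
A few of these checks can be sketched with the Python standard library. The example assumes a single numeric column in which missing values are stored as None; summarise and the ages list are hypothetical.

```python
import statistics


def summarise(values):
    """Report central tendencies, dispersion and missing values for one column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),   # check for missing values
        "mean": statistics.mean(present),        # central tendency
        "median": statistics.median(present),    # central tendency
        "stdev": statistics.stdev(present),      # dispersion (sample std. dev.)
        "min": min(present),
        "max": max(present),
    }


# Made-up column with two missing entries.
ages = [23, 35, None, 41, 29, None, 35]

summary = summarise(ages)
for name, value in summary.items():
    print(name, value)
```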

Question 8

Q

What is “identical representation” in ML data preprocessing?

Answer

A

All data samples must have exactly the same structure/dimensions so they can be:
- Stacked into tensors (mathematical arrays)
- Processed in parallel as batches
- Fed through the same model architecture

Example: All images must be 32×32×3, all text sequences must be 500 tokens, etc.

Question 9

Q

What does “Cut or pad your data” mean in preprocessing?

Answer

A

Cut: Resize/crop larger data to the standard size

Pad: Add zeros/empty space to smaller data

Goal: All inputs have identical dimensions for tensor operations
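
A minimal sketch of cutting and padding, assuming the data are plain Python lists (for example token ids) and that zero is the padding value; cut_or_pad and the sample sequences are hypothetical.

```python
def cut_or_pad(sequence, target_length, pad_value=0):
    """Return a copy of sequence with exactly target_length elements."""
    if len(sequence) >= target_length:
        # Cut: keep only the first target_length elements.
        return sequence[:target_length]
    # Pad: append pad_value until the target length is reached.
    return sequence + [pad_value] * (target_length - len(sequence))


# Made-up sequences of different lengths.
raw = [[1, 2, 3, 4, 5, 6], [7, 8], [9, 10, 11, 12]]

batch = [cut_or_pad(seq, target_length=4) for seq in raw]
print(batch)
```

After this step every row has the same length, so the batch could be stacked into a single array.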

Question 10

Q

What are the key supervised learning evaluation metrics and their formulas?

Answer

A

Confusion Matrix: TP, TN, FP, FN (P = TP + FN actual positives, N = TN + FP actual negatives)
Accuracy = (TP + TN)/(P + N) - Overall correctness
Precision (PPV) = TP/(TP + FP) - “Of predicted positives, how many were right?”
Recall (TPR) = TP/(TP + FN) - “Of actual positives, how many did we catch?”
F1-Score = 2·(Precision·Recall)/(Precision + Recall) - Harmonic mean of precision and recall
FP-rate (FPR) = FP/N = FP/(FP + TN) - False alarm rate
ROC/AUC - The ROC curve plots TPR against FPR across decision thresholds; AUC is the area under that curve
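
The formulas above can be checked with a short sketch. The function name and the confusion-matrix counts are made up for illustration, and the code assumes no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics above from confusion-matrix counts."""
    p = tp + fn  # actual positives
    n = tn + fp  # actual negatives
    precision = tp / (tp + fp)
    recall = tp / p
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / n,
    }


# Hypothetical counts: 100 samples, 50 actual positives and 50 actual negatives.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(name, round(value, 4))
```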

Question 11

Q

Please identify key aspects of ML

Answer

A

Define task, represent your data, select metrics, develop your model

(11 cards)