Topic 1: Introductions – Organisation & ML Basics Flashcards
(15 cards)
What are AI, ML and deep learning?
AI - A computer that can mimic human behaviour
Machine learning - Learns to fit a model to data without explicit programming
Deep learning - Learns features in data using neural networks
Why is a probabilistic approach useful in machine learning?
From a probabilistic perspective, machine learning on real-world data involves uncertainty from:
- variance
- ambiguity
- transformations
- partial information
We can treat unknown quantities as random variables.
If we have to make decisions under uncertainty, then the probabilistic approach is ideal, e.g. modelling the different possible outcomes and their probabilities.
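A minimal sketch of this idea, with made-up outcomes, probabilities and losses purely for illustration: given a predictive distribution over possible outcomes, we can pick the prediction with the lowest expected loss.

```python
import numpy as np

# Hypothetical predictive distribution over three possible outcomes
outcomes = ["cat", "dog", "fox"]
probs = np.array([0.6, 0.3, 0.1])   # probabilities of each outcome, sum to 1

# Made-up loss for predicting the column class when the row class is true
loss = np.array([
    [0.0, 1.0, 1.0],   # true cat
    [1.0, 0.0, 1.0],   # true dog
    [1.0, 1.0, 0.0],   # true fox
])

# Expected loss of each possible prediction under the predictive distribution
expected_loss = probs @ loss
best = outcomes[np.argmin(expected_loss)]
print(dict(zip(outcomes, expected_loss)), "-> predict", best)
```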
What is supervised learning?
- Learning from labelled data, so we know what the ground truth is
- Classify or regress on data, so the goal is to learn a mapping from inputs to outputs, which can be done through classification or regression
- Optimise cost, here we train the model by optimising a cost function, which measures how wrong the predictions are compared to the true labels
What is semi-supervised learning?
- We’re looking for a partial target: we know what class an image belongs to, but we don’t know why
- State-action cycle: the system is in a state, takes an action, and that action changes the state and leads to feedback
- Independent reward function: the model learns indirectly what is good, as a result of the feedback
What is unsupervised learning?
- We’re dealing with unlabelled data, so we don’t have a ground truth
- We try to understand the relationships in the data, so which users behave similarly
- We want to correlate variables, so which variables relate to each other
What are common ML tasks?
Classification: there are separate classes, and we learn a decision boundary or rule that separates them. There is also a loss L, determined by a loss function l
Regression: predict a (typically continuous) output for a given input (fit a regression to the data)
Transcription: take unstructured input like audio and output discrete text (speech recognition)
Translation: the input is a sequence of symbols in one language, and the output is the corresponding sequence in another language
Structuring/Compression: re-organise data according to the relationships between its elements; learn how the data is structured and then compress it without losing information (e.g. PCA and autoencoders)
Anomaly detection: it can flag elements that are unusual/atypical, so finding outliers
Synthesis and sampling: it can generate new and similar data elements (GAN and VAEs)
Denoising: it can predict clean elements from corrupted ones, so it can remove noise
Density estimation: it can learn the probability distribution that generated the data (KDE, VAEs)
What is EDA and why is it important?
We use it to understand our data:
- we can find central tendencies
- we can find basic measures of shape and dispersion
- we can uncover structure and patterns in the input data
Capture all your data well:
- Can we complete the labelling?
- Check for missing values
- Clean the data if sensible
Data representation:
- Find a representation that is suitable for your task
- Should it be greyscale or RGB?
- Should it be characters, words, or vectors?
Consider automated data transformations:
- E.g. normalisation
- Encoding in a different space
- Reduce or augment the information
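A minimal EDA sketch with pandas; the file name and columns ("animals.csv", "species", "weight_kg") are hypothetical and only illustrate the steps above.

```python
import pandas as pd

# Hypothetical tabular dataset; column names are made up for illustration
df = pd.read_csv("animals.csv")

# Central tendencies plus basic measures of shape and dispersion
print(df.describe())                  # mean, std, quartiles per numeric column
print(df["species"].value_counts())   # class balance / label completeness

# Capture the data well: check for missing values, clean if sensible
print(df.isna().sum())
df = df.dropna(subset=["species"])                            # drop rows without a label
df["weight_kg"] = df["weight_kg"].fillna(df["weight_kg"].median())

# Simple automated transformation: normalisation of a numeric column
df["weight_norm"] = (df["weight_kg"] - df["weight_kg"].mean()) / df["weight_kg"].std()
```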
How is data represented for ML models?
- As tensors that can be batched: multiple samples stacked into one larger tensor.
- They are efficient for GPU computation.
- They are the standard format for deep learning models.
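A small sketch of batching with NumPy; the sample shapes are made up, and the PyTorch lines are only an assumed illustration of the same idea in a deep learning framework.

```python
import numpy as np

# Three hypothetical greyscale "images", each 4x4, as individual samples
samples = [np.random.rand(4, 4).astype(np.float32) for _ in range(3)]

# Batch them into one tensor of shape (batch_size, height, width)
batch = np.stack(samples)
print(batch.shape)  # (3, 4, 4)

# Deep learning frameworks use the same idea, e.g. (assuming PyTorch is installed):
# import torch
# batch_t = torch.from_numpy(batch)   # same data as a torch tensor
# batch_t = batch_t.to("cuda")        # move the whole batch to the GPU at once
```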
What are some steps in data preprocessing?
We use it to encode our data:
Input and output dimensionality:
- Large amounts of data can be processed as one tensor in parallel
- But this requires an identical representation (shape) for every sample
- We can cut or pad our data to achieve this
Data loading:
- Efficient storage: use Pandas & pickle, Hierarchical Data Format (HDF5)
- Efficient processing: Consider dataset packages of major ML frameworks
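A sketch of cutting/padding variable-length data to an identical representation; the sequences, the fixed length and the helper name `cut_or_pad` are made up for illustration, and the storage comment only points at existing pandas facilities.

```python
import numpy as np

def cut_or_pad(seq, length, pad_value=0):
    """Force a 1-D sequence to a fixed length by truncating or zero-padding."""
    seq = list(seq)[:length]                       # cut if too long
    seq = seq + [pad_value] * (length - len(seq))  # pad if too short
    return seq

# Hypothetical variable-length sequences (e.g. token ids)
sequences = [[5, 2, 9], [7, 1, 4, 4, 8, 3], [6]]
batch = np.array([cut_or_pad(s, length=4) for s in sequences])
print(batch)
# [[5 2 9 0]
#  [7 1 4 4]
#  [6 0 0 0]]

# For efficient storage, a pandas DataFrame can be saved with df.to_pickle(...)
# or df.to_hdf(...) (HDF5); major ML frameworks also ship dataset/dataloader packages.
```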
What are key metrics for supervised learning?
- Accuracy: how many predictions were correct out of all predictions.
  - If you guessed 8 animals correctly out of 10, your accuracy is 80%.
- False-positive rate: how often the model says “yes” when it should say “no”.
  - If the model thinks 2 dogs are cats (but they’re really dogs), that’s 2 false positives.
- Precision: of all the times the model said “yes”, how often was it right?
  - How many of my guesses were actually cats?
- Recall: of all the real cats, how many did the model find?
  - How many real cats did I find?
- F1-score: the balance between precision and recall. If both are high, F1 is high; if one is low, F1 drops.
  - Useful when you care equally about being correct and finding all the right cases.
- ROC: a graph that shows how well the model separates classes as you change the decision threshold.
  - X-axis = false positive rate, Y-axis = true positive rate; the more the curve bends toward the top-left, the better.
- AUC: the area under the ROC curve. Ranges from 0.5 (random guessing) to 1.0 (perfect).
  - Tip: higher AUC = better model at distinguishing classes.
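A minimal sketch of these metrics using scikit-learn; the labels, predictions and scores below are made up for a toy binary "cat vs not-cat" task.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Made-up ground truth and model outputs
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # 1 = cat, 0 = not cat
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]   # predicted probabilities

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # of predicted cats, how many were cats
print("recall   :", recall_score(y_true, y_pred))     # of real cats, how many were found
print("f1       :", f1_score(y_true, y_pred))
print("roc auc  :", roc_auc_score(y_true, y_score))   # uses scores, not hard labels
```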
What are key metrics for unsupervised learning?
For unsupervised learning we need different measurements: we have no labels, so we use various distance and similarity measures:
- Minkowski distances
- Intra/Inter-cluster distance
- log-likelihood: Measures how likely the data is under your model.
- Higher is better, meaning your model “explains” the data well.
- Perplexity
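A small NumPy sketch of two of these measures; the clusters and the helper `minkowski` are made up for illustration.

```python
import numpy as np

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

# Two hypothetical clusters of 2-D points
cluster_a = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.4]])
cluster_b = np.array([[3.0, 3.0], [3.2, 2.8], [2.9, 3.3]])

# Intra-cluster distance: average distance of points to their own centroid
centroid_a = cluster_a.mean(axis=0)
intra_a = np.mean([minkowski(x, centroid_a) for x in cluster_a])

# Inter-cluster distance: distance between the two centroids
inter = minkowski(centroid_a, cluster_b.mean(axis=0))

print(f"intra A: {intra_a:.3f}, inter A-B: {inter:.3f}")  # want intra small, inter large
```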
How does a basic neural network work?
It takes inputs x_1, x_2, x_3, each with a weight w_1, w_2, w_3. The perceptron combines them by taking the weighted sum of the inputs, applies a (nonlinear) activation function, and produces the output:
y = f(x · w) = tanh( Σ_{k=1}^{n} x_k w_k )
It is a feature-learning algorithm, so it learns patterns or features from the raw data
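A minimal NumPy sketch of the forward pass described above; the input and weight values are made up.

```python
import numpy as np

# Made-up inputs x_1..x_3 and weights w_1..w_3
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8,  0.3, -0.5])

# Perceptron: weighted sum of the inputs, then a nonlinear activation
z = np.dot(x, w)   # sum_k x_k * w_k
y = np.tanh(z)     # activation function f
print(z, y)
```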
What are low-, mid-, and high-level features in deep learning?
Instead of hand-crafting some features or picking which are important, neural networks can learn what features actually matter
low-level features = simple patterns such as lines and edges
mid-level features = parts of an object, such as eyes, nose or ears
high-level features = whole objects, such as faces
What are the basic steps of a machine learning project?
- Define the task
  - Is it supervised (with labels)? Semi-supervised (some labels)? Unsupervised (no labels)?
  - This step often needs a reasonable understanding of the data, so EDA is useful here
- Represent your data
  - We need an efficient transformation of our data into a numerical space, e.g. converting raw text into tensors
  - We want as few dimensions as possible, so apply some dimensionality reduction
  - Here we also do some preprocessing, e.g. normalisation or other transformations
- Select your metrics
  - The metrics depend on the task at hand: is it classification or regression?
  - They also depend on the data: is it balanced or imbalanced?
- Develop your ML model
  - We often need to train (and validate), then evaluate on different metrics to answer different questions, but always test on a completely separate dataset
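A compact sketch tying these steps together, assuming scikit-learn and its built-in iris dataset; the model choice and hyperparameters are arbitrary examples, not a prescribed recipe.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Step 1: define the task -> supervised classification (labelled iris flowers)
X, y = load_iris(return_X_y=True)

# Step 2: represent/preprocess the data (hold out a test set, then normalise)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Steps 3 + 4: select metrics, develop the model, evaluate on the held-out data
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "macro F1:", f1_score(y_test, pred, average="macro"))
```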
Identify the key aspects of ML
How to:
Step 1: Define your task
Step 2: Represent your data
Step 3: Select your metrics
Step 4: Develop your ML model
How to develop the ML model:
Step 1: Define the architecture
Step 2: Define the activation and loss functions
Step 3: Select the optimiser
Step 4: Choose the training characteristics
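A minimal sketch of these four model-development steps, assuming PyTorch is installed; the layer sizes, data and hyperparameters are made up for illustration.

```python
import torch
import torch.nn as nn

# Step 1: define the architecture (a tiny fully connected net; sizes are made up)
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.Tanh(),                      # Step 2: activation function
    nn.Linear(32, 2),
)
loss_fn = nn.CrossEntropyLoss()     # Step 2: loss function for classification

# Step 3: select the optimiser (learning rate is one training characteristic)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# Step 4: training characteristics (number of epochs, batch of made-up data)
X = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))
for epoch in range(5):
    optimiser.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimiser.step()
```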